Data Training from Zomalex

Home | Power BI | Excel | Python | SQL | Visualising Data | Generative AI

DAX Course Notes

These course notes accompany the DAX content in the intermediate and advanced Power BI courses.

DAX

DAX is a language for data modelling and calculation used with Power BI

We write DAX using the DAX Query View typically following these steps:

DAX Outputs

DAX can create four types of output:

Calculated Measures

Visual Calculations

Visual Calculations are described in detail in a separate section here.

Calculated Tables

Calculated Columns

When to use a DAX calculated column

Date Of First Order = 
/*
Returns  the date when the customer first made an order
Note that this is a calculated column - runs in row context looping thorough all Customer rows
SUMMARIZE returns  dynamic table of the number of transactions by order date for the current customer
MINX then retrieves the earliest date in this table.
*/
MINX(
    SUMMARIZE(
        RELATEDTABLE('Transaction'),
        'Transaction'[Order Date],
        "count", countrows('Transaction')
    ),
    [Order Date]
)

Variables

Use variables to improve readability, performance.

Split a complicated DAX expression into simple steps

Number of Customers In Band = 
VAR selectedStart = SELECTEDVALUE('Sales Bin Thresholds'[Start])
VAR selectedEnd = SELECTEDVALUE('Sales Bin Thresholds'[End])
RETURN
CALCULATE(
    [Number of Customers],
    FILTER(
        'Customer',
        [Sales] >= selectedStart &&
        [Sales] < selectedEnd
    )
)

Implicit Measures and Explicit measures.

If we place a numeric field on the Values well of a visual, Power BI will, under the covers, automatically create an implicit measure.

We can see this if we launch the Performance Inspector, refresh such a visual, and then run the query in DAX query view

The resulting DAX contains the implict measure CALCULATE(SUM('Transaction'[Sales]))

DEFINE
 VAR __DS0Core = 
  SUMMARIZECOLUMNS('Customer'[Segment], "SumSales", CALCULATE(SUM('Transaction'[Sales])))

 VAR __DS0PrimaryWindowed = 
  TOPN(1001, __DS0Core, [SumSales], 0, 'Customer'[Segment], 1)

EVALUATE
 __DS0PrimaryWindowed

ORDER BY
 [SumSales] DESC, 'Customer'[Segment]

It is very good practice to hide numeric field and create an explicit measure.

DEFINE MEASURE 'Calculations'[Sales] = SUM('Transaction'[Sales])
EVALUATE
    SUMMARIZECOLUMNS(
  Customer[Segment],
        "Sales", 'Calculations'[Sales]
    )

This means that subsequent DAX formulas are simpler, for example:
DEFINE MEASURE 'Calculations'[COGS] = [Sales] - [Profit]

Explicit measures helpful if measure does not useful default aggregation, for example:
DEFINE MEASURE 'Calculations'[Shortest Lifespan] = MIN(‘HealthData’[Lifespan])

In fact, creating explicit visuals is almost mandatory. Fileds parameters don’t work with implicit measures. Calculations groups set the model to ‘discourage impicit measures’.

Aggregator Functions

Examples of aggregator functions are SUM(), AVERAGE(), MIN(), MAX()

Aggregator functions take a column e.g. SUM ()

Iterator (X) functions

Examples of iterator functions are SUMX(), MINX(), RANKX() and FILTER().

Iterator Functions loop through a table, calculate an expression then aggregate the result.
Example: SUMX (<table>, <expr>)
Last Week Total = SUMX(Gutenberg, Gutenberg[LastWeek])

The expression can be a column, a RELATED() column or, for example, a multiplication of two columns

Retail Price = 
SUMX(
    Sales, 
    Sales[Quantity] * Sales[Unit Price]
)
Full Price = 
SUMX(
    Sales, 
    Sales[Quantity] * RELATED(Product[Price])
)

The FILTER function is a very useful iterator

FILTER(table, filter_condition) returns a table

The output is a table with those only rows passing the test (could be 0, 1, several rows).

Example

DEFINE
    MEASURE 'Calculations'[Furniture Sales Values] = CALCULATE(
    [Sales],
    FILTER(
        VALUES('Product'[Category]), 
        -- VALUES() returns a table of the distinct values, respecting any filter context
        'Product'[Category] = "Furniture"
    )
)

Filter Context

Implicit filter context is applied to a measure by filters from the visuals on the page. This could be from the visual that shows the measure (for example, filters applied by axis or legend) or by other filters on the page such as a slicer, or filters applied by the filter pane

Filter Context Rules

  1. A filter context has several filter rules. Each filter rule is a boolean (true/false) test applied to each a row
  2. Every measure in visualisation starts with all rows available then filter rules are applied one after the other to reduce the rows (cumulative filter effect)
  3. Implicit filter context arises from other fields in same visual or other visuals e.g. slicers
  4. Explicit filter context arise from our DAX code (discussed later)
  5. Explicit filter context is applied after implicit filter context.

Example

In the snapshot below, the measure defined as:
Number of Jobs = sum(Table[_Number of Jobs])

Question) Each mark on a viz has a different value but all marks use same DAX formula so how are they different?

Answer) Each mark is implicitly filtered by the axes of the visual (e.g. axis in bar chart, row & column headers on matrix) and by external slicers

Row Context

Row context applies in the

Example: a calculated column of a table.

Both Weeks = Gutenberg[LastWeek] + Gutenberg[ThisWeek]

Example: within an iterator function

Weekly Increase = 
SUMX(
    Gutenberg,
    Gutenberg[ThisWeek] - Gutenberg[LastWeek]  -- this expression is executed in a row context
)

Note:

Context and Filtering

Implicit filter: context (query context) provided by the visuals on the page vs
Explicit filter: adding/ replacing filters with DAX code

We can modify implicit filter context using:

CALCULATE() and CALCULATETABLE()

These are the only functions that modify the filter context.

CALCULATE (<expr>, <filter1>, <filter2>, ...)
Returns <expr> in the context modified by the filters

The Calculate Modifier could be a

CALCULATE and FILTER pattern - example

Number of English Authors = 
CALCULATE(
    COUNTROWS(Author),  -- the expression
    FILTER(
        -- ALL() returns a table required by FILTER()
        ALL(Author[Author Country]), 
        -- The boolean uses the [Author Country] column -  in the table returned by ALL().  
        Author[Author Country] = "England" 
    )
)

CALCULATE and FILTER syntax sugar

We can write the previous measure using syntax sugar. Under the covers it expands to the previous definition

Number of English Authors (sugared) = 
    CALCULATE(
        COUNTROWS(Author),
        Author[Author Country] = "England"
)

CALCULATE and REMOVEFILTERS()

REMOVEFILTERS() is a new calculate modifier which has a more meaningful name

Number of English Authors (sugared) = 
    CALCULATE(
        COUNTROWS(Author),
        REMOVEFILTERS(Author[Author Country])
        FILTER(
            VALUES(Author[Author Country]),
            Author[Author Country] = "England")
)

Time Intelligence

Time intelligence is ubiquitous and important.

Use date (calendar) tables. Mark as a date table.

DAX has built time-intelligence functions e.g. TOTALYTD()

For more advanced cases e.g. moving average, we need to build from scratch

Handle several date columns in the same fact table with USERELATIONSHIP()

Common date functions include DATE(), DATEADD(), PARALLELPERIOD(),….

Appendix: Common DAX Functions

A brief look at the DAX functions that will be most useful in our exercises.

COUNTROWS

Scalar Function

DIVIDE

Scalar Function

DIVIDE (<expr1>, <expr2>, <default>)
Divides expr1 by expr2
Safe divide: returns default if expr2 is zero
Returns BLANK () if default not specified
Useful for ratio calculations, e.g. percent of parent

CALENDAR

CALENDAR (<start_date>, <end_date>)
Generates a single column table of dates from <start_date> to <end_date> inclusive.
Useful for generating sample data

CALENDARAUTO() will generate a table of dates from the 1st of Jan of the year of the earliest date to the 31st December of the year of the latest date found in all the date columns in the model.

IF

Scalar Function

IF (boolean_expression, value_if_true, value_if_false)

Example:

DEFINE
   MEASURE 'Calculations'[Sales Year On Year Change %] = IF(
    ISBLANK([Sales This Year]) || ISBLANK([Sales Previous Year]),
    BLANK(),
    DIVIDE([Sales This Year] - [Sales Previous Year], [Sales Previous Year]) 
)

SWITCH

Scalar Function

SWITCH (<target_expr>, <expr1>, <result1>, <expr2>, <result2>, …)
Returns the first <resultN> where <exprN> matches <target_expr>

Typically use in a match first pattern
SWITCH (TRUE, <expr1>, <result1>, <expr2>, <result2>, <else>)

CONCATENEX

Scalar Function

CONCATENEX (<table[column]>)

Flattens the values in several rows of the column to a single text.

Useful in test/debug to show table content in a measure

ALL and VALUES

Table Function

ALL ignores any filter context
ALL (<table[column]>) returns a distinct set of values.
Useful in many functions, e.g. time-intelligence, ratios

VALUES Respects any filter context
VALUES (<table[column]>) Returns a one column table of the distinct set of values

HASONEVALUE and SELECTEDVALUE

Scalar Function

HASONEVALUE (<table[column]>)
Returns TRUE if (<table[column]>) has only one value, FALSE otherwise.

Often used with VALUES. For example:

IF(
    HASONEVALUE(VALUES(<table[column]>),
    VALUES(<table[column]>),
    “Select single value”)

Syntax sugar: SELECTEDVALUE (<table[column]>, default_value)

Scalar Function

RELATED (<table[column]>) Used to refer to a column where row context is being applied.
Necessary since row context does not propagate through relationships.

RELATED (<table>) is on the one side

RELATEDTABLE

Table Function

RELATEDTABLE (<table[column]>)

RELATEDTABLE(<table>) is on the many side

GENERATESERIES

Table Function

GENERATESERIES (start, end, step)

Generates a single column table of from start to end inclusive.

Useful for generating sample data.

Other useful functions

(More) Table Functions

Summarising data using SUMMARIZE (), SUMMARIZECOLUMNS () and GROUPBY ()
Reasons for summarising data and examples use cases

Using SELECTCOLUMNS () and ADDCOLUMNS ()

Joining and generating data: CROSSJOIN (), GENERATE () with FILTER ()

SELECTCOLUMNS and ADDCOLUMNS

SELECTCOLUMNS (<table>, <column1>, <column2>,…)
returns a table with selected columns

ADDCOLUMNS (<table>, <name1>, <expr1>, <name2>, <expr2>,…)
Adds new columns to a table

Table functions: summarising data

Summarising data using SUMMARIZE (), SUMMARIZECOLUMNS () and GROUPBY ()
Reasons for summarising data
Performance tables
Double aggregations: max of sum, netting calculations

Example use case
Get the number of days between current and previous events

Joining and generating data

Using CROSSJOIN() for cartesian results

Using GENERATE() with FILTER() to achieve joins outside of the standard relationships

Example use-cases: