Home | Power BI | Excel | Python | SQL | Generative AI | Visualising Data | Analysing Data
These course notes accompany the DAX content in the intermediate and advanced Power BI courses.
DAX is a language for data modelling and calculation used with Power BI
We write DAX using the DAX Query View typically following these steps:
DAX can create four types of output:
Calculated Measures
Visual Calculations
Visual Calculations are described in detail in a separate section here.
Calculated Tables
Calculated Columns
Date Of First Order =
/*
Returns the date when the customer first made an order
Note that this is a calculated column - runs in row context looping thorough all Customer rows
SUMMARIZE returns dynamic table of the number of transactions by order date for the current customer
MINX then retrieves the earliest date in this table.
*/
MINX(
SUMMARIZE(
RELATEDTABLE('Transaction'),
'Transaction'[Order Date],
"count", countrows('Transaction')
),
[Order Date]
)
Use variables to improve readability, performance.
Split a complicated DAX expression into simple steps
Number of Customers In Band =
VAR selectedStart = SELECTEDVALUE('Sales Bin Thresholds'[Start])
VAR selectedEnd = SELECTEDVALUE('Sales Bin Thresholds'[End])
RETURN
CALCULATE(
[Number of Customers],
FILTER(
'Customer',
[Sales] >= selectedStart &&
[Sales] < selectedEnd
)
)
If we place a numeric field on the Values well of a visual, Power BI will, under the covers, automatically create an implicit measure.
We can see this if we launch the Performance Inspector, refresh such a visual, and then run the query in DAX query view
The resulting DAX contains the implict measure CALCULATE(SUM('Transaction'[Sales]))
DEFINE
VAR __DS0Core =
SUMMARIZECOLUMNS('Customer'[Segment], "SumSales", CALCULATE(SUM('Transaction'[Sales])))
VAR __DS0PrimaryWindowed =
TOPN(1001, __DS0Core, [SumSales], 0, 'Customer'[Segment], 1)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[SumSales] DESC, 'Customer'[Segment]
It is very good practice to hide numeric field and create an explicit measure.
DEFINE MEASURE 'Calculations'[Sales] = SUM('Transaction'[Sales])
EVALUATE
SUMMARIZECOLUMNS(
Customer[Segment],
"Sales", 'Calculations'[Sales]
)
This means that subsequent DAX formulas are simpler, for example:
DEFINE MEASURE 'Calculations'[COGS] = [Sales] - [Profit]
Explicit measures helpful if measure does not useful default aggregation, for example:
DEFINE MEASURE 'Calculations'[Shortest Lifespan] = MIN(‘HealthData’[Lifespan])
In fact, creating explicit visuals is almost mandatory. Fileds parameters don’t work with implicit measures. Calculations groups set the model to ‘discourage impicit measures’.
Examples of aggregator functions are SUM(), AVERAGE(), MIN(), MAX()
Aggregator functions take a column e.g. SUM (
Examples of iterator functions are SUMX(), MINX(), RANKX() and FILTER().
Iterator Functions loop through a table, calculate an expression then aggregate the result.
Example: SUMX (<table>, <expr>)
Last Week Total = SUMX(Gutenberg, Gutenberg[LastWeek])
The expression can be a column, a RELATED() column or, for example, a multiplication of two columns
Retail Price =
SUMX(
Sales,
Sales[Quantity] * Sales[Unit Price]
)
Full Price =
SUMX(
Sales,
Sales[Quantity] * RELATED(Product[Price])
)
FILTER(table, filter_condition) returns a table
The output is a table with those only rows passing the test (could be 0, 1, several rows).
Example
DEFINE
MEASURE 'Calculations'[Furniture Sales Values] = CALCULATE(
[Sales],
FILTER(
VALUES('Product'[Category]),
-- VALUES() returns a table of the distinct values, respecting any filter context
'Product'[Category] = "Furniture"
)
)
Implicit filter context is applied to a measure by filters from the visuals on the page. This could be from the visual that shows the measure (for example, filters applied by axis or legend) or by other filters on the page such as a slicer, or filters applied by the filter pane
In the snapshot below, the measure defined as:
Number of Jobs = sum(Table[_Number of Jobs])
Question) Each mark on a viz has a different value but all marks use same DAX formula so how are they different?
Answer) Each mark is implicitly filtered by the axes of the visual (e.g. axis in bar chart, row & column headers on matrix) and by external slicers
Row context applies in the
Example: a calculated column of a table.
Both Weeks = Gutenberg[LastWeek] + Gutenberg[ThisWeek]
Example: within an iterator function
Weekly Increase =
SUMX(
Gutenberg,
Gutenberg[ThisWeek] - Gutenberg[LastWeek] -- this expression is executed in a row context
)
Note:
Implicit filter: context (query context) provided by the visuals on the page
vs
Explicit filter: adding/ replacing filters with DAX code
We can modify implicit filter context using:
These are the only functions that modify the filter context.
CALCULATE (<expr>, <filter1>, <filter2>, ...)
Returns <expr> in the context modified by the filters
The Calculate Modifier
Number of English Authors =
CALCULATE(
COUNTROWS(Author), -- the expression
FILTER(
-- ALL() returns a table required by FILTER()
ALL(Author[Author Country]),
-- The boolean uses the [Author Country] column - in the table returned by ALL().
Author[Author Country] = "England"
)
)
We can write the previous measure using syntax sugar. Under the covers it expands to the previous definition
Number of English Authors (sugared) =
CALCULATE(
COUNTROWS(Author),
Author[Author Country] = "England"
)
REMOVEFILTERS() is a new calculate modifier which has a more meaningful name
Number of English Authors (sugared) =
CALCULATE(
COUNTROWS(Author),
REMOVEFILTERS(Author[Author Country])
FILTER(
VALUES(Author[Author Country]),
Author[Author Country] = "England")
)
Time intelligence is ubiquitous and important.
Use date (calendar) tables. Mark as a date table.
DAX has built time-intelligence functions e.g. TOTALYTD()
For more advanced cases e.g. moving average, we need to build from scratch
Handle several date columns in the same fact table with USERELATIONSHIP()
Common date functions include DATE(), DATEADD(), PARALLELPERIOD(),….
A brief look at the DAX functions that will be most useful in our exercises.
Scalar Function
Scalar Function
DIVIDE (<expr1>, <expr2>, <default>)
Divides expr1 by expr2
Safe divide: returns default if expr2 is zero
Returns BLANK () if default not specified
Useful for ratio calculations, e.g. percent of parent
CALENDAR (<start_date>, <end_date>)
Generates a single column table of dates from <start_date> to <end_date> inclusive.
Useful for generating sample data
CALENDARAUTO() will generate a table of dates from the 1st of Jan of the year of the earliest date to the 31st December of the year of the latest date found in all the date columns in the model.
Scalar Function
IF (boolean_expression, value_if_true, value_if_false)
Example:
DEFINE
MEASURE 'Calculations'[Sales Year On Year Change %] = IF(
ISBLANK([Sales This Year]) || ISBLANK([Sales Previous Year]),
BLANK(),
DIVIDE([Sales This Year] - [Sales Previous Year], [Sales Previous Year])
)
Scalar Function
SWITCH (<target_expr>, <expr1>, <result1>, <expr2>, <result2>, …)
Returns the first <resultN> where <exprN> matches <target_expr>
Typically use in a match first pattern
SWITCH (TRUE, <expr1>, <result1>, <expr2>, <result2>, <else>)
Scalar Function
CONCATENEX (<table[column]>)
Flattens the values in several rows of the column to a single text.
Useful in test/debug to show table content in a measure
Table Function
ALL ignores any filter context
ALL (<table[column]>)
returns a distinct set of values.
Useful in many functions, e.g. time-intelligence, ratios
VALUES Respects any filter context
VALUES (<table[column]>)
Returns a one column table of the distinct set of values
Scalar Function
HASONEVALUE (<table[column]>)
Returns TRUE if (<table[column]>) has only one value, FALSE otherwise.
Often used with VALUES. For example:
IF(
HASONEVALUE(VALUES(<table[column]>),
VALUES(<table[column]>),
“Select single value”)
Syntax sugar: SELECTEDVALUE (<table[column]>, default_value)
Scalar Function
RELATED (<table[column]>)
Used to refer to a column where row context is being applied.
Necessary since row context does not propagate through relationships.
RELATED (<table>) is on the one side
Table Function
RELATEDTABLE (<table[column]>)
RELATEDTABLE(<table>) is on the many side
Table Function
GENERATESERIES (start, end, step)
Generates a single column table of from start to end inclusive.
Useful for generating sample data.
Summarising data using SUMMARIZE (), SUMMARIZECOLUMNS () and GROUPBY ()
Reasons for summarising data and examples use cases
Using SELECTCOLUMNS () and ADDCOLUMNS ()
Joining and generating data: CROSSJOIN (), GENERATE () with FILTER ()
SELECTCOLUMNS (<table>, <column1>, <column2>,…)
returns a table with selected columns
ADDCOLUMNS (<table>, <name1>, <expr1>, <name2>, <expr2>,…)
Adds new columns to a table
Summarising data using SUMMARIZE (), SUMMARIZECOLUMNS () and GROUPBY ()
Reasons for summarising data
Performance tables
Double aggregations: max of sum, netting calculations
Example use case
Get the number of days between current and previous events
Using CROSSJOIN() for cartesian results
Using GENERATE() with FILTER() to achieve joins outside of the standard relationships
Example use-cases: