Home | Power BI | Excel | Python | SQL | Generative AI | Visualising Data | Analysing Data
Data Modelling is about organising our data to encourage effective analysis, appealing and intuitive visuals and powerful calculations?
Data Modeling, sometimes called dimensional modelling :
A good data model underpins the success of our Power BI project. It helps us to:
Power BI stores data in a structured manner. All data exists within a table. Tables have rows and columns.
All columns have a data type such as
Power BI allows us to set the data type of a column.
Power BI allows us to indicate how we would like the column values to be summarised (aggregated) by default when placed on a visual.
Power BI allows us to create a default format for some columns to control how it is diplayed in visuals. For example, we can set the format of a date column to a dd-mmm-yyyy format e.g. 16-Sep-2024.
Power BI also allows us to set the category of the column. This is occasionally useful for geographic columns, as it suggests to Power BI know what to do with the column e.g. put it on an axis rather than a value.
Power BI can create a new column based on the values of an existing column in two ways.
If the values are categorical text, it is best to group them (in effect, creating a broader level in a hierarchy)
If the values are continuous numeric values, it is best to bin them into bands.
Power BI Edit Relationship Dialog
Tables Joined in Power BI Model View
Fact tables represent events e.g. transactions or snapshots (e.g. inventory balance at end of month).
They always have a date/time element.
They contain the values / quantities.
They are long (lots of rows) and thin (few columns).
They are usually at the “many” end of a relationship. The columns are either
Dimension tables contain the categories to groups / sort / filter by
These are short (few rows) and fat (many descriptive columns).
They are usually at the “one” end of a relationship.
A dimension must have a unique column, a.k.a. Primary Key – this column forms the relationship with the fact table
In a star schema every dimension table is directly joined to the fact table.
In this example of a star schema, the four dimension tables are directly connected to the fact table.
In a snowflake schema, some dimension tables are joined to another dimension table.
In this example of a snowflake schema, the Date tables is not directly connected to the fact table.
A more complated example if a snowflake schema - European Business Authority Stress Test Results Dataset
Ideally we should prefer star to snowflake schema patterns. Star schema results in simpler measures and have better performance.
If you have dates, you need a date table
Hint: Consider investing in building an organsiation-wide date table. A date table is central and used everywhere. Organisations often have specific calendars. Implement either in Excel table, SQL or even DAX.
Same dimension tables can be related to several columns in a fact table.
There can be only one active relationship between a pair of tables – other relationships must be inactive.
Use USERELATIONSHIP() and inactive relationships to create measures.
Sales By Ship Date =
-- By the date shipped rather than the date ordered
CALCULATE(
[Sales],
-- This overrides the active relationship between 'Order'[Order Date] and Dates[Date]
USERELATIONSHIP(
'Transaction'[Ship Date],
Dates[Date]
)
)
Attributes Change: we can either
Often source data is in wide format
It is better to reshape it as long format (which is the format for tidy data)
If you see a 1-1 relationship, ask if the two tables should be merged.
1-1 relationships always have bi-directional filtering.
Power BI will let you create a many to many relationships (after giving you a warning).
Better to model with a bridge table and 1 – many relationships.
Power BI cached data is all in memory – don’t run out.
Vertipaq engine is a column data sore
Columns are compressed in memory – up to 10x
Use Vertipaq Analyser Excel file to see what’s taking the memory.
Remove columns you don’t need. Especially high cardinality ones
Reduce cardinality e.g. Split date time into separate date and time columns?