Data Training from Zomalex

Home | Power BI | Excel | Python | SQL | Visualising Data | Generative AI

Data Modelling Course Notes

Data Modelling is about organising our data to encourage effective analysis, appealing and intuitive visuals and powerful calculations?

Data Modeling, sometimes called dimensional modelling :

A good data model underpins the success of our Power BI project. It helps us to:

Power BI stores data in a structured manner. All data exists within a table. Tables have rows and columns.

Power BI Specific Data Types

All columns have a data type such as

Power BI allows us to set the data type of a column.

Columns – summarization, format, category

Power BI allows us to indicate how we would like the column values to be summarised (aggregated) by default when placed on a visual.

Formats

Power BI allows us to create a default format for some columns to control how it is diplayed in visuals. For example, we can set the format of a date column to a dd-mmm-yyyy format e.g. 16-Sep-2024.
dm graphic

Data Categories

Power BI also allows us to set the category of the column. This is occasionally useful for geographic columns, as it suggests to Power BI know what to do with the column e.g. put it on an axis rather than a value.
dm graphic

Creating new columns in Power BI

Power BI can create a new column based on the values of an existing column in two ways.
If the values are categorical text, it is best to group them (in effect, creating a broader level in a hierarchy)

Bin continuous columns

If the values are continuous numeric values, it is best to bin them into bands.

Power BI Data Models

Relationships between tables

Power BI Edit Relationship Dialog

Tables Joined in Power BI Model View

Facts and Dimensions

Facts

Fact tables represent events e.g. transactions or snapshots (e.g. inventory balance at end of month).

They always have a date/time element.

They contain the values / quantities.

They are long (lots of rows) and thin (few columns).

They are usually at the “many” end of a relationship. The columns are either

dm fact table example

Dimensions

Dimension tables contain the categories to groups / sort / filter by

These are short (few rows) and fat (many descriptive columns).

They are usually at the “one” end of a relationship.

A dimension must have a unique column, a.k.a. Primary Key – this column forms the relationship with the fact table

dm dim table example

Star Schema vs Snowflake Schema Pattern

In a star schema every dimension table is directly joined to the fact table.

In this example of a star schema, the four dimension tables are directly connected to the fact table.
dm model star schema superstore

In a snowflake schema, some dimension tables are joined to another dimension table.

In this example of a snowflake schema, the Date tables is not directly connected to the fact table.
dm model snowflake schema superstore

A more complated example if a snowflake schema - European Business Authority Stress Test Results Dataset
dm example model eba

Ideally we should prefer star to snowflake schema patterns. Star schema results in simpler measures and have better performance.

Date Tables

If you have dates, you need a date table

dm dates table example

Hint: Consider investing in building an organsiation-wide date table. A date table is central and used everywhere. Organisations often have specific calendars. Implement either in Excel table, SQL or even DAX.

Role Playing Dimensions

Same dimension tables can be related to several columns in a fact table.

There can be only one active relationship between a pair of tables – other relationships must be inactive.

Use USERELATIONSHIP() and inactive relationships to create measures.

Sales By Ship Date = 
-- By the date shipped rather than the date ordered
CALCULATE(
    [Sales],
    -- This overrides the  active relationship between 'Order'[Order Date] and Dates[Date]
    USERELATIONSHIP(
        'Transaction'[Ship Date],
        Dates[Date]
    )
)

Slowly Changing Dimensions (SCDs)

Attributes Change: we can either

Good practice in building Power BI data models

Prefer Long Format to Wide Format for fact tables

Often source data is in wide format

It is better to reshape it as long format (which is the format for tidy data)

Take care with 1-1 relationships

If you see a 1-1 relationship, ask if the two tables should be merged.

1-1 relationships always have bi-directional filtering.

Take care with many-many relationships

Power BI will let you create a many to many relationships (after giving you a warning).

Better to model with a bridge table and 1 – many relationships.

Tidy up your data model (to help the user)

Deal with missing values in dimensions

Consider performance

Power BI cached data is all in memory – don’t run out.

Vertipaq engine is a column data sore

Columns are compressed in memory – up to 10x

Use Vertipaq Analyser Excel file to see what’s taking the memory.

Tips to improve performaance

Remove columns you don’t need. Especially high cardinality ones

Reduce cardinality e.g. Split date time into separate date and time columns?