Data Training from Zomalex


Tidy Data

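Tidy data is a good starting point for analysis and visualisation. In tidy data, each variable has its own column, each observation has its own row, and each value has its own cell.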

Source (raw) data is rarely tidy. It is often wide data, with the same variable split over several columns for presentation purposes.

For example, the dataset below splits different types of cost (actual, budget, forecast) over several columns. This is wide data.

Product  Actual Cost  Budget Cost  Forecast Cost
Alpha            101          102            103
Bravo            201          202            203

It is much better to reshape (unpivot) this data into a long (tidy) format such as:

Product  Cost Type  Amount
Alpha    Actual        101
Alpha    Budget        102
Alpha    Forecast      103
Bravo    Actual        201
Bravo    Budget        202
Bravo    Forecast      203
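
The reshape from wide to long can be sketched in a few lines of plain Python. This is a minimal illustration rather than a recommended implementation: the function name unpivot and its parameters are hypothetical, and the only data used is the two-product example above. Each wide row produces one long row per cost column, and the column header itself becomes a value in the new Cost Type column.

```python
# Wide data: one row per product, one column per cost type
# (values copied from the example table above).
wide_rows = [
    {"Product": "Alpha", "Actual Cost": 101, "Budget Cost": 102, "Forecast Cost": 103},
    {"Product": "Bravo", "Actual Cost": 201, "Budget Cost": 202, "Forecast Cost": 203},
]


def unpivot(rows, id_column, value_columns):
    """Hypothetical helper: emit one long-format row per (row, value column) pair."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                id_column: row[id_column],
                # The header "Actual Cost" becomes the value "Actual".
                "Cost Type": column.replace(" Cost", ""),
                "Amount": row[column],
            })
    return long_rows


long_rows = unpivot(wide_rows, "Product", ["Actual Cost", "Budget Cost", "Forecast Cost"])

for row in long_rows:
    print(row["Product"], row["Cost Type"], row["Amount"])
```

In practice a library would normally do this work. In pandas, for instance, DataFrame.melt with the id_vars, var_name and value_name arguments performs the same kind of reshape.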

In long format, we can compare amounts by cost type as well as by product. This is awkward in wide format, where each cost type sits in its own column.
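
As a rough sketch of why the long format helps, the snippet below totals the example amounts first by cost type and then by product. The helper name total_by is an assumption made for illustration. Because cost type is now an ordinary column, the same function handles both groupings.

```python
from collections import defaultdict

# Long (tidy) data from the example above: (product, cost type, amount).
long_rows = [
    ("Alpha", "Actual", 101),
    ("Alpha", "Budget", 102),
    ("Alpha", "Forecast", 103),
    ("Bravo", "Actual", 201),
    ("Bravo", "Budget", 202),
    ("Bravo", "Forecast", 203),
]


def total_by(rows, key_index):
    """Hypothetical helper: sum the amount (last field) grouped by the field at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[-1]
    return dict(totals)


totals_by_product = total_by(long_rows, 0)    # group on the Product column
totals_by_cost_type = total_by(long_rows, 1)  # group on the Cost Type column

print(totals_by_product)
print(totals_by_cost_type)
```

With the wide table, totalling by cost type would mean naming each cost column explicitly, and the code would need changing whenever a new cost type was added.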