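Tidy data is a good starting point for analysis and visualisation. In tidy data, each variable has its own column, each observation has its own row, and each value has its own cell.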
Source (raw) data is rarely tidy. It is often wide data, with the same variable split over several columns for presentation purposes.
For example, the dataset below splits different types of cost (actual, budget, forecast) over several columns. This is wide data.
Product | Actual Cost | Budget Cost | Forecast Cost |
---|---|---|---|
Alpha | 101 | 102 | 103 |
Bravo | 201 | 202 | 203 |
It is much better to reshape (unpivot) this data into a long (tidy) format such as:
Product | Cost Type | Amount |
---|---|---|
Alpha | Actual | 101 |
Alpha | Budget | 102 |
Alpha | Forecast | 103 |
Bravo | Actual | 201 |
Bravo | Budget | 202 |
Bravo | Forecast | 203 |
The long format allows us to compare amounts by cost type as well as by product. This is difficult with the data in wide format, because cost type is not a column that we can group or filter by.
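As a rough illustration of the reshaping described above, here is a minimal Python sketch using only the standard library. The names `wide_rows`, `unpivot`, `long_rows` and `totals_by_cost_type` are hypothetical and chosen for this example. The sketch assumes the data is held as a list of dictionaries, one per row, and it totals the amounts by cost type to show the kind of comparison that the long format makes easy.

```python
from collections import defaultdict

# Wide data: one row per product, one column per cost type
# (values copied from the wide table above).
wide_rows = [
    {"Product": "Alpha", "Actual Cost": 101, "Budget Cost": 102, "Forecast Cost": 103},
    {"Product": "Bravo", "Actual Cost": 201, "Budget Cost": 202, "Forecast Cost": 203},
]


def unpivot(rows, id_column, value_columns):
    """Return one long-format row for every (row, value column) pair."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append(
                {
                    id_column: row[id_column],
                    # Turn a column name such as "Actual Cost" into "Actual".
                    "Cost Type": column.replace(" Cost", ""),
                    "Amount": row[column],
                }
            )
    return long_rows


long_rows = unpivot(
    wide_rows, "Product", ["Actual Cost", "Budget Cost", "Forecast Cost"]
)

# With cost type as a single column, grouping by it is a simple loop.
totals_by_cost_type = defaultdict(int)
for row in long_rows:
    totals_by_cost_type[row["Cost Type"]] += row["Amount"]

print(dict(totals_by_cost_type))
# {'Actual': 302, 'Budget': 304, 'Forecast': 306}
```

If the data is already in a pandas DataFrame, `DataFrame.melt(id_vars=["Product"], var_name="Cost Type", value_name="Amount")` performs a similar unpivot, although the cost type values would keep the " Cost" suffix unless it is removed separately.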