Course Outline – Understanding Data

This course aims to brush up your data literacy skills. After the course, attendees will be more confident in assessing data and able to analyse and draw insights from even unfamiliar datasets.
The course is grounded and practical. Attendees work together in small teams on lots of exercises; there is only a small amount of presentation and lecture to introduce a subject and to recap on lessons learned in the exercises.

Data 101
Nearly all data is in tabular format, so we’ll look at the basics: tables, columns, rows and data types. We’ll then move on to cardinality, uniqueness, relationships, aggregation and grouping, and reinforce these concepts with some exercises.
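As a taste of the exercises, here is a minimal sketch of three of these ideas in plain Python. The `orders` table and its column names are made up for illustration; the course exercises may use different tools and datasets.

```python
from collections import defaultdict

# Hypothetical example table: each dict is a row, each key is a column.
orders = [
    {"order_id": 1, "region": "North", "amount": 120.0},
    {"order_id": 2, "region": "South", "amount": 80.0},
    {"order_id": 3, "region": "North", "amount": 50.0},
    {"order_id": 4, "region": "East", "amount": 200.0},
]

# Cardinality: the number of distinct values in a column.
region_cardinality = len({row["region"] for row in orders})

# Uniqueness: a column is unique if no value repeats across rows.
order_id_is_unique = len({row["order_id"] for row in orders}) == len(orders)

# Grouping and aggregation: total amount per region.
total_by_region = defaultdict(float)
for row in orders:
    total_by_region[row["region"]] += row["amount"]

print(region_cardinality)
print(order_id_is_unique)
print(dict(total_by_region))
```

Here `region` has low cardinality (only a few distinct values), which is what makes it a sensible column to group by, while `order_id` is unique and so identifies each row.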

Getting data into shape
When data is in a tidy format, it’s so much easier to analyse and visualise. We’ll explain what a tidy format is and how we can transform a dataset into a tidy shape with operations like pivot, append, merge and split. Attendees will practise these techniques on example datasets.
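To give a flavour of one such reshaping operation, the sketch below unpivots a "wide" table (one column per year) into a tidy "long" table (one row per city and year). The data and the `unpivot` helper are hypothetical and written in plain Python; in practice a tool such as a spreadsheet, Power BI or a dataframe library would usually do this for us.

```python
# Hypothetical "wide" table: one column per year.
wide = [
    {"city": "Leeds", "2022": 10, "2023": 12},
    {"city": "York", "2022": 7, "2023": 9},
]


def unpivot(rows, id_column, value_columns, name_column, value_column):
    """Return one output row per (input row, value column) pair."""
    tidy_rows = []
    for row in rows:
        for column in value_columns:
            tidy_rows.append(
                {
                    id_column: row[id_column],
                    name_column: column,
                    value_column: row[column],
                }
            )
    return tidy_rows


tidy = unpivot(wide, "city", ["2022", "2023"], "year", "sales")
for row in tidy:
    print(row)
```

In the tidy result every variable (city, year, sales) has its own column and every observation has its own row, which is the shape most charting and analysis tools expect.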

Basic Descriptive Stats
We’ll quickly remind ourselves of the stats we learned at school: mean, median, mode and range. We’ll look at why, and when, each of them can be useful.
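As a quick illustration, the sketch below computes these four statistics with Python's built-in `statistics` module. The numbers are invented, with one deliberately extreme value to show how differently the mean and the median react to it.

```python
import statistics

# Hypothetical sample with one extreme value (40).
values = [2, 3, 3, 5, 7, 40]

mean_value = statistics.mean(values)      # pulled upwards by the extreme value
median_value = statistics.median(values)  # the middle of the sorted values
mode_value = statistics.mode(values)      # the most common value
value_range = max(values) - min(values)   # spread between largest and smallest

print(mean_value, median_value, mode_value, value_range)
```

The mean here is 10 while the median is 4: a single unusual value is enough to pull them apart, which is one reason it helps to look at more than one summary statistic.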

Exploratory Visual Analysis
The quickest and often most effective way to understand our data is to visualise it. We’ll practise making quick, rough-and-ready charts so we can uncover the patterns in our data: trends, outliers, correlations and so on. We’ll build some bar, scatter, line and other charts in under 10 minutes and hopefully have that “Aha” moment when an interesting pattern in the data is revealed.

Data Quality
Data is often messy: incomplete, riddled with bad values and missing important data points. We’ll see how to identify the problems in our data, and discuss techniques for fixing them and approaches to avoiding them in the first place. In an exercise, we’ll provide a few datasets that have some data quality issues, and attendees will find these as quickly as possible.
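The kind of checks involved can be sketched in a few lines of plain Python. The records, the column names and the 0 to 120 plausible age range below are all assumptions made up for this example, not the datasets or rules used in the exercise.

```python
# Hypothetical records with deliberately planted problems.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},  # missing age
    {"id": 3, "age": 212, "email": "c@example.com"},   # implausible age
    {"id": 3, "age": 41, "email": ""},                 # duplicate id, empty email
]


def find_issues(rows):
    """Return (row index, problem description) pairs for some simple checks."""
    issues = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Duplicate check: has this id already appeared in an earlier row?
        if row["id"] in seen_ids:
            issues.append((index, "duplicate id"))
        seen_ids.add(row["id"])
        # Completeness and range checks (0 to 120 is an assumed plausible range).
        if row["age"] is None:
            issues.append((index, "missing age"))
        elif not 0 <= row["age"] <= 120:
            issues.append((index, "age out of range"))
        if not row["email"]:
            issues.append((index, "missing email"))
    return issues


issues = find_issues(records)
for index, problem in issues:
    print(index, problem)
```

Simple rules like these catch the obvious problems; deciding what counts as a "bad" value for a given column is usually the harder and more interesting part of the discussion.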

A survey of free (and nearly free) tools
This is a demo and discussion of some of the popular tools that we can use to understand and analyse data, such as Power BI Desktop; SQL, since most of the world’s most valuable data is in relational databases, accessible through SQL; and open source languages such as R and Python.

A very gentle overview of Machine Learning and Predictive Analytics
Optionally, if time allows

This is a brief description of machine learning (ML), looking at techniques and giving a couple of examples of algorithms. There will be a fun exercise to see whether we can show an ML model a few pictures of Chelsea and Arsenal footballers and have it tell the difference.