Course Outline – Getting Smarter About Data

Objectives

This course helps attendees become more confident in assessing and appraising the data and results presented to them. Through several case studies, we will review the relevant data and ask: Have the right questions been asked? Are the conclusions reliable? Have we been misled, either accidentally or intentionally? Are the sources trustworthy? Is the analysis thorough, and could any bias have crept in?

The format is based on group discussion and exercises.  We'll investigate case studies built around topical questions, using public datasets.

This is not a “technical” course and does not require any mastery of analytics software.  In the group exercises, we’ll use Excel just to look at some of our data.  The instructor will also demonstrate popular and free analytics tools so that attendees can understand what is available, but this will be a very small part of the course.

These days we are all deluged with data and expected to make "evidence-based" decisions.  This course aims to equip people from all walks of life to be more discerning and critical about data.

Prerequisites

A basic familiarity with Excel is helpful but not essential; it is useful simply because the data in the case studies will be provided as Excel spreadsheets.  All the exercises are done in groups, and we will ensure that at least one person in each group has some Excel experience.

Duration

This course can be run as a one-day or a two-day course.  In the two-day course, we provide more examples for each exercise and go into more depth and detail about the UK Election case study.

Course Contents

Presentation: How to lie with charts and data

Or rather, how to avoid being fooled by a dodgy dataset, a vexatious visual or a mendacious map.  The presenter will show plenty of examples; it's your job to spot the flaw.

Surely they can’t both be right?

Group exercise: How to get to grips with a dataset

Someone has given you a dataset and is looking to you for answers.  We'll discuss some approaches for getting started: looking at the size and shape of the data, building a few quick charts, and assessing its quality and reliability (for example, missing values and outliers). We may even use some machine learning to help discover patterns. We'll practise these simple techniques in Excel, and the instructor will demonstrate some popular free tools that you may want to consider using later.

Case Study: Group Exercise: The Top 5%

A recent debate in the UK concerned whether a person earning £80,000 per year is in the top 5% of earners.  We'll look at some data from HMRC to see if we can settle the argument, and along the way discover some interesting insights into how much people are paid in Britain.

Does earning £80K put you in the top 5% of earners in the UK?

Case Study: Group Exercise: The Numeritis Epidemic

An epidemic of a particularly nasty disease, numeritis, has just engulfed the UK.  Sufferers become confused, irritable and suspicious and mutter about fake news.  The data shows outbreaks in certain parts of the UK while other areas are relatively unscathed.  Is this “postcode lottery” to do with NHS failures? Or is there some other reason?  We will analyse the data, ask the tough questions and come to some conclusions. (Note: the data in this exercise has not been fact-checked.)

Case Study: UK Election 2019 results

The recent UK elections generated a mass of complex data: for example, how many votes were cast for each party in each of the 650 constituencies in the 2019 election, and how those figures compare with the 2017 election.  The media have, of course, published many beautiful and insightful visualisations and analyses. In this exercise, we'll start with the data and see if we can ask interesting and important questions that the journalists ignored, such as why it helps prospective MPs to be called David.

This isn't the 2019 election but the 2015 results, with the "red wall" intact.