Data Training Courses


Predictive Analysis

“Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise” – John Tukey

What questions can data analysis answer?

Regression, Classification and Clustering

An algorithm is a recipe for solving a (numerical) problem. In data science, there are three types of algorithm: regression, classification and clustering.

Regression and classification are supervised – we can train the model with known examples.

Clustering is unsupervised (and harder): there are no known examples to train the model with.

(Supervised) Data Science Process

  1. Train Algorithm with some data -> Model
  2. Test that the model works well
  3. Model + New Data -> Predictions

Two examples of regression

What is the stopping distance for a given speed?

What is the fuel efficiency (mpg) for a given engine size (disp)?

Example Algorithm: linear regression

Algorithm: dist = a × speed + b
Model parameters: a = 3.9, b = -17.6
Model formula: dist = 3.9 × speed - 17.6
Best fit line: find the a and b that minimize the overall “difference” between estimated and actual distance across all points
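One common way to make the “difference” precise is the sum of squared errors, which gives ordinary least squares. The text does not say which measure was used, so treat that choice as an assumption. The sketch below fits a line that way; the function names `fit_line` and `sse` are hypothetical, and the speed and distance values are invented for illustration, so the fitted a and b will not match the 3.9 and -17.6 quoted above.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising the sum of squared errors of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the least-squares line passes through (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared differences between actual and estimated y."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical speed / stopping-distance pairs, invented for illustration.
speeds = [5, 10, 15, 20, 25]
dists = [4, 20, 42, 60, 82]

a, b = fit_line(speeds, dists)
print(f"dist = {a:.2f} * speed + {b:.2f}")

# Nudging either parameter away from the fitted value makes the total error worse.
print(sse(a, b, speeds, dists) < sse(a + 0.1, b, speeds, dists))
print(sse(a, b, speeds, dists) < sse(a, b + 1.0, speeds, dists))
```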

Prediction

What is the stopping distance for a speed of 20?
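Answering this with the fitted line is a matter of plugging the speed into the model formula. A minimal sketch, using the parameters quoted above (the function name `predict_dist` is hypothetical):

```python
# Model parameters quoted in the text.
a = 3.9
b = -17.6


def predict_dist(speed):
    """Estimated stopping distance from the line dist = a * speed + b."""
    return a * speed + b


estimate = predict_dist(20)  # 3.9 * 20 - 17.6, roughly 60.4
print(round(estimate, 1))
```

The result is in whatever units the training data used for distance, and it is only as trustworthy as the line is for speeds near 20.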

How good is our model?
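The text does not say which measure of quality it has in mind. Two widely used ones for regression are the root mean squared error (RMSE) and R², sketched below as an assumption about what "good" could mean here. The function names are hypothetical and the held-out observations are invented for illustration; only the line dist = 3.9 × speed - 17.6 comes from the text.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of an error, in the units of y."""
    n = len(actual)
    return sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Share of the variation in y explained by the model (1.0 is a perfect fit)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


# Hypothetical held-out observations, invented for illustration.
speeds = [8, 12, 16, 22]
actual_dists = [16, 28, 40, 70]

# Predictions from the line quoted in the text: dist = 3.9 * speed - 17.6
predicted_dists = [3.9 * s - 17.6 for s in speeds]

error = rmse(actual_dists, predicted_dists)
fit = r_squared(actual_dists, predicted_dists)
print(f"RMSE = {error:.2f}, R^2 = {fit:.3f}")
```

RMSE answers "how far off is a typical prediction?", while R² answers "how much better is the model than always guessing the average?".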

Let’s get more real

dist = a × speed + b × weight + … + c


Classification Example – Titanic Passenger List

R & ggplot2 in Action


Predicting Titanic passenger survival - decision tree


Predicting Titanic passenger survival - model builder tool


Clustering Example – Iris species

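To make the idea of unsupervised clustering concrete, here is a minimal sketch of k-means, one common clustering algorithm. The text does not name the algorithm it uses, so k-means is an assumption. The points are invented (petal length, petal width) pairs, not records from the real Iris data, and the starting centroids are hand-picked for a repeatable result, where real implementations usually choose them at random. Note that no species labels are given to the algorithm.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical (petal length, petal width) measurements, invented for illustration.
points = [
    (1.4, 0.2), (1.5, 0.3), (1.3, 0.2),
    (4.5, 1.5), (4.7, 1.4), (4.4, 1.3),
    (5.9, 2.1), (6.1, 2.3), (5.8, 2.2),
]

# Hand-picked starting centroids (an assumption made for repeatability).
start = [(1.0, 0.0), (4.0, 1.0), (7.0, 3.0)]

centroids, clusters = k_means(points, start)
for centre, members in zip(centroids, clusters):
    print([round(v, 2) for v in centre], len(members), "points")
```

The algorithm only finds groups of similar points; deciding whether those groups correspond to species is left to the analyst, which is part of why clustering is harder than the supervised cases.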