Data Training from Zomalex


Predictive Analysis

“Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise” – John Tukey

What questions can data analysis answer?

Regression, Classification and Clustering

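An algorithm is a recipe for solving a (numerical) problem. In data science, there are three main types of algorithm:

  1. Regression: predict a number
  2. Classification: predict a category
  3. Clustering: find groups of similar items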

Regression and classification are supervised – we can train the model with known examples.

Clustering is unsupervised (and harder): there are no known answers to train the model with.

(Supervised) Data Science Process

  1. Train Algorithm with some data -> Model
  2. Test that the model works well
  3. Model + New Data -> Predictions
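The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production workflow: the data is made up, and the "algorithm" is a hypothetical one-feature threshold rule chosen only because it is short enough to read in full.

```python
# Toy labelled examples: (feature value, known answer). All values are made up.
train = [(1.0, "no"), (2.0, "no"), (3.0, "no"),
         (6.0, "yes"), (7.0, "yes"), (8.0, "yes")]
test = [(2.5, "no"), (6.5, "yes"), (9.0, "yes")]


def train_threshold(examples):
    """Step 1: pick the cut-off that misclassifies the fewest training examples."""
    values = sorted(x for x, _ in examples)
    # Candidate cut-offs lie halfway between neighbouring feature values.
    candidates = [(low + high) / 2 for low, high in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != (label == "yes") for x, label in examples)

    return min(candidates, key=errors)


def predict(threshold, x):
    """Apply the trained model (here just a threshold) to one value."""
    return "yes" if x > threshold else "no"


def accuracy(threshold, examples):
    """Step 2: share of held-back examples the model gets right."""
    correct = sum(predict(threshold, x) == label for x, label in examples)
    return correct / len(examples)


model = train_threshold(train)          # 1. Train algorithm with some data -> model
test_accuracy = accuracy(model, test)   # 2. Test that the model works well
prediction = predict(model, 5.0)        # 3. Model + new data -> prediction
print(model, test_accuracy, prediction)
```

The test examples are kept separate from the training examples, so the accuracy figure reflects data the model has not seen.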

Two examples of regression

What is the stopping distance for a given speed?

What is the fuel efficiency (mpg) for a given engine size (disp)?

Example Algorithm: linear regression

Algorithm: dist = a × speed + b

Model parameters: a = 3.9, b = -17.6

Model formula: dist = 3.9 × speed - 17.6

Best fit line: find the a and b that minimize the overall “difference” between the estimated and actual stopping distance across all points.
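One way to find a and b is ordinary least squares, which takes the "difference" to be the squared vertical gap between each point and the line. That reading of "difference" is an assumption here, and the data below is made up purely to show the arithmetic; it is not the stopping-distance data behind the figures quoted above.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The best fit line always passes through the point (mean_x, mean_y).
    b = mean_y - a * mean_x
    return a, b


# Made-up toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))
```

For a single input variable this closed-form calculation is enough; no iterative search is needed.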

Prediction

What is the stopping distance for a speed of 20?
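Answering this only needs the model formula: plug the speed into dist = 3.9 × speed - 17.6. A minimal sketch, using the parameters quoted earlier (the function name `predict_dist` is just an illustrative choice):

```python
A = 3.9     # slope, from the model parameters above
B = -17.6   # intercept, from the model parameters above


def predict_dist(speed):
    """Estimated stopping distance for a given speed."""
    return A * speed + B


predicted = predict_dist(20)
print(round(predicted, 1))  # 60.4
```

So the model estimates a stopping distance of about 60.4 for a speed of 20, in whatever units the original data used.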

How good is our model?
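Two commonly used measures for a regression model are R-squared (the share of the variation in the actual values that the model explains) and the root mean squared error (the typical size of a prediction error). The sketch below assumes these two measures are what is wanted; the actual and predicted values are made up for illustration.

```python
def r_squared(actual, predicted):
    """1.0 is a perfect fit; 0.0 is no better than always guessing the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the actual values."""
    squared_errors = [(y - p) ** 2 for y, p in zip(actual, predicted)]
    return (sum(squared_errors) / len(actual)) ** 0.5


# Made-up values: what really happened vs. what a model estimated.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

score = r_squared(actual, predicted)
error = rmse(actual, predicted)
print(round(score, 2), round(error, 2))
```

These measures are most informative when calculated on test data that was not used to train the model.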

Let’s get more real

dist = a × speed + b × weight + … + c
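With several inputs, a prediction is still a weighted sum plus a constant. The sketch below shows only that calculation; the coefficient values and the choice of inputs are hypothetical placeholders, not fitted results.

```python
def predict(features, coefficients, intercept):
    """Weighted sum of the input values, plus the constant term."""
    return sum(coefficients[name] * value
               for name, value in features.items()) + intercept


# Hypothetical coefficients, for illustration only (not fitted to real data).
coefficients = {"speed": 2.0, "weight": 0.5}
intercept = -10.0

predicted = predict({"speed": 20, "weight": 30}, coefficients, intercept)
print(predicted)  # 45.0
```

Adding another input means adding one more coefficient; the prediction code itself does not change.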


Classification Example – Titanic Passenger List

R & ggplot2 in Action


Predicting Titanic passenger survival - decision tree


Predicting Titanic passenger survival - model builder tool


Clustering Example – Iris species
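To give a feel for how clustering finds groups without known answers, here is a minimal sketch of k-means, one common clustering algorithm. It is an assumption that k-means suits this example, and the points below are made up; they are not the Iris measurements.

```python
def closest(point, centres):
    """Index of the centre nearest to the point (squared straight-line distance)."""
    return min(range(len(centres)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centres[i])))


def k_means(points, centres, rounds=10):
    """Alternate between assigning points to centres and moving the centres."""
    for _ in range(rounds):
        groups = [[] for _ in centres]
        for point in points:
            groups[closest(point, centres)].append(point)
        # Move each centre to the average of its group (leave it if the group is empty).
        centres = [
            tuple(sum(coords) / len(group) for coords in zip(*group)) if group else centre
            for group, centre in zip(groups, centres)
        ]
    labels = [closest(point, centres) for point in points]
    return centres, labels


# Made-up two-dimensional points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (5.0, 5.0), (5.5, 6.0), (6.0, 5.5)]

centres, labels = k_means(points, centres=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters and the starting centres have to be chosen up front, which is part of why clustering is harder than the supervised methods.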
