Data and AI Training

Home | Power BI | Excel | Python | SQL | Generative AI | Visualising Data | Analysing Data

What is data?

A useful definition of data: something we can review or analyse to provide insight, inform actions, improve decisions and outcomes.

For the examples below, ask the following questions:

When describing the data, the following terms may be useful. (We will define some of these terms more precisely later in the course.)

And when considering the properties columns and the values in them:

And, for completeness, when considering an arrangement of tables in a data model

A table of values

This represents patient stays in hospitals in London.

PatientId AdmittedDate DischargeDate Hospital Ward Tariff Ethnicity
1 2024-02-27 2024-03-06 Kingston Dermatology 2 White
2 2024-02-27 NULL Kingston Ophthalmology 8 Black British
3 2024-02-27 2024-03-08 PRUH Day Surgery 4 Asian
4 2024-02-27 2024-02-29 Oxleas Dermatology 8 NULL
5 2024-02-27 2024-03-02 Oxleas Ophthalmology 5 White

The structure of the table above

Column Name Datatype Nullability
PatientId INT NOT NULL (Primary Key)
AdmittedDate DATE NOT NULL
DischargeDate DATE NULL
Hospital VARCHAR(50) NOT NULL
Ward VARCHAR(50) NOT NULL
Tariff FLOAT NOT NULL
Ethnicity VARCHAR(50) NULL

This data about data is often referred to a metadata. Can you think of other examples of useful metadata?

A text description of the table

Here is some natural-language (English) text that describes the table.

An Excel spreadsheet contains details about patient stays in hospital.
The columns are PatientId, AdmittedDate, DischargeDate, Hospital, Ward, Tariff and Ethnicity.
PatientId is a whole number and must be unique.
AdmittedDate and DischargeDate must be dates.
Hospital, Ward and Ethnicity are text fields, with a maximum length of 50 characters.
Tariff is a floating point number and represent the costs associated with that patient's stay in hospital.
All columns are mandatory, apart from DischargeDate and Ethnicity.

Hint: copy this text into an AI model and ask it to:

Does this change your opinion on whether this is data?

If you decide that this text is data, what about the novel Pride and Prejudice? And if this novel is data, what sort of useful analysis can we perform on it?

A SQL statement that refers to the table

SELECT
	ps.PatientId
	, ps.AdmittedDate
	, ps.DischargeDate
	, ps.Hospital
	, ps.Ward
	, DATEDIFF(DAY, ps.AdmittedDate, ps.DischargeDate) + 1 AS LengthOfStay
FROM
	PatientStay ps
WHERE
	ps.Hospital IN ( 'Oxleas', 'PRUH' )
	AND ps.AdmittedDate BETWEEN '2024-02-01' AND '2024-02-28'
	AND ps.Ward LIKE '%Surgery'
ORDER BY
	ps.AdmittedDate DESC
	, ps.PatientId DESC;

Hints:

A hospital consultant’s notes

This ChatGPT session shows a NHS demo of automated tools to create a structured summary based on a consultant’s notes here.

A sequence of genetic code

ATG GCC TCT GGG TGT GGA CCT GGC CTC TGG GAC CTC TAC CAG CAG CAG CTT
CTT GGT GAA GCT GGT GGA GCT CTG GGT GGT CCT GCT GGT CCT GCC TGT CTC
CTC CTC CTC GGC TGC CTC GGC AGC AGC AGG GGC TGC AGC CTC AGG GAG GCT

This is a partial sequence of the human insulin gene (INS), which plays a critical role in glucose metabolism.

This is an image of the database map (entity relationship diagram) on the PatientStay table and another table.

patientstay-hospital-er-diagram

Hint: consider or find an image of a MRI scan. Is this data?

The contents of a API

This appears in a browser for the URL http://api.open-notify.org/astros.json.

{
  "people": [
    {
      "craft": "ISS",
      "name": "Oleg Kononenko"
    },
    {
      "craft": "ISS",
      "name": "Nikolai Chub"
    },
    {
      "craft": "ISS",
      "name": "Tracy Caldwell Dyson"
    },
    {
      "craft": "ISS",
      "name": "Matthew Dominick"
    },
    {
      "craft": "ISS",
      "name": "Michael Barratt"
    },
    {
      "craft": "ISS",
      "name": "Jeanette Epps"
    },
    {
      "craft": "ISS",
      "name": "Alexander Grebenkin"
    },
    {
      "craft": "ISS",
      "name": "Butch Wilmore"
    },
    {
      "craft": "ISS",
      "name": "Sunita Williams"
    },
    {
      "craft": "Tiangong",
      "name": "Li Guangsu"
    },
    {
      "craft": "Tiangong",
      "name": "Li Cong"
    },
    {
      "craft": "Tiangong",
      "name": "Ye Guangfu"
    }
  ],
  "number": 12,
  "message": "success"
}

Optional examples

If you have completed the exercise, you may want to consider these examples.

An audio file

This audio file was generated by ChatGPT when ask to write and recite a poem about Python data structures here.

A video file

This is a YouTube video on Data Analysis with Excel here. The original on YouTube is here.

(An image of) a weather forecast

This is from the BBC Weather website for London on 22nd Feb 2025.

bbc_weather_london_20250222

An Excel spreadsheet model

This is an Excel spreadsheet of a cash flow model, containing inputs, assumptions, calculations and results here.

A recording, transcript and AI summary of a Teams meeting (possibly of this course)

Teams will produce a video recording, transcript and AI summary of a meeting, possibly this one.