Home | Power BI | Excel | Python | SQL | Generative AI | Visualising Data | Analysing Data
A useful definition of data: something we can review or analyse to provide insight, inform actions, improve decisions and outcomes.
For the examples below, ask the following questions:
When describing the data, the following terms may be useful. (We will define some of these terms more precisely later in the course.)
And when considering the properties columns and the values in them:
And, for completeness, when considering an arrangement of tables in a data model
This represents patient stays in hospitals in London.
PatientId | AdmittedDate | DischargeDate | Hospital | Ward | Tariff | Ethnicity |
---|---|---|---|---|---|---|
1 | 2024-02-27 | 2024-03-06 | Kingston | Dermatology | 2 | White |
2 | 2024-02-27 | NULL | Kingston | Ophthalmology | 8 | Black British |
3 | 2024-02-27 | 2024-03-08 | PRUH | Day Surgery | 4 | Asian |
4 | 2024-02-27 | 2024-02-29 | Oxleas | Dermatology | 8 | NULL |
5 | 2024-02-27 | 2024-03-02 | Oxleas | Ophthalmology | 5 | White |
Column Name | Datatype | Nullability |
---|---|---|
PatientId | INT | NOT NULL (Primary Key) |
AdmittedDate | DATE | NOT NULL |
DischargeDate | DATE | NULL |
Hospital | VARCHAR(50) | NOT NULL |
Ward | VARCHAR(50) | NOT NULL |
Tariff | FLOAT | NOT NULL |
Ethnicity | VARCHAR(50) | NULL |
This data about data is often referred to a metadata. Can you think of other examples of useful metadata?
Here is some natural-language (English) text that describes the table.
An Excel spreadsheet contains details about patient stays in hospital.
The columns are PatientId, AdmittedDate, DischargeDate, Hospital, Ward, Tariff and Ethnicity.
PatientId is a whole number and must be unique.
AdmittedDate and DischargeDate must be dates.
Hospital, Ward and Ethnicity are text fields, with a maximum length of 50 characters.
Tariff is a floating point number and represent the costs associated with that patient's stay in hospital.
All columns are mandatory, apart from DischargeDate and Ethnicity.
Hint: copy this text into an AI model and ask it to:
Does this change your opinion on whether this is data?
If you decide that this text is data, what about the novel Pride and Prejudice? And if this novel is data, what sort of useful analysis can we perform on it?
SELECT
ps.PatientId
, ps.AdmittedDate
, ps.DischargeDate
, ps.Hospital
, ps.Ward
, DATEDIFF(DAY, ps.AdmittedDate, ps.DischargeDate) + 1 AS LengthOfStay
FROM
PatientStay ps
WHERE
ps.Hospital IN ( 'Oxleas', 'PRUH' )
AND ps.AdmittedDate BETWEEN '2024-02-01' AND '2024-02-28'
AND ps.Ward LIKE '%Surgery'
ORDER BY
ps.AdmittedDate DESC
, ps.PatientId DESC;
Hints:
This ChatGPT session shows a NHS demo of automated tools to create a structured summary based on a consultant’s notes here.
ATG GCC TCT GGG TGT GGA CCT GGC CTC TGG GAC CTC TAC CAG CAG CAG CTT
CTT GGT GAA GCT GGT GGA GCT CTG GGT GGT CCT GCT GGT CCT GCC TGT CTC
CTC CTC CTC GGC TGC CTC GGC AGC AGC AGG GGC TGC AGC CTC AGG GAG GCT
This is a partial sequence of the human insulin gene (INS), which plays a critical role in glucose metabolism.
This is an image of the database map (entity relationship diagram) on the PatientStay table and another table.
Hint: consider or find an image of a MRI scan. Is this data?
This appears in a browser for the URL http://api.open-notify.org/astros.json.
{
"people": [
{
"craft": "ISS",
"name": "Oleg Kononenko"
},
{
"craft": "ISS",
"name": "Nikolai Chub"
},
{
"craft": "ISS",
"name": "Tracy Caldwell Dyson"
},
{
"craft": "ISS",
"name": "Matthew Dominick"
},
{
"craft": "ISS",
"name": "Michael Barratt"
},
{
"craft": "ISS",
"name": "Jeanette Epps"
},
{
"craft": "ISS",
"name": "Alexander Grebenkin"
},
{
"craft": "ISS",
"name": "Butch Wilmore"
},
{
"craft": "ISS",
"name": "Sunita Williams"
},
{
"craft": "Tiangong",
"name": "Li Guangsu"
},
{
"craft": "Tiangong",
"name": "Li Cong"
},
{
"craft": "Tiangong",
"name": "Ye Guangfu"
}
],
"number": 12,
"message": "success"
}
If you have completed the exercise, you may want to consider these examples.
This audio file was generated by ChatGPT when ask to write and recite a poem about Python data structures here.
This is a YouTube video on Data Analysis with Excel here. The original on YouTube is here.
This is from the BBC Weather website for London on 22nd Feb 2025.
This is an Excel spreadsheet of a cash flow model, containing inputs, assumptions, calculations and results here.
Teams will produce a video recording, transcript and AI summary of a meeting, possibly this one.