Home | Prices | Contact Us | Courses: Power BI - Excel - Python - SQL - Generative AI - Visualising Data - Analysing Data
A useful definition of data: something we can review or analyse to provide insight, inform actions, improve decisions and outcomes.
For the examples below, ask the following questions:
Is it data or not? You can answer ‘Yes’ or ‘No’ or on a scale from 1 to 10. Justify your answer.
If you have answered ‘Yes’ for an example, answer the next questions.
When describing the data, we may want to start with the what and why, for example:
This represents patient stays in hospitals in London.
PatientId | AdmittedDate | DischargeDate | Hospital | Ward | Tariff | Ethnicity |
---|---|---|---|---|---|---|
1 | 2024-02-27 | 2024-03-06 | Kingston | Dermatology | 2 | White |
2 | 2024-02-27 | NULL | Kingston | Ophthalmology | 8 | Black British |
3 | 2024-02-27 | 2024-03-08 | PRUH | Day Surgery | 4 | Asian |
4 | 2024-02-27 | 2024-02-29 | Oxleas | Dermatology | 8 | NULL |
5 | 2024-02-27 | 2024-03-02 | Oxleas | Ophthalmology | 5 | White |
Column Name | Datatype | Nullability |
---|---|---|
PatientId | INT | NOT NULL (Primary Key) |
AdmittedDate | DATE | NOT NULL |
DischargeDate | DATE | NULL |
Hospital | VARCHAR(50) | NOT NULL |
Ward | VARCHAR(50) | NOT NULL |
Tariff | FLOAT | NOT NULL |
Ethnicity | VARCHAR(50) | NULL |
This data about data is often referred to a metadata. Can you think of other examples of useful metadata?
Here is some natural-language (English) text that describes the table.
An Excel spreadsheet contains details about patient stays in hospital.
The columns are PatientId, AdmittedDate, DischargeDate, Hospital, Ward, Tariff and Ethnicity.
PatientId is a whole number and must be unique.
AdmittedDate and DischargeDate must be dates.
Hospital, Ward and Ethnicity are text fields, with a maximum length of 50 characters.
Tariff is a floating point number and represent the costs associated with that patient's stay in hospital.
All columns are mandatory, apart from DischargeDate and Ethnicity.
Hint: copy this text into an AI model and ask it to:
Does this change your opinion on whether this is data?
If you decide that this text is data, what about the novel Pride and Prejudice? And if this novel is data, what sort of useful analysis can we perform on it?
SELECT
ps.PatientId
, ps.AdmittedDate
, ps.DischargeDate
, ps.Hospital
, ps.Ward
, DATEDIFF(DAY, ps.AdmittedDate, ps.DischargeDate) + 1 AS LengthOfStay
FROM
PatientStay ps
WHERE
ps.Hospital IN ( 'Oxleas', 'PRUH' )
AND ps.AdmittedDate BETWEEN '2024-02-01' AND '2024-02-28'
AND ps.Ward LIKE '%Surgery'
ORDER BY
ps.AdmittedDate DESC
, ps.PatientId DESC;
Hints:
This ChatGPT session shows a NHS demo of automated tools to create a structured summary based on a consultant’s notes here.
ATG GCC TCT GGG TGT GGA CCT GGC CTC TGG GAC CTC TAC CAG CAG CAG CTT
CTT GGT GAA GCT GGT GGA GCT CTG GGT GGT CCT GCT GGT CCT GCC TGT CTC
CTC CTC CTC GGC TGC CTC GGC AGC AGC AGG GGC TGC AGC CTC AGG GAG GCT
This is a partial sequence of the human insulin gene (INS), which plays a critical role in glucose metabolism.
This is an image of the database map (entity relationship diagram) on the PatientStay table and another table.
Hint: consider or find an image of a MRI scan. Is this data?
This is a common format, JSON, used by web services. For example, the JSON below appears in a browser for the URL http://api.open-notify.org/astros.json.
{
"people": [
{
"craft": "ISS",
"name": "Oleg Kononenko"
},
{
"craft": "ISS",
"name": "Nikolai Chub"
},
{
"craft": "ISS",
"name": "Tracy Caldwell Dyson"
},
{
"craft": "ISS",
"name": "Matthew Dominick"
},
{
"craft": "ISS",
"name": "Michael Barratt"
},
{
"craft": "ISS",
"name": "Jeanette Epps"
},
{
"craft": "ISS",
"name": "Alexander Grebenkin"
},
{
"craft": "ISS",
"name": "Butch Wilmore"
},
{
"craft": "ISS",
"name": "Sunita Williams"
},
{
"craft": "Tiangong",
"name": "Li Guangsu"
},
{
"craft": "Tiangong",
"name": "Li Cong"
},
{
"craft": "Tiangong",
"name": "Ye Guangfu"
}
],
"number": 12,
"message": "success"
}
If you have completed the exercise, you may want to consider these examples.
This audio file was generated by ChatGPT when ask to write and recite a poem about Python data structures here.
This is a YouTube video on Data Analysis with Excel here. The original on YouTube is here.
This is from the BBC Weather website for London on 22nd Feb 2025.
This is an Excel spreadsheet of a cash flow model, containing inputs, assumptions, calculations and results here.
Teams will produce a video recording, transcript and AI summary of a meeting, possibly this one.