Almost everyone knows the story of the Titanic. In April 1912, this magnificent ship left Southampton on its maiden voyage to New York, but it never arrived. It hit an iceberg in the Atlantic and sank. There were over 2,000 people on board. Fewer than half survived.
A century later, this Titanic dataset is a classic case study for rookie data scientists, who use it to build a predictive model of who is likely to survive or perish (ignoring the fact that this is a matter of historical record). Here, however, we will see whether an AI tool can help us gain some intuition into who did and did not survive, and why. We know from the film that Kate Winslet survived but poor old Leo DiCaprio did not. Is that an accurate reflection?
This dataset contains a list of 891 of the passengers on board, including variables (columns) such as Name, Age, Sex, and Pclass, i.e. whether they travelled 1st, 2nd or 3rd class. Download the data from here.
Here are some suggested prompts to start your analysis. The first prompt describes the columns, which is useful because the variable names are difficult to understand:
Act as a data analyst. The attached data has a partial list of passengers on the Titanic. Here is a description of the columns:
PassengerId
Survived, 0 = Died, 1 = Survived
Pclass, Passenger Class
Name
Sex
Age
SibSp, Number of siblings (brothers/sisters) and spouses in the family group travelling with the passenger (excluding the passenger)
Parch, Number of parents and children in the family group travelling with the passenger (excluding the passenger)
Ticket
Fare
Cabin
Embarked, S = Southampton, Q = Queenstown, C = Cherbourg
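If you want to sanity-check the AI tool's answers yourself, the column codes described above can be decoded with a few lines of Python. This is only a minimal sketch: the four rows below are invented for illustration and are not taken from the real dataset, and the helper names (`decode_row`, `survival_rate_by`) are hypothetical. With the real file you would read the downloaded CSV instead of the inline sample.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows, invented for illustration only (not real passengers).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Embarked
1,0,3,male,S
2,1,1,female,C
3,1,3,female,Q
4,0,1,male,S
"""

# Code-to-label mappings taken from the column descriptions in the prompt.
SURVIVED_LABELS = {"0": "Died", "1": "Survived"}
PORT_LABELS = {"S": "Southampton", "Q": "Queenstown", "C": "Cherbourg"}


def decode_row(row):
    """Return a copy of the row with readable Outcome and Port fields added."""
    decoded = dict(row)
    decoded["Outcome"] = SURVIVED_LABELS.get(row["Survived"], "Unknown")
    decoded["Port"] = PORT_LABELS.get(row["Embarked"], "Unknown")
    return decoded


def survival_rate_by(rows, column):
    """Share of passengers who survived, grouped by the given column."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        survivors[row[column]] += int(row["Survived"])
    return {key: survivors[key] / totals[key] for key in totals}


rows = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
rates = survival_rate_by(rows, "Sex")

for row in rows:
    print(row["PassengerId"], row["Outcome"], row["Port"])
print(rates)
```

Grouping by `Pclass` instead of `Sex` works the same way, which makes it easy to compare your own figures with whatever the AI tool reports.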