Use Generative AI for Data Analysis - Introduction
How AI can help us analyse data
AI tools can help us explore, understand and visualise a dataset. They can:
- provide a summary of the data and its structure, e.g. identify a star-schema arrangement of tables and label each table as a fact or dimension table,
- suggest strengths, weaknesses and improvements to the data structure,
- suggest a list of questions that we can ask to understand more about the data,
- provide those questions in actionable form, perhaps as SQL, DAX or Python code, and
- if data is provided, analyse and summarise it and build a few charts to provide insights.
The two most popular languages for data analysis are SQL and Python, and AI tools can help us write code in both. For example, ChatGPT can write SQL from a description of the data and a set of instructions.
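To make this concrete, here is a minimal sketch of the kind of SQL an AI tool might return for a request such as "total sales by category". The table names (`fact_sales`, `dim_product`), columns and values are invented for illustration and do not come from any real dataset. The query runs against an in-memory SQLite database using Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical star schema: one fact table and one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO fact_sales VALUES (1, 1, 500.0), (2, 1, 300.0), (3, 2, 40.0);
""")

# The sort of query an AI tool might suggest for "total sales by category":
# join the fact table to its dimension, then aggregate.
query = """
    SELECT d.category, SUM(f.amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY total_sales DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Bikes', 800.0), ('Helmets', 40.0)]
```

Running generated SQL against a small sample like this is a quick way to check that it does what we asked before using it on the full dataset.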
Two popular applications for data analysis are Excel and Power BI. AI tools can help us use these effectively.
For best results, we provide information about the data in an initial scene-setting prompt. We can do this in a few ways.
- We can describe the data structure: the table name(s) and a list of column names. If we want to be thorough, we can also include a description of each column (data type, whether nullable, the set of valid values if appropriate), but this is not usually necessary. For example, this ChatGPT session analyses a dataset from the NHS about the number and type of hospital appointments over the last decade.
- We can upload the data file. Before we import any data into an AI tool, we need to ensure it is safe to do so.
- If the data file is too large to upload, we can copy and paste the column headers and a few representative rows into the prompt.
- We can upload an image of the structure of the table(s): column names and possibly data types. This could be a snapshot of the model view of a Power BI data model, or an entity-relationship diagram of a database.
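The "column headers and a few representative rows" approach above can be sketched in Python. This is only one possible way to assemble a scene-setting prompt. The function name `build_scene_prompt`, the table name `appointments` and the sample values are hypothetical, made up for this example.

```python
import csv
import io
from itertools import islice

def build_scene_prompt(table_name, csv_text, sample_rows=3):
    """Return a prompt listing a table's columns plus a few sample rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                      # first line holds the column names
    samples = list(islice(reader, sample_rows))  # take only the first few data rows
    lines = [
        f"I have a table called {table_name} with these columns:",
        ", ".join(header),
        "",
        "Here are a few representative rows:",
    ]
    lines += [", ".join(row) for row in samples]
    return "\n".join(lines)

# Made-up sample data, standing in for a file too large to upload.
csv_text = (
    "year,appointment_type,count\n"
    "2021,Outpatient,120\n"
    "2022,Outpatient,135\n"
    "2022,Inpatient,60\n"
    "2023,Inpatient,58\n"
)

prompt = build_scene_prompt("appointments", csv_text, sample_rows=2)
print(prompt)
```

The printed text can be pasted into the AI tool as the initial prompt. As with uploading a file, we should first check that the sample rows are safe to share.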
After the initial prompt, we can continue with follow-on prompts, for example:
- what would be a useful set of questions to ask about this dataset?
- how would we add a new column that combines X and Y?
- summarise the data: sum column X and group by column Y
- what are the factors that influence the values in column Z?
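Two of these follow-on prompts map directly onto code. The sketch below shows roughly what an AI tool might produce for "add a new column that combines X and Y" and "sum column X and group by column Y". It uses plain Python rather than a data library, and the column names and values are invented for illustration. With pandas, the second step would more commonly be written as `df.groupby("region")["sales"].sum()`.

```python
from collections import defaultdict

# Made-up rows standing in for a small dataset.
rows = [
    {"first_name": "Ada", "last_name": "Lovelace", "region": "North", "sales": 120},
    {"first_name": "Alan", "last_name": "Turing", "region": "North", "sales": 80},
    {"first_name": "Grace", "last_name": "Hopper", "region": "South", "sales": 50},
]

# "Add a new column that combines X and Y": join first and last name.
for row in rows:
    row["full_name"] = f"{row['first_name']} {row['last_name']}"

# "Sum column X and group by column Y": total sales per region.
totals = defaultdict(int)
for row in rows:
    totals[row["region"]] += row["sales"]

print(dict(totals))  # {'North': 200, 'South': 50}
```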