Data Training Courses

Home | Power BI | Excel | Python | SQL | Generative AI | Visualising Data | Analysing Data

Use AI Tools for SQL

We can use AI tools to write SQL statements. We provide a description of the task in English. For best results, we provide an initial prompt with details of the names of the tables and columns that we want the SQL statement to refer to. We can also provide other information: column descriptions, types, primary and foreign keys, but this is often unnecessary.

Exercise 1: Analyse the ‘Patient Stay’ dataset

Download the (fictitious) dataset for this example from here. Have a look at the data and form your own opinions about how you would analyse it. Then start a new conversation with your AI tool using the prompts below as a starter.

Optional: If you would like to test the SQL statements that the AI tool generates, you can use a tool like SQL Fiddle. This is browser based SQL editor. In SQL Fiddle, the first step is to run some SQL statements to create the PatientStay table and insert the data. These statements are in the file here

Prompt 1 (short version): Act as a helpful SQL expert. The PatientStay table represents patients admitted to a ward in a hospital. Each row in the table represents a patient admission. The dataset contains the following columns: PatientId, AdmittedDate, DischargeDate, Hospital, Ward, Tariff, Ethnicity.

Prompt 1 (alternative long version): Act as a helpful SQL expert. The PatientStay table represents patients admitted to a ward in a hospital. Each row in the table represents a patient admission. The dataset contains the following columns.

Prompt 2: Suggest some data analysis questions that could be answered with this dataset.

Prompt 3: Write a SQL statement to list the patients that were admitted to either Kingston or PRUH hospitals.

Prompt 4: Write a SQL statement to list the patients that were admitted between the 1st and 3rd March 2024.

Prompt 5: Write a SQL statement to list the patients that stayed for more than 3 days and had a tariff of more than 5.

Exercise 2: Analyse the PricePaid dataset

The data in this exercise comes from the public Price Paid dataset from the Land Registry. Each row represents the sale of a property in England. Full details of the data are provided in the example prompts below. The Land Registry site that describes this data is here

The prompts refer to a table named PricePaidSW12. This is a subset of the full England dataset, with data for a single area in London. It contains about 18,000 rows and 10 columns. You don’t need to download the file since the initial prompt below provides a description of the structure but if you want to, you can download the file from here.

In this exercise, you may or may not have access to a SQL editor that has a connection to a database that contains this table. If you do, you can copy / paste the SQL queries generated by the AI tool into the editor and see if they are correct. If not, you can simply inspect the SQL to see if it is well written and probably correct. If you want to use an editor such as SQLFiddle, here is a SQL file with the statements to create the table and a few sample rows.

Have a conversation about this data with your LLM and task it with writing SQL queries based on your instructions. The example prompts below start with simple queries and progressively become more challenging queries. Use these prompts as a starting position; feel free to add, change or replace these prompts based on the AI’s responses and any testing you are able to do.

Prompt 1: Act as a SQL expert. Keep your responses brief. Use the Microsoft T-SQL version in your code examples in your responses.

Prompt 2: I will ask you to write SQL queries to analyse data in a table PricePaidSW12. This lists about 18,000 property sales in the London SW12 post code in the last 30 years.

The most important fields in PricePaidSW12 are:

The data also contains several other fields about the address:

Prompt 3: Write SQL queries to show how many sales took place in total and how many each year?

Prompt 4: Write a SQL query to show the total market value (sum of price_paid) of all the sales each year.

Prompt 5: What are the earliest and latest dates of a sale?

Prompt 6: How many different property types are there? And how many different values of new_build, estate_type?

Prompt 7: List all the sales in 2018 between £400,000 and £500,000 in Cambray Road. Provide these details: deed_date, price_paid, property_type, postcode, paon and street.

Prompt 8: List the 25 latest sales in Ormeley Road with the following fields

Prompt 9: List properties that have been sold three times or more in the last ten years. Assume that a property is uniquely identified by the combination of the postcode, paon and saon fields.