Data Training from Zomalex

Home | Power BI | Excel | Python | SQL | Visualising Data | Generative AI

SQL Course Notes

This contains notes to accompany the SQL courses.

Getting Started with SQL and the course

Why bother with SQL at all?

Power BI, for example, connects to databases and does it all for you image

Why you will like SQL

A table by any other name would smell as sweet

image

The SQL Fab Four

We need only 5 verbs with “tidy” data

Project (SQL) by listing the columns

SELECT ReceivedDate,
       Region,
       Category,
       Movement
FROM dbo.MessageView;

Filter (SQL) with the WHERE clause

SELECT MessageId,
       ReceivedDate,
FROM dbo.MessageView
WHERE Movement >= 5; 
-- only return large movements

WHERE is flexible; in a list, between, not exists, >and AND and OR clauses

Mutate (SQL) – add calculation to column list

SELECT MessageId,
       Movement * 2 AS DoubleMovement
FROM dbo.MessageView;

Sort: use the ORDER BY clause

SELECT MessageId,
       ReceivedDate,
       Region,
       Category,
       Movement
FROM dbo.MessageView
ORDER BY Movement;

Group By / Aggregate

SELECT ReceivedDate,
       SUM(Movement) AS TotalMovement
FROM dbo.MessageView
GROUP BY ReceivedDate,

JOIN

SELECT 
m.MessageId,
c.[Date],
c.DayOfWeek
FROM dbo.MessageView m 
INNER JOIN dbo.Calendar c 
ON m.ReceivedDate = c.[Date]

image + image

joins together to form

image

Why bother with SQL at all? (again)

We need a SQL editor

We need a SQL editor to write and run SQL. Examples are

image

Example tables used in the course

PatientStay is a small table that we use in many of the exercises.

PatientStay table

The Land Registry Tables sho sales of properties in London SW12.

Land Registry table

The Chinook database represents an online music store.

Chinook tabled

The basic structure of a SQL SELECT statement

Example

SELECT 
         PatientID
       , Hospital
       , Ward
       , AdmittedDate
FROM PatientStay
WHERE Hospital IN (Kingston, PRUH)
ORDER BY AdmittedDate DESC;

Syntax

SELECT
    <column>
       , <column>
       , <column

FROM <table>
WHERE <where-clause>
ORDER BY <column>;

A SQL SELECT statement with a GROUP BY

Example

SELECT 
       Hospital
       , Ward
       , COUNT(*) AS NumberOfPatients
       , SUM(Tariff) as TotalTariff
FROM PatientStay
WHERE AdmittedDate > 2024-03-01
GROUP BY Hospital, Ward
ORDER BY Hospital, Ward;

Syntax

SELECT <column1>
       , <column2>
       , AGG(<column>)

FROM <table>
WHERE <where-clause>
GROUP BY <column1>, <column2>
ORDER BY <column>, <column2>;

Write a SQL statement in baby steps!

Start with SELECT ps.* FROM PatientStay ps;

Use the autocomplete. It is very helpful.

Take tiny baby steps Start from something that works, not something that doesn’t.

Read the error messages – they are sometimes useful.

Look at where the red squiggerly line starts.

Top tips to run SQL statements in DBeaver

Always put a blank line between statements.

Never have a blank line within a statement.

Click <Ctrl-Return> anywhere in statement to run

dbeaver top tips

DBeaver is a great SQL editor

dbeaver editor

Use AI to solve SQL challenges

AI is a useful, practical, everyday tool, that we can use now to help us write SQL and learn SQL.

Generative AI large language models (LLMs) such as ChatGPT, Copilot, Claude, Gemini have free editions and are available from a browser.

Pair programmers help us write better SQL from inside the SQL editor e.g. GitHub Copilot inside VSCode editor

The VSCode editor with GitHub Copilot

Can always revert back to DBeaver

sql vscode copilot setup

SQL Window Functions

SQL Window functions are extremely useful, for example:

Example: Cumulative Totals

SELECT
	ReceivedMonth
	, ReceivedDate
	, Movement
	, ROW_NUMBER() OVER (PARTITION BY 	ReceivedMonth  ORDER BY ReceivedDate) AS RowIndex
	, SUM(Movement) OVER (PARTITION BY ReceivedMonth ORDER BY ReceivedDate) AS RunningRollingMovement
FROM
	Message;

sql vscode copilot setup

Special functions used in a Window function

Advanced Syntax

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
and
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
further limit the rows to be aggregated.