Data Training Courses


How does Generative AI work?

This section describes how generative AI works under the covers. You don’t need to know this to use LLMs effectively, but it helps to understand the reasons for their strengths and weaknesses. Most of the AI tools we use in this course are based on large language models (LLMs).

Large Language Models

A large language model (LLM) is a type of Generative AI program designed to understand, generate, and interact with human language. It is trained on vast amounts of text data. An LLM uses statistical analysis and language modelling techniques to repeatedly predict the next word in a sentence to build a response to a prompt.

It does not always choose the most probable next word. This is controlled by a setting named “temperature”. If the temperature is raised, the LLM is less likely to choose the most probable word and the output becomes more creative.
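The effect of temperature can be sketched in a few lines of Python. This is a minimal illustration, not how any particular LLM is implemented: the candidate words and their raw scores are made up, and `softmax_with_temperature` is a hypothetical helper name. It shows the usual idea of dividing the scores by the temperature before turning them into probabilities.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    biggest = max(scaled)  # subtract the largest value for numerical stability
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next words (assumed values).
words = ["the park", "fire", "the fair"]
scores = [2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 2) for p in probs])
```

At a low temperature the top word takes most of the probability. At a high temperature the probabilities move closer together, so less likely words are chosen more often.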

An LLM can answer questions, write essays, and create code. Here are a couple of useful terms

For example, here are three different completions of the next three words of a prompt “A tasty breakfast”

Another example: given the prompt “She walked through”, possible continuations, with made-up probabilities, are:

  1. fire (10%)
  2. hell with a smile (5%)
  3. the park (3%)
  4. the fair (2%)
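Choosing a continuation from these probabilities could look something like the sketch below. The four listed options only add up to 20%, so the remaining 80% is lumped into a placeholder entry; that placeholder and the `pick_continuation` helper are assumptions made for illustration.

```python
import random

# Made-up probabilities from the list above. The remaining 80% is lumped
# into a placeholder (an assumption) so that the weights add up to 100%.
continuations = {
    "fire": 0.10,
    "hell with a smile": 0.05,
    "the park": 0.03,
    "the fair": 0.02,
    "<something else>": 0.80,
}

def pick_continuation(rng):
    """Pick one continuation at random, weighted by its probability."""
    options = list(continuations)
    weights = list(continuations.values())
    return rng.choices(options, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(5):
    print("She walked through", pick_continuation(rng))
```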

How to build an LLM

First step: self-supervised learning “guess the next word”

Download a lot of text (a corpus), preferably the entire internet.

Go through the next steps repeatedly over the whole corpus in a process called self-supervised learning. This will cost $100m, take several months and generate several hundred tons of CO2.
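"Self-supervised" means the training labels come from the text itself: every word in the corpus is the correct answer for the words just before it. The sketch below is a toy stand-in for that idea, assuming a tiny invented corpus. It simply counts which word follows which, whereas a real LLM adjusts the weights of a neural net. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# A tiny invented corpus; a real one would be billions of words.
corpus = "the cat sat on the mat . the cat ate the fish ."

def train_next_word_counts(text):
    """Count, for every word, which words follow it in the text."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently seen next word, or None for an unseen word."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

counts = train_next_word_counts(corpus)
print(predict_next(counts, "the"))  # "cat" follows "the" twice, more than any other word
```

No human had to label anything here. The corpus supplies both the question (the current word) and the answer (the word that follows).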

Next step: Reinforcement learning from human feedback (fine-tuning)

At this point the model is pre-trained and will just continue or complete text. Now we fine-tune the model to make it useful. Humans provide questions and model answers, and the model is fine-tuned so that it responds with something close to the model answer. Another fine-tuning technique is for the LLM to generate two different answers. People then indicate which response they prefer and this is fed back into the fine-tuning.

As the model grows in size, it shows surprising “emergent behaviours”:

How LLMs work

An LLM is based on an AI approach known as deep learning. This uses a structure named a neural net which looks like this.

Neural Net Schematic
Source: https://www.frontiersin.org/files/Articles/1290880/fphy-11-1290880-HTML/image_m/fphy-11-1290880-g001.jpg

This contains nodes (or neurons, or simply numbers) and connections (lines, or sometimes called edges) between the nodes. The nodes are arranged in several layers:

A node is connected to all the nodes in the previous layer by the connections. Each of these connections has a weight (a number) and it is these weights that are adjusted during training so that the predictions are closer to the actual output values.

The node also has an activation function that determines the strength of its output based on the weighted total of its inputs. (Typical ones are named ReLU and sigmoid.) This activation function helps the LLM to generalise and learn.
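What a single node computes can be sketched as follows. The input values, weights and bias are made up, and the bias term is an assumption that the text above does not mention; `node_output` is a hypothetical helper name.

```python
import math

def relu(x):
    """ReLU: pass positive values through, replace negative values with 0."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid: squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(total)

inputs = [1.0, 2.0, 3.0]     # outputs of the previous layer (made up)
weights = [0.5, -1.0, 0.25]  # one weight per incoming connection (made up)
bias = 0.1                   # bias term: an assumption, not from the text

print(node_output(inputs, weights, bias, relu))
print(node_output(inputs, weights, bias, sigmoid))
```

The weighted total here is negative, so ReLU outputs 0 while sigmoid outputs a value between 0 and 0.5. Training changes `weights` and `bias`; the activation function itself stays fixed.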

The neural net in the diagram is tiny: it has about 30 parameters. Most neural nets are much bigger. GPT-3, the predecessor of the model behind the first ChatGPT, has 175 billion parameters. That is still far fewer than the connections in a human brain (roughly 100 trillion). Size matters. Bigger is better.
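A parameter count for a small, fully connected net can be worked out by hand. The sketch below assumes one weight per connection and one bias per node after the input layer. The layer sizes are hypothetical and are not read off the diagram.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected net with these layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the two layers
        total += n_out         # one bias per node in the receiving layer
    return total

# Hypothetical net: 3 input nodes, 4 hidden nodes, 2 output nodes.
print(count_parameters([3, 4, 2]))  # (3*4 + 4) + (4*2 + 2) = 26
```

Because every node connects to every node in the previous layer, the count grows roughly with the product of neighbouring layer sizes, which is why large models reach billions of parameters.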

Tokenisation

LLMs work with numbers. Our prompt needs to be converted from text to numbers to be input into the model, and the model output needs to be converted from numbers back to text. A tokenizer splits text into an array of tokens (roughly words or word fragments) and converts them to a sequence (array) of numbers.

This page https://platform.openai.com/tokenizer from OpenAI shows what a tokenizer does.
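The round trip from text to numbers and back can be illustrated with a toy tokenizer. This sketch splits on whole words only and builds its vocabulary from a single invented sentence. Real tokenizers work with word fragments and much larger vocabularies, so treat the function names and behaviour here as assumptions made for illustration.

```python
def build_vocabulary(text):
    """Give every distinct word in the text its own integer id."""
    vocab = {}
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert text to a list of token ids (fails on words not in the vocabulary)."""
    return [vocab[word] for word in text.lower().split()]

def decode(ids, vocab):
    """Convert a list of token ids back to text."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)

vocab = build_vocabulary("a tasty breakfast is a good start")
ids = encode("a good breakfast", vocab)
print(ids)                 # [0, 4, 2]
print(decode(ids, vocab))  # a good breakfast
```

A word that is missing from the vocabulary raises a `KeyError` in this toy version. Splitting into word fragments is one way real tokenizers avoid that problem.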

The shortest history of Generative AI

Generative AI seems to have arrived suddenly, but in fact it has had a long gestation. Here are a few moments of AI history.

Given that long history, what is the reason for the recent excitement? These LLMs have very recently become much more powerful and capable. For example, in 2023, OpenAI released GPT-4, which can do some very impressive things:

Anybody can use ChatGPT and, with a few hours' training, can use it very effectively. No coding proficiency or technical skills are required.

Because of these factors, ChatGPT took only 2 months to reach 100m users, compared to 9 months for TikTok and 70 months for Uber.