local stuff for local (gov) people

Data Thing Part 8: The AI Talk

As per the last data thing, here's the content of my AI presentation talk thingy
Brief: come and talk to senior management about AI (that’s the extent of it)
Goal: give attendees an understanding of the basics so they grasp the strengths and limitations of the tech, and the prerequisites for making use of it (see the Next Steps section at the end). Speedrun some technical terms, because people like to know what all the crazy words mean.
I may not cover all of this – depends on how things go on the day.

Intro

Some background on my personal interest in AI
Starting with some concept projects in 2019 using GPT-2
Work that fizzled out due to the Covid lockdowns (it was an unofficial project a colleague and I would work on at lunch time)
Picked up again when ChatGPT launched and made headlines in Nov 2022
I was on mat leave at the time, spending many hours scrolling reddit / twitter whilst holding a newborn baby that preferred contact sleeping 😊

The AI Effect

The definition of AI is very broad. Often the simplest is:
“computer systems that can perform tasks that typically require human intelligence”
The popular definition of AI changes as systems become more advanced. What was previously AI becomes normal.
See: predictive text, GPS best-route calculation, Netflix / Amazon recommendations, Alexa.

“AI is whatever hasn’t been done yet” – Larry Tesler, inventor of cut, copy, paste

Right now, AI is all about LLMs and blue sparkles ✨

Subtypes of AI

Older AI systems were complex and rule-based
They processed data sequentially, drawing on a bank of facts and rules defined by humans

When we talk about AI today we are usually referring to one of:

Machine Learning

Systems that can learn from data.
Supervised Learning: The system is given labelled data – example inputs paired with the correct outputs. E.g. image classification (a small code sketch follows this list).
Unsupervised Learning: The system is given unlabelled data and finds patterns and relationships on its own. E.g. customer segmentation.
Reinforcement Learning: The system learns through trial and error, with good outcomes rewarded. E.g. systems that learn to play chess or Go by playing millions of games and being rewarded for wins.
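
To make supervised learning concrete, here's a minimal sketch using the scikit-learn library and its bundled handwritten-digits dataset (just one illustrative choice of library and data):

```
# Supervised learning: the system is shown labelled examples (digit images
# plus the correct digit) and learns to predict labels for unseen examples.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()                          # inputs: 8x8 pixel images; labels: 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

model = RandomForestClassifier()
model.fit(X_train, y_train)                     # learn patterns from the labelled data
print(model.score(X_test, y_test))              # accuracy on examples it has never seen
```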

Neural Networks

Models that mimic the structure of the human brain, with interconnected nodes. Each node is an individual element that receives information, processes it in a specific way, and passes the result on (tiny sketch below).
E.g. facial recognition, character recognition from handwritten text
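
A toy sketch of a single node in plain NumPy – the numbers are made up, it's just to show the mechanics:

```
# One "node": receive inputs, weight them, add a bias, apply a simple rule.
import numpy as np

inputs = np.array([0.5, 0.8, 0.2])     # information arriving at the node
weights = np.array([0.9, -0.4, 0.3])   # how strongly each input matters
bias = 0.1

signal = np.dot(inputs, weights) + bias
output = max(0.0, signal)              # ReLU activation: only pass on positive signals
print(output)                          # the node's contribution to the next stage
```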

Deep Learning

Advanced systems that use layers of nodes.
Deep learning is a subtype of neural networks where the nodes are organised in layers, like groups of workers in a factory.
Each layer combines the outputs of its nodes and passes them on to the next layer for further processing, and the final layer produces the overall output.
Used for complex tasks where there are intricate patterns and relationships between elements of the data, or multiple levels of abstraction.
E.g. a layer to detect low-level features in an image, another to determine which features are useful, and another to group the useful features into an overall representation of the image – each layer builds on the last (see the sketch below).
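
A minimal sketch of a layered network using PyTorch (one of several popular libraries; the layer sizes here are arbitrary):

```
# Three layers of nodes, each feeding its output to the next.
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(64, 32),   # layer 1: detect low-level features from a 64-value input
    nn.ReLU(),
    nn.Linear(32, 16),   # layer 2: combine those into higher-level features
    nn.ReLU(),
    nn.Linear(16, 10),   # layer 3: produce the overall output, e.g. 10 class scores
)

example_input = torch.rand(1, 64)    # a dummy input standing in for image features
print(model(example_input).shape)    # torch.Size([1, 10])
```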

Large Language Models

A type of deep learning model designed to process and generate natural-sounding text. Can be used for summarisation, translation, extracting information… etc.

LLM Chat Interfaces

What most people picture when they hear "LLM" is the chat interface used to interact with a model
Run through the key players in the market

Chat interface: accessible, easy for anyone to sign up and start using. Models can also be accessed via:

APIs: Direct access to LLMs from within programs and scripts (short sketch after this list)

Fine-Tuning: Pre-trained hosted models can be trained on additional data for specific applications. Fine-tuning is mainly about shaping the style and format of the output, rather than embedding specific knowledge in the model.

Local Models: Smaller models can be downloaded, trained and run on local servers. This gives the user control over the data and configuration, and access without an internet connection.
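
As a hedged illustration of API access, here's roughly what a call looks like with the OpenAI Python SDK (other providers have similar libraries; the model name is just an example, and it assumes an API key is set in the OPENAI_API_KEY environment variable):

```
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "You summarise council reports in plain English."},
        {"role": "user", "content": "Summarise these committee minutes in three bullet points: ..."},
    ],
)
print(response.choices[0].message.content)
```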

Also note:

Small models: trained on small datasets for specific tasks, fast and easier to run locally

Large models: trained on massive datasets for a wider range of tasks, very powerful but can be costly to run due to the computing power / specialist hardware needed.

Fun Concepts to Impress Your Friends

The actual presentation includes some visuals – I'll come back and add them.

Models: The actual systems behind tools like ChatGPT

Prompt: The input text (question, instructions) given to an LLM through a chat interface

Tokens: Input text is broken down into tokens – e.g. individual words, subwords, or characters. There are different methods of tokenisation. The size (and cost) of LLM input and output is measured in tokens.
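
A quick sketch of tokenisation using the tiktoken library (the tokenizer used by OpenAI's models – other models tokenise differently):

```
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Local government loves an acronym")
print(len(tokens))                         # how many tokens we'd be billed for
print([enc.decode([t]) for t in tokens])   # the individual token strings
```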

Parameters: The settings and variables adjusted during training that determine how the model behaves. The size of a model is measured by its number of parameters, which are often touted in launch announcements – e.g. GPT-3 has 175 billion parameters, while GPT-4's count was never officially disclosed (estimates run to over a trillion).

Weights: Numerical values that represent the strength of connections between neurons. Adjusted to minimize errors and maximize useful output. Deep learning models can iteratively adjust weights based on feedback during training.
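
A toy example of a single weight being adjusted during training – guess, measure the error, nudge the weight to reduce it, repeat (pure Python, numbers invented):

```
# Learn the rule "output = 2 * input" from feedback alone.
weight = 0.0
learning_rate = 0.1

for step in range(20):
    x = 3.0
    prediction = weight * x
    error = prediction - 2.0 * x            # how far off the "correct answer" we are
    weight -= learning_rate * error * x     # adjust in the direction that reduces the error

print(weight)   # converges towards 2.0
```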

Attention: The mechanism behind the breakthrough progress in LLMs. Earlier models processed words largely in isolation and in strict order. Attention lets the network assign weights to different parts of the input depending on context, giving a much better grasp of the semantic meaning of the text.
The model uses the patterns and relationships learnt during training (its weights) to turn that understanding into a sequence of output tokens, which are then converted into the text response.
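
For the curious, the core calculation ("scaled dot-product attention") fits in a few lines of NumPy – Q, K and V are numerical representations derived from the input tokens, and the numbers here are random stand-ins:

```
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how relevant each token is to each other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ V                        # a context-weighted mix of the values

Q = np.random.rand(4, 8)   # 4 tokens, each represented by 8 numbers
K = np.random.rand(4, 8)
V = np.random.rand(4, 8)
print(attention(Q, K, V).shape)   # (4, 8): one context-aware representation per token
```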

Embeddings: Words (or chunks of text) are converted into numerical representations called embeddings – similar meanings end up with similar numbers.

Vector database: A specialised database that can store and retrieve embeddings (embeddings are a specific type of vector – a mathematical representation of data).

RAG (Retrieval Augmented Generation): A technique that lets an LLM draw on specific, specialist information to enhance accuracy. Content is processed and converted into embeddings (think of them as the language used by LLMs). The query is also converted to this numerical format, a similarity metric is used to find the most relevant content, and that content is handed to the model as context for its answer.
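
A toy sketch of the retrieval step – the embed() function here is a hypothetical stand-in for a real embedding model, so the "best match" is illustrative only; the point is that similarity becomes simple arithmetic on vectors:

```
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(8)

documents = ["Bin collection days", "Council tax bands", "Planning applications"]
doc_vectors = np.array([embed(d) for d in documents])

query_vector = embed("When are my bins emptied?")
similarity = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector))

best = documents[int(np.argmax(similarity))]
print(best)   # the chunk that would be handed to the LLM as context for its answer
```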

Multimodal: Systems that can process and generate multiple types of data at once, e.g. answering questions based on an uploaded image or audio.

Honorable mentions I may add if I run this talk for more staff (see Next Steps section):

What are LLMs good at?

Pretty much just talk through the table:

Strengths | Weaknesses / Issues
Creative | Calculations
Fast – huge volumes of data processed or generated in seconds | Training bias
Intuitive – natural language interface | “Hallucinations”
Natural sounding output | Environmental impact
Understanding semantics – brainstorming and summarisation | Repetitive
Adaptable | “Black boxes” vulnerable to change
Personalisation | Rely on quality data

Artificial Intelligence – the term says it all: there is the appearance of intelligence, but it's artificial! The huge volume of training data is used for the model to learn patterns – it's not retained to be looked up like a database.

AI in Local Govt

Talk about some case studies:

Note the projects all play to the strengths of LLM technology - processing large volumes of unstructured text or audio, summarisation, and repetitive text generation tasks.
And all with humans in the loop.

What could AI mean for residents?

Thinking about the whole resident experience, not just in the context of interacting with the Council

Next steps to consider

Data

Any AI implementation first requires good quality, accessible data. Accessible as in: accessible by a computer system.
We should focus on this before thinking about anything else.
(see also – DATA THING!)

Skills

I believe holistic skills are as important as, if not more important than, technical skills at the beginning
We need to educate end users on the strengths and weaknesses of the tech
Understanding how it works in a high level way helps understand its potential and also safety implications
Systems cannot be designed and implemented by ICT staff alone – we need business areas skilled up and engaged, identifying opportunities.

Outcomes

This stuff can be simpler in the private sector when companies focus on increasing profit or revenue – metrics that can be easily defined and monitored
Users (internal users) with a good understanding of the outcomes their area contributes to will be well equipped to identify project opportunities.
Alongside the simpler method of “it's good at pattern X > identify where we do pattern X > try replacing it with AI”

Use cases

Hypothesis - Get the skills in place, cultivate genuine interest, and the use cases will come.
Alternative - copy what other people do. Also a viable option 😁

Avoid

Suppliers selling blue sparkles for big bucks 💸

Next steps for me

I want to run a version of this talk (the tech bit) as a drop in for staff to learn more about AI
And probably start working on a policy? But I prefer the vibe of a guide rather than restrictions.
Or a policy accompanied by education – understand the issues and why they are important, not just a PDF of don'ts

#data thing