← Back to QuantegyAI

Inside Large Language Models

Eight interactive modules · about 3–4 hours · Course 4 (Deep Learning) is the recommended prerequisite. No coding required.

You have used ChatGPT and Claude. This course opens them up. A large language model is, at its heart, doing one thing over and over: predicting the next token. Everything else — embeddings, attention, the transformer — is machinery built to make that one prediction astonishingly good. Here you will build that machinery from the bottom up, and every piece runs live in your browser.

This course is hands-on throughout. You will train a real bigram model on a small corpus and improve it step by step; place tokens in an embedding space and find their nearest neighbors; vary attention weights and see which earlier words a prediction depends on; step through the query–key–value computation of self-attention with interactive controls; train a small neural language model and watch its loss curve fall; and adjust the temperature parameter to see the generated text shift from deterministic to incoherent. Each module also shows the corresponding Hugging Face / PyTorch construct so you can recognize it later, and ends with a short mastery check; passing it marks the module complete.

The core idea

Module 1

Predicting the Next Token

The fundamental operation: given the preceding words, predict the next. Activity: build a live bigram model from a small corpus, examine the probability distribution, and sample sentences from it. AI anchor: this is the conditional probability from Course 1, applied at scale.
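No coding is required for this module, but if you are curious how the bigram idea looks in code, here is a minimal Python sketch. The tiny corpus and the function names (`train_bigram`, `next_word_probs`, `sample_sentence`) are illustrative assumptions, not the course's own implementation.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the follower counts for `prev` into P(next | prev)."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

def sample_sentence(counts, start, length=6, seed=0):
    """Sample one word at a time, each conditioned only on the previous word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        probs = next_word_probs(counts, words[-1])
        if not probs:  # the last word never had a follower in the corpus
            break
        choices, weights = zip(*probs.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

corpus = "the cat sat on the mat the cat ate the fish"  # toy corpus (assumption)
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(sample_sentence(model, "the"))
```

The dictionary printed for "the" is the conditional probability distribution the module describes: in this corpus "the" is followed by "cat" twice and by "mat" and "fish" once each.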

Module 2

Tokens & Embeddings

Models do not see words — they see numbers. Activity: turn text into tokens, place each token as a vector on a 2D map, and find a token’s nearest neighbors by meaning. AI anchor: every prompt becomes a sequence of embeddings first.
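As an optional illustration, the sketch below maps words to token ids and finds nearest neighbors on a 2D map. The five words and their coordinates are hand-placed, hypothetical values chosen for the example; real models learn embeddings with hundreds or thousands of dimensions and split text into subword tokens rather than whole words.

```python
import math

# Hypothetical, hand-placed 2D embeddings (not learned from data).
embeddings = {
    "cat": (1.0, 1.0),
    "dog": (1.2, 0.9),
    "kitten": (0.9, 1.3),
    "car": (4.0, 0.5),
    "truck": (4.2, 0.3),
}

# Each known word gets an integer id; this is what the model actually "sees".
vocab = {token: i for i, token in enumerate(embeddings)}

def tokenize(text):
    """Whitespace tokenizer that maps each known word to its integer id."""
    return [vocab[word] for word in text.lower().split()]

def nearest_neighbors(token, k=2):
    """Return the k other tokens closest to `token` by Euclidean distance."""
    target = embeddings[token]
    others = [t for t in embeddings if t != token]
    return sorted(others, key=lambda t: math.dist(target, embeddings[t]))[:k]

print(tokenize("cat dog"))          # [0, 1]
print(nearest_neighbors("cat"))     # ['dog', 'kitten']
print(nearest_neighbors("car", 1))  # ['truck']
```

Because the animal words were placed close together and the vehicle words far away, distance on the map stands in for similarity of meaning.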

How models read context

Module 3

Attention, Intuitively

To predict the next word, which earlier words matter? Activity: move an attention slider across a sentence and watch the model lean on some words and ignore others. AI anchor: "Attention Is All You Need" is the title of the 2017 paper that introduced the transformer, the architecture behind modern LLMs.

Module 4

How Self-Attention Works

The underlying mechanism: queries, keys, and values. Activity: set a query and observe the dot-product scores become softmax weights that combine the values into a single output vector. AI anchor: the computation performed inside every transformer layer.
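For readers who want to see the arithmetic, here is a small sketch of attention for a single query in plain Python. The vectors are made-up numbers, and dividing the scores by the square root of the vector length is the scaling used in standard scaled dot-product attention, which is an assumption about what the interactive shows.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query attention: score each key, softmax, then mix the values."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output

# Made-up example: the query points the same way as the first key.
query = [2.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

weights, output = attend(query, keys, values)
print(weights)  # the first weight is the larger one
print(output)   # a blend of the two value vectors, pulled toward the first
```

The key that best matches the query receives the largest weight, so the output vector sits closest to that key's value.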

Module 5

The Transformer Block

Stack the parts into the unit that repeats dozens of times in a real LLM. Activity: walk a token through positional encoding, self-attention, a residual add, and a feed-forward layer. AI anchor: GPT and Claude are deep stacks of this one block.
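The walk-through above can be sketched in plain Python, with heavy simplifications that are assumptions of this example: queries, keys, and values are the input vectors themselves (no learned projections), there is a single attention head with no causal mask, layer normalization is left out, and a second residual add is placed around the feed-forward layer, as is common practice. The weight matrices `W1` and `W2` are made-up numbers.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def positional_encoding(pos, dim):
    """Sinusoidal encoding: sine on even indexes, cosine on odd indexes."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def self_attention(xs):
    """Each vector attends to every vector (identity Q/K/V, no mask)."""
    d = len(xs[0])
    out = []
    for q in xs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs])
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(d)])
    return out

def feed_forward(x, W1, W2):
    """Two linear maps with a ReLU in between."""
    hidden = [max(0.0, h) for h in matvec(W1, x)]
    return matvec(W2, hidden)

def transformer_block(token_vectors, W1, W2):
    # 1. Add position information to each token vector.
    xs = [add(x, positional_encoding(p, len(x))) for p, x in enumerate(token_vectors)]
    # 2. Self-attention, then 3. the residual add.
    xs = [add(x, a) for x, a in zip(xs, self_attention(xs))]
    # 4. Feed-forward layer, with its own residual add.
    return [add(x, feed_forward(x, W1, W2)) for x in xs]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy token vectors
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # 2 -> 3 (made-up weights)
W2 = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]         # 3 -> 2 (made-up weights)
out = transformer_block(tokens, W1, W2)
print(len(out), len(out[0]))  # 3 2
```

The block returns one vector per token with the same length as its input, which is what allows identical blocks to be stacked on top of one another.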

Making it generate

Module 6

Training a Tiny Language Model

How is the model's knowledge acquired? Activity: train a real neural language model on a small text, epoch by epoch, and observe the loss curve decrease as its samples become more coherent. AI anchor: the same gradient descent from Course 4, applied to language.
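To make the training loop concrete, here is a deliberately tiny sketch: a character-level model with a single table of weights, trained by gradient descent on cross-entropy loss. The training text, the learning rate, and the one-layer design are assumptions for illustration; the model in the activity may be structured differently.

```python
import math

text = "hello world hello there"            # toy training text (assumption)
chars = sorted(set(text))
index = {c: i for i, c in enumerate(chars)}
pairs = [(index[a], index[b]) for a, b in zip(text, text[1:])]
V = len(chars)

# One weight per (previous character, next character) pair, starting at zero.
logits = [[0.0] * V for _ in range(V)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def train_epoch(lr=0.5):
    """One pass over the text; returns the average cross-entropy loss."""
    total_loss = 0.0
    for prev, nxt in pairs:
        probs = softmax(logits[prev])
        total_loss += -math.log(probs[nxt])
        # Gradient of cross-entropy w.r.t. the logits is probs - one_hot(nxt).
        for j in range(V):
            grad = probs[j] - (1.0 if j == nxt else 0.0)
            logits[prev][j] -= lr * grad
    return total_loss / len(pairs)

def predict(prev_char):
    """Most likely next character after `prev_char`."""
    row = logits[index[prev_char]]
    return chars[row.index(max(row))]

losses = [train_epoch() for _ in range(20)]
print(round(losses[0], 3), round(losses[-1], 3))  # the last value is lower than the first
print(predict("h"))  # e
```

Each update nudges the weights so the character that actually came next becomes a little more probable, which is why the average loss falls over the epochs.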

Module 7

Sampling & Generation

The model gives probabilities — how do they become text? Activity: turn the temperature dial and switch on top-k and top-p sampling, watching the output move from robotic to creative to incoherent. AI anchor: the settings behind every chatbot reply.
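The three controls can be sketched in a few lines of Python. The logits below are made-up numbers and the function names are illustrative, not a library API; libraries such as Hugging Face Transformers expose similar settings under the names `temperature`, `top_k`, and `top_p`.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Low temperature sharpens the distribution; high temperature flattens it.
    The temperature must be greater than zero."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs, keep):
    """Zero out every token not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)

def sample(probs, rng):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs)[0]

logits = [2.0, 1.0, 0.1]                        # made-up scores for three tokens
cold = softmax_with_temperature(logits, 0.1)    # almost always picks token 0
hot = softmax_with_temperature(logits, 10.0)    # close to uniform
filtered = top_p_filter([0.5, 0.3, 0.2], 0.7)   # drops the least likely token
token = sample(filtered, random.Random(0))
```

At a very low temperature the top token takes almost all of the probability, which gives the repetitive, "robotic" output; at a very high temperature every token becomes nearly equally likely, which is where the text turns incoherent.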

Project

Module 8 · Project

Why LLMs Hallucinate & How to Use Them Well

Put it together: a model trained to sound fluent is not trained to be true. Activity: see how a confident wrong answer is generated, learn what the context window can and cannot hold, and turn both into practical habits for trusting and verifying AI output. A synthesis check ties every module together.

Capstone

Build a Concept Manipulative

Put it all together: build a single-page interactive that teaches one large language model concept, then submit it for grading and your certificate.

Why this matters

This is the course that connects everything. The conditional probability from Course 1, the vectors and gradient descent from Course 2, the training workflow from Course 3, the deep network from Course 4: a large language model is all of them at once. After this, "AI" is no longer a black box: you know what is happening inside the tools you use every day.
