Module 6 — Adaptive Learning & Knowledge Tracing
Every learner who opens an app arrives with a different history. Some have already mastered a concept; others are meeting it for the first time; a few carry half-formed misconceptions that need to be untangled before new material can stick. A skilled human tutor reads those differences moment by moment and adjusts accordingly. The central promise of adaptive learning is to do the same thing at scale — for thousands of learners simultaneously, without a human tutor in the loop for every interaction.
But to adapt intelligently, a system first has to know something. The core job of any adaptive learning engine is therefore to maintain a running estimate of what each learner currently knows, updated after every interaction, and to use that estimate to decide what to present next: more practice on shaky skills, a skip past mastered ones, the next scaffolded step when the learner is ready. That inference problem — how do you estimate a hidden knowledge state from noisy responses? — is the intellectual heart of this module.
What personalization at scale actually means
The word "personalized" is heavily overloaded in ed-tech marketing. It can mean anything from "the learner typed their name on the cover page" to "the system maintains a probabilistic model of each skill and updates it on every response." For the purposes of evaluating any adaptive product — or building one — the meaningful definition is the second:
- Estimate knowledge state per learner, per skill, continuously.
- Select content — next item, difficulty level, hint timing — based on that estimate.
- Update the estimate after each response and repeat.
This three-step loop runs invisibly behind every good adaptive system. The mathematical machinery that drives step 1 — the estimation step — is called knowledge tracing, and it is our focus here.
Bayesian Knowledge Tracing
The best-known formal model for knowledge tracing is Bayesian Knowledge Tracing (BKT), introduced by Corbett and Anderson in 1994 in the context of intelligent tutoring systems and still common in both research and production systems. BKT treats each discrete skill as a hidden binary variable: the learner either has mastered the skill or hasn't. The system can observe only whether answers are correct or incorrect, and those observations are noisy in both directions.
BKT has four parameters for each skill:
- \( p(L_0) \) — the prior: the initial probability of mastery before any practice. What fraction of learners know this skill coming in?
- \( p(T) \) — the transit probability: the chance the learner transitions from unmastered to mastered on a given practice opportunity.
- \( p(S) \) — the slip probability: the chance the learner knows the skill but makes an error anyway — a careless mistake, a misread question.
- \( p(G) \) — the guess probability: the chance the learner doesn't know the skill but gets the answer right anyway — a lucky guess or successful elimination.
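After each response, BKT applies Bayes' rule in two steps. First, a posterior update: revise the mastery estimate \( p(L) \) in light of the observed answer. If the answer was correct:
\[ p(L_{\text{post}}) = \frac{p(L)\,(1 - p(S))}{p(L)\,(1 - p(S)) + (1 - p(L))\,p(G)} \]
If the answer was incorrect:
\[ p(L_{\text{post}}) = \frac{p(L)\,p(S)}{p(L)\,p(S) + (1 - p(L))\,(1 - p(G))} \]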
Intuitively: a correct answer is more consistent with knowing the skill than not knowing it, so \( p(L) \) rises; an incorrect answer is more consistent with not knowing it, so \( p(L) \) falls. The slip and guess parameters moderate how far the estimate moves — when the guess probability is high, a correct answer is less informative about true mastery.
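Then apply the transit step, which accounts for the possibility that the learner has just learned the skill on this attempt, even if they started without it:
\[ p(L_{\text{next}}) = p(L_{\text{post}}) + (1 - p(L_{\text{post}}))\,p(T) \]
Here \( p(L_{\text{next}}) \) becomes the value of \( p(L) \) used at the next practice opportunity.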
Read this as: the learner ends this attempt either already in the mastered state (probability \( p(L_{\text{post}}) \)), or having just transitioned there from the unmastered state (probability \( (1 - p(L_{\text{post}})) \cdot p(T) \)). Every practice opportunity has a chance of moving the learner closer to mastery, regardless of whether the answer was correct. When \( p(L) \) crosses a threshold — commonly 0.95 — the system treats the skill as mastered and moves on.
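The two-step update described above can be sketched in a few lines of Python. This is a minimal illustration of the standard BKT update, not the code of any particular product; the names `BKTParams` and `bkt_update` are made up for this example, and the defaults are simply the illustrative values used elsewhere in this module. It assumes all four parameters lie strictly between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BKTParams:
    """Per-skill BKT parameters (illustrative names, not a library API)."""
    p_init: float = 0.20     # p(L_0): prior probability of mastery
    p_transit: float = 0.15  # p(T): chance of learning on one opportunity
    p_slip: float = 0.10     # p(S): chance of an error despite mastery
    p_guess: float = 0.20    # p(G): chance of a correct answer without mastery


def bkt_update(p_l: float, correct: bool, params: BKTParams) -> float:
    """Return the mastery estimate after one observed response."""
    if correct:
        known = p_l * (1.0 - params.p_slip)      # mastered and did not slip
        unknown = (1.0 - p_l) * params.p_guess   # unmastered but guessed right
    else:
        known = p_l * params.p_slip                     # mastered but slipped
        unknown = (1.0 - p_l) * (1.0 - params.p_guess)  # unmastered, no lucky guess
    p_post = known / (known + unknown)                  # step 1: Bayes posterior
    return p_post + (1.0 - p_post) * params.p_transit   # step 2: transit


params = BKTParams()
p = params.p_init
for answer in (True, True, True):
    p = bkt_update(p, answer, params)
    print(f"correct answer -> p(L) = {p:.3f}")
```

With these values, three correct answers in a row take the estimate from 0.20 to 0.600, then 0.890, then 0.977, so the 0.95 threshold is crossed on the third response.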
What the parameters mean in practice
Well-calibrated BKT parameters matter a great deal. The higher \( p(G) \) is, the less a correct answer says about mastery: on a four-option multiple-choice question, a blind guesser is right 25% of the time anyway, so a single correct answer is weak evidence. If \( p(S) \) is set too low, a single careless error will sharply deflate the mastery estimate for a learner who is otherwise solid. In practice, parameters are typically estimated by fitting the model to historical learner log data using expectation-maximization. Reasonable starting values for a new skill are roughly \( p(L_0) \approx 0.2 \), \( p(T) \approx 0.15 \), \( p(S) \approx 0.1 \), \( p(G) \approx 0.2 \).
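To make the effect of the guess and slip parameters concrete, the sketch below computes only the posterior step (no transit) for a few parameter settings. The function name `posterior` and the specific values tried are assumptions chosen for illustration, not fitted parameters.

```python
def posterior(p_l: float, correct: bool, p_slip: float, p_guess: float) -> float:
    """Bayes posterior for mastery after one response (transit step omitted)."""
    if correct:
        known, unknown = p_l * (1 - p_slip), (1 - p_l) * p_guess
    else:
        known, unknown = p_l * p_slip, (1 - p_l) * (1 - p_guess)
    return known / (known + unknown)


# How much does one correct answer move a 50/50 estimate as p(G) grows?
for p_guess in (0.05, 0.20, 0.25, 0.50):
    after = posterior(0.50, True, p_slip=0.10, p_guess=p_guess)
    print(f"p(G) = {p_guess:.2f}: correct answer moves 0.50 -> {after:.3f}")

# How hard does one error hit a strong learner as p(S) shrinks?
for p_slip in (0.02, 0.10, 0.30):
    after = posterior(0.90, False, p_slip=p_slip, p_guess=0.20)
    print(f"p(S) = {p_slip:.2f}: one error moves 0.90 -> {after:.3f}")
```

The first loop shows the correct-answer bump shrinking as \( p(G) \) rises. The second shows that with \( p(S) = 0.02 \) a single error drops a 0.90 estimate to roughly 0.18, against roughly 0.53 with \( p(S) = 0.10 \).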
Extensions of basic BKT add per-learner priors (some learners start higher), contextual slip rates (harder problems induce more slips), and multi-skill items. Deep Knowledge Tracing (DKT), introduced around 2015, replaces the hidden Markov structure with a recurrent neural network and can capture complex skill interactions — but the interpretability of classical BKT remains hard to match when teachers and learners need to understand why the system made a decision.
From estimate to action
With a per-skill mastery estimate in hand, the adaptive selection policy nearly writes itself. Common approaches include:
- Mastery gating: don't advance to skill B until \( p(L) \geq 0.95 \) on skill A. Simple, transparent, widely used.
- Expected gain: choose the item that maximizes the expected increase in \( p(L) \), balancing what the learner most needs with the information value of each possible item.
- Spaced retrieval integration: combine BKT mastery estimates with the spacing algorithms from Module 3 — items near their optimal review time from the forgetting curve get scheduling priority.
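The first two policies in this list can be sketched as follows. This is an illustration under simple assumptions, not a description of any product's scheduler: the skill names, the mastery values, and the functions `next_skill_gated` and `expected_gain` are hypothetical, and each skill is assumed to follow plain BKT with its own parameters.

```python
def bkt_update(p_l, correct, p_t, p_s, p_g):
    """One BKT step: Bayes posterior followed by the transit step."""
    if correct:
        known, unknown = p_l * (1 - p_s), (1 - p_l) * p_g
    else:
        known, unknown = p_l * p_s, (1 - p_l) * (1 - p_g)
    p_post = known / (known + unknown)
    return p_post + (1 - p_post) * p_t


def next_skill_gated(order, mastery, threshold=0.95):
    """Mastery gating: first skill in prerequisite order still below threshold."""
    for skill in order:
        if mastery[skill] < threshold:
            return skill
    return None  # every skill in the sequence is estimated as mastered


def expected_gain(p_l, p_t, p_s, p_g):
    """Expected rise in p(L) from one more attempt, averaged over both outcomes."""
    p_correct = p_l * (1 - p_s) + (1 - p_l) * p_g
    expected = (p_correct * bkt_update(p_l, True, p_t, p_s, p_g)
                + (1 - p_correct) * bkt_update(p_l, False, p_t, p_s, p_g))
    return expected - p_l


# Hypothetical skill sequence and current estimates
order = ["fractions", "ratios", "proportions"]
mastery = {"fractions": 0.97, "ratios": 0.62, "proportions": 0.20}
print(next_skill_gated(order, mastery))  # ratios

for skill in order:
    gain = expected_gain(mastery[skill], p_t=0.15, p_s=0.10, p_g=0.20)
    print(f"{skill}: expected gain {gain:.3f}")
```

One caveat worth knowing: under plain BKT the expected posterior equals the prior, so the expected gain reduces to \( (1 - p(L)) \cdot p(T) \) and does not depend on slip or guess. An expected-gain policy therefore only becomes interesting when items or skills differ in their transit rates, or when a richer model is used.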
QuantegyAI's adaptive layer uses mastery gating and spaced scheduling together: a skill is practiced until mastery is estimated at threshold, and then reviews are scheduled according to an expanding-interval forgetting-curve model, with spacing growing as each successive review is successful.
Try it: watch the BKT model update
The activity below runs a live BKT model on a single skill with default parameters (\( p(L_0) = 0.20 \), \( p(T) = 0.15 \), \( p(S) = 0.10 \), \( p(G) = 0.20 \)). Click Answered correctly or Answered incorrectly and watch the mastery estimate evolve after each response. Try a run of five correct answers, then throw in an error — notice how the slip parameter means one mistake doesn't undo five successes.
This activity needs JavaScript. The idea: each correct or incorrect response updates a Bayesian mastery probability using the BKT equations — posterior Bayes update followed by a transit step.
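Without the interactive version, the suggested run (five correct answers, then one error) can be reproduced offline. The script below is a minimal sketch that assumes the same default parameters as the activity and the standard BKT update; the function and variable names are illustrative.

```python
P_INIT, P_TRANSIT, P_SLIP, P_GUESS = 0.20, 0.15, 0.10, 0.20


def bkt_update(p_l: float, correct: bool) -> float:
    """Posterior Bayes update followed by the transit step."""
    if correct:
        known, unknown = p_l * (1 - P_SLIP), (1 - p_l) * P_GUESS
    else:
        known, unknown = p_l * P_SLIP, (1 - p_l) * (1 - P_GUESS)
    p_post = known / (known + unknown)
    return p_post + (1 - p_post) * P_TRANSIT


responses = [True] * 5 + [False]  # five correct answers, then one error
p = P_INIT
trace = []
for n, correct in enumerate(responses, start=1):
    p = bkt_update(p, correct)
    trace.append(p)
    label = "correct" if correct else "incorrect"
    print(f"response {n} ({label}): p(L) = {p:.3f}")
```

After five correct answers the estimate sits just below 1.0, and the single error pulls it down only slightly: it stays above 0.99, well clear of the 0.95 mastery threshold.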
Sort it: BKT parameters in action
Match each scenario to the BKT parameter it most directly illustrates.
This activity needs JavaScript.