Module 6 — Estimating Ability
So far you've learned how to build a model that predicts responses from known parameters. Now it's time to run the machinery backwards. In practice, the items' parameters \( a, b, c \) have already been calibrated — and you have a real student who just answered some questions. Your job: figure out where on the \( \theta \) scale that student actually sits. This is ability estimation, and it is the heart of what every adaptive test does after each response.
The likelihood of a response pattern
Suppose a student answers a set of \( n \) items and you record the vector \( \mathbf{u} = (u_1, u_2, \ldots, u_n) \), where \( u_i = 1 \) if the student answered item \( i \) correctly and \( u_i = 0 \) if not. Given a particular ability level \( \theta \), what is the probability of that exact pattern?
Assuming responses to different items are locally independent (given \( \theta \), knowing the student got item 3 right tells you nothing extra about item 7), the probability simply multiplies across items. Each item contributes its ICC probability if the response was correct, or one minus that probability if the response was wrong:
\[ L(\theta) = P(\mathbf{u} \mid \theta) = \prod_{i=1}^{n} P_i(\theta)^{u_i} \, \bigl[ 1 - P_i(\theta) \bigr]^{1 - u_i} \]
Read this carefully. When \( u_i = 1 \) (correct), the factor for item \( i \) is \( P_i(\theta) \) — the model's probability of a correct answer. When \( u_i = 0 \) (wrong), the factor is \( 1 - P_i(\theta) \) — the model's probability of a wrong answer. Multiply these across all items and you get the likelihood: how probable is this particular response string, as a function of the unknown ability \( \theta \)?
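The product above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the 3PL form is written without the optional 1.7 scaling constant (an assumption), and the item parameters and response pattern are hypothetical values chosen only for the example.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed form, no 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, items, responses):
    """Multiply P_i(theta) for correct items and 1 - P_i(theta) for wrong ones."""
    value = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        value *= p if u == 1 else 1.0 - p
    return value

# Hypothetical calibrated items as (a, b, c) tuples, ordered easy to hard.
ITEMS = [(1.0, -2.0, 0.2), (1.2, -1.0, 0.2), (1.0, 0.0, 0.2),
         (1.5, 1.0, 0.2), (1.0, 2.0, 0.2)]
# Hypothetical response pattern: three easiest items right, two hardest wrong.
RESPONSES = [1, 1, 1, 0, 0]

# The same pattern is more or less probable depending on the assumed ability.
for theta in (-3.0, 0.0, 3.0):
    print(f"theta={theta:+.1f}  L={likelihood(theta, ITEMS, RESPONSES):.4f}")
```

Evaluating the function at several \( \theta \) values is exactly what "likelihood as a function of ability" means: the data stay fixed and only \( \theta \) varies.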
Maximum Likelihood Estimation (MLE)
The Maximum Likelihood Estimate (MLE) of ability is the value \( \hat{\theta} \) that makes the observed response pattern as probable as possible:
\[ \hat{\theta} = \arg\max_{\theta} \, L(\theta) \]
In practice, because \( L(\theta) \) is a product of many small numbers, it can underflow to zero on a computer. The standard fix is to maximize the log-likelihood instead. Since the logarithm is monotonically increasing, the \( \theta \) that maximizes \( \log L(\theta) \) is the same as the one that maximizes \( L(\theta) \):
\[ \log L(\theta) = \sum_{i=1}^{n} \Bigl[ u_i \log P_i(\theta) + (1 - u_i) \log \bigl( 1 - P_i(\theta) \bigr) \Bigr] \]
For IRT with the 3PL model, this objective is smooth and unimodal (for most realistic response patterns), so a grid search or Newton–Raphson iteration finds it reliably. In the activity below you will do a grid search: compute the likelihood on a dense grid of \( \theta \) values, then find the peak.
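A grid search over the log-likelihood might look like the sketch below. The grid range of \(-4\) to \(+4\), the step of 0.01, the function names, and the item parameters are all assumptions made for illustration; the 3PL form again omits the 1.7 scaling constant.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed form, no 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Sum of log P_i for correct responses and log(1 - P_i) for wrong ones."""
    total = 0.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def mle_grid(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Return the grid point with the highest log-likelihood."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical items (a, b, c) and a hypothetical response pattern.
ITEMS = [(1.0, -2.0, 0.2), (1.2, -1.0, 0.2), (1.0, 0.0, 0.2),
         (1.5, 1.0, 0.2), (1.0, 2.0, 0.2)]
RESPONSES = [1, 1, 1, 0, 0]

theta_hat = mle_grid(ITEMS, RESPONSES)
print(f"grid-search MLE: {theta_hat:+.2f}")
```

The estimate can be no finer than the grid spacing. A common refinement, not shown here, is to start Newton–Raphson from the best grid point.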
What the likelihood curve tells you
The shape of the likelihood function carries important information beyond just the peak location. A sharp, narrow peak means the data pin down the ability estimate precisely — the student's responses are consistent with one narrow range of abilities. A flat, wide curve means the responses are broadly compatible with many different ability levels, and the estimate is uncertain.
When does the likelihood go flat? When the response pattern is surprising: a student who misses the easiest item but aces the hardest one has behaved inconsistently with any single ability value, so the likelihood spreads across a wide range of \( \theta \). Conversely, a student who gets all easy items right and all hard items wrong — exactly what the model predicts for a person at moderate ability — produces a narrow, confident peak.
Interact: flip responses and watch the MLE move
Below are five items with known parameters. Each chip shows an item's difficulty \( b \). Click a chip to toggle it between correct (lit up) and incorrect. The likelihood curve redraws instantly, and the amber marker lands at the Maximum Likelihood Estimate. Try flipping an inconsistent pattern — easy item wrong, hard item right — and notice how the curve flattens.
Bayesian alternatives: EAP and MAP
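Pure MLE has a weakness: it ignores everything you knew about the student before they started. If a student answers only two items, the likelihood curve may be almost flat, and the MLE can land at an extreme like \( \hat{\theta} = +4 \) or \( -4 \). In the limiting case where every response is correct (or every response is wrong), the likelihood keeps rising as \( \theta \to +\infty \) (or \( -\infty \)), so no finite MLE exists and a grid search simply returns the edge of the grid. A Bayesian approach multiplies the likelihood by a prior distribution \( \pi(\theta) \), typically a standard normal reflecting the assumption that most test-takers are near average ability, and then summarizes the resulting posterior \( p(\theta \mid \mathbf{u}) \propto L(\theta) \, \pi(\theta) \):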
- MAP (Maximum A Posteriori): the mode of the posterior — the peak of \( L(\theta) \cdot \pi(\theta) \). Like MLE but pulled toward the prior when data are scarce.
- EAP (Expected A Posteriori): the mean of the posterior — a weighted average of \( \theta \) values. EAP tends to be smoother and is often the default in production CAT systems like those used in NCLEX.
As the number of items grows, the likelihood dominates the posterior, the prior's influence becomes negligible, and both MAP and EAP converge to the MLE. For a full-length exam of 30–40 items, the three methods give nearly identical answers. For the 5–10 items typical in a short CAT session, the Bayesian correction matters.
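The MAP and EAP summaries described above can be sketched on the same kind of \( \theta \) grid used for the MLE. This is an illustrative sketch under stated assumptions: a standard normal prior, a simple sum over an evenly spaced grid from \(-4\) to \(+4\) standing in for the posterior integral, a 3PL form without the 1.7 scaling constant, and two hypothetical items that the student answers correctly.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a correct response (assumed form, no 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, items, responses):
    """Product of P_i (correct) or 1 - P_i (wrong) across items."""
    value = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        value *= p if u == 1 else 1.0 - p
    return value

def estimate_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Return grid-based MLE, MAP, and EAP estimates of theta."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    like = [likelihood(t, items, responses) for t in grid]
    # Unnormalized posterior: likelihood times the standard normal kernel.
    # The constant 1/sqrt(2*pi) cancels in both the mode and the mean.
    post = [l * math.exp(-0.5 * t * t) for l, t in zip(like, grid)]
    mle = grid[like.index(max(like))]          # peak of the likelihood
    map_est = grid[post.index(max(post))]      # peak of the posterior
    eap = sum(t * w for t, w in zip(grid, post)) / sum(post)  # posterior mean
    return {"mle": mle, "map": map_est, "eap": eap}

# Hypothetical two-item session (a, b, c), both answered correctly.
SHORT_TEST = [(1.0, 0.0, 0.2), (1.0, 0.5, 0.2)]
print(estimate_ability(SHORT_TEST, [1, 1]))
```

With both items correct, the likelihood alone has no interior peak, so the grid MLE sits at the upper edge of the grid. The prior pulls MAP and EAP back to finite, moderate values.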
Sort: interpreting likelihood patterns
For each description, decide whether the result is a sharp (narrow, confident) or flat (wide, uncertain) likelihood curve.