
Module 7 — Information & Adaptive Testing

Precision engine · hands-on · about 30 minutes.

Module 6 showed you how to estimate a student's ability \( \theta \) from a set of responses. The obvious follow-up is: which items should you ask? A 200-item bank contains many questions that are nearly useless for a particular student: questions so easy that nearly everyone at that level gets them right, or so hard that nearly everyone gets them wrong. Those items tell you almost nothing new. IRT makes "usefulness" precise through the concept of information.

The information function of one item

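The Fisher information of a single item at ability level \( \theta \) measures how much a response to that item tells you about \( \theta \). Formally, it is the expected curvature (the negative second derivative) of that item's log-likelihood at \( \theta \): a sharply curved log-likelihood pins \( \theta \) down, while a flat one does not. For the 3PL model with parameters \( a, b, c \), the information function is: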

\[ I_i(\theta) = a^2 \cdot \frac{1 - P_i(\theta)}{P_i(\theta)} \cdot \left(\frac{P_i(\theta) - c}{1 - c}\right)^2 \]
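Where does this expression come from? For any right/wrong item, Fisher information has the general form shown first below. Assuming the 3PL curve is written \( P_i(\theta) = c + (1 - c)/(1 + e^{-a(\theta - b)}) \) (no 1.7 scaling constant, which matches the \( a^2 \) in the formula above), its slope can be written in terms of \( P_i(\theta) \) itself, and substituting it recovers the formula:

\[ I_i(\theta) = \frac{[P_i'(\theta)]^2}{P_i(\theta)\,[1 - P_i(\theta)]}, \qquad P_i'(\theta) = \frac{a\,[1 - P_i(\theta)]\,[P_i(\theta) - c]}{1 - c} \]

\[ I_i(\theta) = \frac{a^2\,[1 - P_i(\theta)]^2\,[P_i(\theta) - c]^2}{(1 - c)^2\,P_i(\theta)\,[1 - P_i(\theta)]} = a^2 \cdot \frac{1 - P_i(\theta)}{P_i(\theta)} \cdot \left(\frac{P_i(\theta) - c}{1 - c}\right)^2 \]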

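There is a lot packed into this formula, so let's unpack each piece:
  1. \( a^2 \): information grows with the square of the discrimination, so doubling \( a \) quadruples the height of the peak (and also makes it narrower).
  2. \( (1 - P_i(\theta))/P_i(\theta) \): this factor goes to zero as \( P_i(\theta) \to 1 \), so an item the student is almost certain to answer correctly adds almost nothing. When \( c = 0 \), the whole formula reduces to the 2PL information \( a^2 P_i(\theta)(1 - P_i(\theta)) \), which is largest where \( P_i(\theta) = 0.5 \).
  3. \( ((P_i(\theta) - c)/(1 - c))^2 \): the guessing penalty. It shrinks toward zero as \( P_i(\theta) \) falls toward the floor \( c \), so the item says little about low-ability students, whose correct answers may well be guesses.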

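The key intuition: an item's information peaks near its difficulty \( b \), where the probability of a correct response is midway between the guessing floor \( c \) and 1 (exactly 50/50 when \( c = 0 \)). With \( c > 0 \) the peak shifts slightly above \( b \). Move far above or below \( b \) and the information falls toward zero.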
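To see the intuition numerically, here is a minimal Python sketch that evaluates the information formula and finds its peak by brute-force grid search. The function names and item parameters are hypothetical, and it assumes the logistic 3PL curve without the 1.7 scaling constant. The closed-form peak location \( b + \ln\big((1 + \sqrt{1 + 8c})/2\big)/a \) is a known result for this parameterization and is included only as a cross-check.

```python
import math

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response (logistic form, no 1.7 scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float, c: float) -> float:
    """Fisher information of one 3PL item at ability theta (the formula above)."""
    p = p_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def peak_theta(a: float, b: float, c: float,
               lo: float = -4.0, hi: float = 4.0, steps: int = 8001) -> float:
    """Find where the information curve peaks by brute-force search over a theta grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: item_information(t, a, b, c))

def peak_theta_closed_form(a: float, b: float, c: float) -> float:
    """Closed-form 3PL peak location: b + ln((1 + sqrt(1 + 8c)) / 2) / a."""
    return b + math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / a

# Hypothetical item without guessing: the peak sits at b and its height is a^2 / 4.
print(peak_theta(1.5, 0.5, 0.0))              # 0.5
print(item_information(0.5, 1.5, 0.5, 0.0))   # 0.5625
# Same item with a guessing floor of 0.2: the peak moves above b.
print(peak_theta(1.5, 0.5, 0.2), peak_theta_closed_form(1.5, 0.5, 0.2))  # both about 0.678
```

The grid search is deliberately naive; it makes no assumption about the curve's shape, which is why it agrees with the closed form to within the grid spacing.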

Information is additive across items: because responses are assumed locally independent given \( \theta \), the total test information at ability \( \theta \) is simply the sum of the individual item information functions. This is the central mathematical fact that makes adaptive testing tractable: you can evaluate any candidate item independently and pick the best one.
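A short sketch of additivity, again assuming the logistic 3PL without the 1.7 constant. The three-item form and the candidate item are hypothetical, and `total_information` is an illustrative name, not a library function.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[float, float, float]  # (a, b, c)

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability of a correct response (no 1.7 scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float, c: float) -> float:
    p = p_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def total_information(theta: float, items: Sequence[Item]) -> float:
    """Test information at theta: a plain sum, valid under local independence."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

# Hypothetical three-item form.
form = [(1.0, -1.0, 0.0), (1.5, 0.0, 0.2), (0.8, 1.5, 0.25)]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(total_information(theta, form), 3))

# Scoring a candidate item needs only its own term: old total + I_new(theta).
candidate = (1.2, 0.5, 0.1)
assert math.isclose(total_information(0.0, form + [candidate]),
                    total_information(0.0, form) + item_information(0.0, *candidate))
```

The final check is the property a CAT relies on: the value of adding an item never depends on which other items were already given, only on the current \( \theta \).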

The standard error of the ability estimate

Fisher information connects directly to estimation precision. The Cramér–Rao lower bound says that no unbiased estimator can achieve a standard error smaller than \( 1/\sqrt{I(\theta)} \). The MLE \( \hat{\theta} \) attains this bound asymptotically (as the number of items grows), and in practice the information is evaluated at the estimate itself, so:

\[ \text{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\hat{\theta})}} \]

Because the standard error scales as \( 1/\sqrt{I} \), halving it requires quadrupling the information. That is why asking highly discriminating items near the student's ability level is far more efficient than piling on many mediocre items spread across the ability range.
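A worked example shows what the square-root relationship costs in practice. Assume a hypothetical 2PL item (\( c = 0 \)) with \( a = 1.5 \), answered by a student whose ability equals the item's difficulty, so \( P_i = 0.5 \). Its information, and the number \( n \) of equally good, perfectly targeted items needed to reach \( \text{SE} \le 0.3 \), are:

\[ I_i(b) = a^2 \cdot \frac{1 - 0.5}{0.5} \cdot (0.5)^2 = \frac{a^2}{4} = \frac{2.25}{4} = 0.5625 \]

\[ \text{SE} \le 0.3 \iff I(\theta) \ge \frac{1}{0.3^2} \approx 11.1 \quad\Rightarrow\quad n \ge \frac{1}{0.09 \times 0.5625} \approx 19.8 \]

So about 20 such items suffice. Tightening the target to \( \text{SE} \le 0.15 \) needs four times the information, about 80 items. Items that miss the student's level contribute less than \( a^2/4 \) each, so a real test needs more.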

Activity: explore the information function

Below are three items with different difficulty values \( b \). Use the slider to shift the middle item's difficulty and watch its information peak slide accordingly. Notice how each peak sits near its item's \( b \), and how a higher \( a \) produces a taller, narrower peak.


Computer-adaptive testing: the big idea

A computer-adaptive test (CAT) uses item information to make testing maximally efficient. The algorithm is a loop that repeats after every response:

  1. Estimate \( \theta \). Given all responses so far, compute the current ability estimate \( \hat{\theta} \) (e.g., by MLE on a grid, as in Module 6) and the associated standard error \( \text{SE} = 1/\sqrt{I_{\text{total}}(\hat{\theta})} \).
  2. Select the most informative unused item. Scan every item in the bank that has not yet been administered. For each, compute \( I_i(\hat{\theta}) \). Pick the item with the highest information at the current estimate: this is the item expected to tighten the uncertainty around \( \hat{\theta} \) the most.
  3. Administer and update. The student answers. The response is added to the record, the likelihood is updated, and the cycle repeats.
  4. Stop when done. The test ends when the standard error falls below a threshold (e.g., 0.3 on the \( \theta \) scale), when a fixed number of items has been asked, or when a pass/fail decision can be made with sufficient confidence.
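The four steps above can be sketched as a small Python simulation. This is a minimal illustration under stated assumptions, not an operational engine: `grid_mle` and `run_cat` are hypothetical names, the 200-item bank is randomly generated, the model is the logistic 3PL without the 1.7 constant, and the first item is chosen at \( \hat{\theta} = 0 \). Real CATs add exposure control and content balancing, and because an all-correct or all-wrong pattern pushes a grid MLE to the edge of the grid, they often use a Bayesian estimate such as EAP early in the test.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (no 1.7 scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    p = p_3pl(theta, a, b, c)
    return a**2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

GRID = [-4.0 + 0.01 * k for k in range(801)]  # theta values from -4 to 4

def grid_mle(responses):
    """Step 1: maximize the log-likelihood over GRID; responses are ((a, b, c), score) pairs."""
    def loglik(theta):
        total = 0.0
        for (a, b, c), u in responses:
            p = p_3pl(theta, a, b, c)
            total += math.log(p) if u == 1 else math.log(1.0 - p)
        return total
    return max(GRID, key=loglik)

def run_cat(bank, true_theta, se_target=0.3, max_items=30, seed=0):
    """Simulate one CAT session; returns (theta_hat, se, indices of administered items)."""
    rng = random.Random(seed)
    unused = list(range(len(bank)))
    responses, administered = [], []
    theta_hat, se = 0.0, float("inf")  # start at theta = 0 before any data
    while unused and len(administered) < max_items:
        # Step 2: the unused item with maximum information at the current estimate.
        best = max(unused, key=lambda j: item_information(theta_hat, *bank[j]))
        unused.remove(best)
        # Step 3: simulate a response from the hidden true ability, then record it.
        u = 1 if rng.random() < p_3pl(true_theta, *bank[best]) else 0
        responses.append((bank[best], u))
        administered.append(best)
        # Step 1 again: re-estimate theta and SE = 1 / sqrt(total information).
        theta_hat = grid_mle(responses)
        se = 1.0 / math.sqrt(sum(item_information(theta_hat, *item) for item, _ in responses))
        # Step 4: stop once the standard error is below the target.
        if se < se_target:
            break
    return theta_hat, se, administered

# Hypothetical 200-item bank with (a, b, c) drawn at random.
bank_rng = random.Random(42)
bank = [(bank_rng.uniform(0.8, 2.0), bank_rng.uniform(-3.0, 3.0), bank_rng.uniform(0.0, 0.25))
        for _ in range(200)]
theta_hat, se, used = run_cat(bank, true_theta=0.7)
print(f"{len(used)} items, theta_hat = {theta_hat:.2f}, SE = {se:.3f}")
```

The loop uses two stopping rules from step 4 (an SE target and a maximum length); a classification rule would replace the `se < se_target` test.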
Where this runs in the wild: The GMAT, the NCLEX nursing licensing exam, and many professional certification tests adapt item by item, running a version of this loop in real time (the GRE General Test adapts at the level of whole sections rather than single items). NCLEX uses a stopping rule based on the confidence interval around the pass/fail threshold: the test ends as soon as it is 95% confident which side of the cut score the candidate is on, regardless of how many items that takes within the exam's minimum and maximum lengths. The adaptive algorithm is not magic; at its core, it is the information function maximized at each step.
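A confidence-interval stopping rule of the kind described above can be sketched in a few lines. This is a simplified illustration, not NCLEX's actual implementation: `ci_stop_decision` and the cut score of 0.0 are hypothetical, the interval is assumed to be a two-sided 95% normal interval \( \hat{\theta} \pm 1.96\,\text{SE} \), and operational exams also enforce minimum/maximum lengths and time limits.

```python
from typing import Optional

def ci_stop_decision(theta_hat: float, se: float, cut: float, z: float = 1.96) -> Optional[str]:
    """Return 'pass' or 'fail' once the interval excludes the cut score, else None."""
    lower, upper = theta_hat - z * se, theta_hat + z * se
    if lower > cut:
        return "pass"   # whole interval lies above the cut
    if upper < cut:
        return "fail"   # whole interval lies below the cut
    return None         # interval straddles the cut: keep testing

print(ci_stop_decision(0.9, 0.25, cut=0.0))   # pass: interval is (0.41, 1.39)
print(ci_stop_decision(0.3, 0.25, cut=0.0))   # None: interval (-0.19, 0.79) contains 0
print(ci_stop_decision(-0.7, 0.30, cut=0.0))  # fail: upper bound is about -0.11
```

Note that a candidate far from the cut stops early even with a fairly large SE, while one near the cut keeps receiving items: the rule targets decision confidence, not a fixed precision.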

The result is remarkable efficiency. A well-designed CAT can achieve the same measurement precision as a traditional 100-item fixed-form test using only 20–30 items. Different students see almost entirely different questions, yet the estimates are on the same \( \theta \) scale and directly comparable — the fairness property you learned in Module 1.

Activity: simulate a CAT session

Set a true ability level (hidden from the "test"), then click "Administer next item" to run the CAT loop. Watch the estimate converge and the standard error shrink as items are added. At each step the algorithm selects the unused item maximizing \( I_i(\hat{\theta}) \), the core selection rule of a real CAT engine.


Sort: information and adaptive testing

For each statement, decide whether it is True or False about information functions and CAT.


Why this matters next: You now know how ability is estimated and how items are selected to minimize uncertainty. But one question remains unanswered: where do the item parameters come from in the first place? Module 8, the capstone, explains calibration (estimating \( a, b, c \) from real response data), model fit (checking that the ICC actually matches student behavior), and differential item functioning (ensuring items are fair across groups). It closes the loop on everything you have learned.
One-sentence summary: an item's information \( I_i(\theta) = a^2(1-P)/P \cdot ((P-c)/(1-c))^2 \) peaks near its difficulty \( b \), measures how precisely it pins down ability at that \( \theta \), and drives CAT: at every step the algorithm picks the unused item with the highest information at the current estimate, making each question count as much as possible.
