Module 1 — Why Percent-Correct Isn’t Enough
You took a test and got 70%. Is that good? On its own, the number can't tell you, because it depends on which questions you happened to be asked as much as on what you know. The same person might score 90% on an easy form and 50% on a hard one. That weakness is a central motivation for Item Response Theory (IRT), and it is the idea this whole course is built on. This module shows you the problem, then the fix.
Classical test theory: the score is the truth
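The older, intuitive approach is classical test theory (CTT). Your score is the count (or percentage) of right answers, and CTT models it as \( X = T + E \): observed score = true score + error. The true score \( T \) is the score you would average over many hypothetical retakes of this same test, so it belongs to the test as much as to you. Simple, and it works fine when everyone takes the exact same test. But two things break it: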
- The score is test-dependent. A 70% on a brutal form and a 70% on a gentle form are not the same achievement — but CTT reports the same number.
- Item statistics are sample-dependent. A question’s “difficulty” in CTT is just the fraction of this group who got it right. Give it to a stronger class and the same question looks easier. The item didn’t change; the yardstick did.
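To make the sample-dependence point concrete, here is a minimal sketch that computes the CTT difficulty of one item (the proportion correct, often called the item's p-value) in two different classes. It assumes, purely for illustration, that responses follow a Rasch-style logistic model; the item difficulty, the class abilities, and the function names `p_correct` and `ctt_difficulty` are all hypothetical. It uses expected proportions rather than random draws so the numbers are reproducible.

```python
import math


def p_correct(theta: float, b: float) -> float:
    # ASSUMPTION: responses follow a Rasch-style logistic model (illustration only).
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ctt_difficulty(abilities: list[float], b: float) -> float:
    # CTT "difficulty" (p-value): expected fraction of this group answering correctly.
    return sum(p_correct(theta, b) for theta in abilities) / len(abilities)


# One item whose underlying difficulty never changes (hypothetical value).
ITEM_B = 0.5

# Two hypothetical classes; the abilities are made up for illustration.
weaker_class = [-2.0, -1.5, -1.0, -0.5, 0.0]
stronger_class = [0.0, 0.5, 1.0, 1.5, 2.0]

p_weak = ctt_difficulty(weaker_class, ITEM_B)
p_strong = ctt_difficulty(stronger_class, ITEM_B)

print(f"CTT p-value in the weaker class:   {p_weak:.2f}")
print(f"CTT p-value in the stronger class: {p_strong:.2f}")
```

With these assumed numbers the p-value is about 0.20 in the weaker class and about 0.61 in the stronger one. The item is identical in both runs; only the group changed.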
So a CTT score mixes together how able the student is and how hard the test was, and you can never fully separate them. For a fixed paper exam given once, that’s tolerable. For an adaptive test — where different students see different questions on purpose — it is fatal.
See the flaw yourself
Take one student with a fixed, unchanging skill. Give them an easy form, then a hard form, and their raw percentage swings widely even though the student never changed: percent-correct rises on the easy form and falls on the hard one.
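The same experiment can be sketched numerically. This minimal example assumes a Rasch-style logistic model for how likely the student is to get each item right; the student's ability, the two five-item forms, and the helper names are hypothetical values chosen for illustration, not taken from any real test.

```python
import math


def p_correct(theta: float, b: float) -> float:
    # ASSUMPTION: Rasch-style logistic response model (illustration only).
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_percent_correct(theta: float, form: list[float]) -> float:
    # Expected percent-correct on a form, given each item's difficulty b.
    return 100.0 * sum(p_correct(theta, b) for b in form) / len(form)


# The student's ability is the same for both sittings (hypothetical value).
STUDENT_THETA = 0.0

# Hypothetical five-item forms: item difficulties on the theta scale.
easy_form = [-2.0, -1.5, -1.0, -1.0, -0.5]
hard_form = [0.5, 1.0, 1.0, 1.5, 2.0]

easy_pct = expected_percent_correct(STUDENT_THETA, easy_form)
hard_pct = expected_percent_correct(STUDENT_THETA, hard_form)

print(f"Easy form: {easy_pct:.0f}% expected correct")
print(f"Hard form: {hard_pct:.0f}% expected correct")
```

Under these assumptions the student expects about 76% on the easy form and about 24% on the hard one, while \( \theta \) stays at 0 throughout.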
The IRT idea: put people and questions on one scale
IRT fixes this with one elegant move. It places every student and every question on a single shared scale of difficulty/ability, written with the Greek letter \( \theta \) (theta):
- A student has an ability \( \theta \), a position on that scale. Higher means more skilled. By convention the scale is usually set so the reference population has mean 0 and standard deviation 1, so most students fall between about \( -3 \) and \( +3 \).
- A question has a difficulty, written \( b \), on the same scale. A question at \( b = 1.5 \) sits to the right of a student at \( \theta = 0 \): it is harder than that student is able.
Because they share an axis, you can ask the one question that matters: given this student's ability and this item's difficulty, what is the probability they answer correctly? That probability drives everything that follows. When ability is far above difficulty, the chance is high; far below, it is low; when ability equals difficulty, it is a coin flip (at least in the simplest models, which ignore lucky guessing). Module 2 draws that relationship as a curve.
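A minimal sketch of that probability, assuming the simplest common IRT model, the one-parameter logistic (Rasch) model \( P(\text{correct}) = 1 / (1 + e^{-(\theta - b)}) \). Module 2 may use a different or richer form (for example, with an item discrimination or guessing parameter), and the function name `p_correct` is illustrative only.

```python
import math


def p_correct(theta: float, b: float) -> float:
    # One-parameter logistic (Rasch) model: depends only on the gap theta - b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))


cases = [
    ("ability far above difficulty", 3.0, 0.0),
    ("ability equals difficulty", 1.5, 1.5),
    ("student at 0, item at 1.5", 0.0, 1.5),
    ("ability far below difficulty", -3.0, 0.0),
]

for label, theta, b in cases:
    print(f"{label:30s} theta={theta:+.1f}  b={b:+.1f}  P(correct)={p_correct(theta, b):.2f}")
```

These print roughly 0.95, 0.50, 0.18 and 0.05. In this model only the difference \( \theta - b \) matters, which is what lets a student and an item be compared on one axis.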
Sort the statements
For each statement, decide whether it describes Classical Test Theory or Item Response Theory.