Module 3 — The Rasch / 1PL Model — Difficulty

Core concept · hands-on · about 25 minutes.

In Module 2 you met the item characteristic curve and saw that it plots the probability of a correct answer as a function of ability \( \theta \). But every curve has to sit somewhere on the \( \theta \) axis. What decides whether the S-shape is centered near \( \theta = -1 \) (an easy item) or near \( \theta = 2 \) (a hard item)? The answer is a single number called the difficulty parameter, written \( b \). This module is entirely about that one number and the powerful simplicity of the model it defines.

The one-parameter logistic model

The simplest IRT model uses exactly one item parameter — difficulty \( b \). The probability that a student with ability \( \theta \) answers the item correctly is:

\[ P(\theta) = \frac{1}{1 + e^{-(\theta - b)}} \]

That is it. One number, \( b \), fully specifies the item. Everything about the ICC — where it sits on the scale, how hard the item is — is captured by \( b \) alone. This model is called the one-parameter logistic model (1PL), or the Rasch model in honor of Danish mathematician Georg Rasch, who developed it in the late 1950s.

Notice what \( b \) does in the formula. The exponent is \( -(\theta - b) \). When ability exactly equals difficulty — that is, when \( \theta = b \) — the exponent is zero, so \( e^0 = 1 \), so \( P = 1/(1+1) = 0.5 \). This is the defining property of \( b \):

The difficulty parameter \( b \) is the ability level at which the student has exactly a 50% chance of a correct answer. It is the point where the ICC crosses the \( P = 0.5 \) horizontal line — the inflection point of the S-curve.
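The formula and its 50% property can be checked numerically. The following is a minimal Python sketch; the function name `rasch_probability` and the example values are chosen here for illustration and do not come from any particular IRT library.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL / Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Ability equal to difficulty: the exponent is zero, so P is one half.
p_matched = rasch_probability(theta=0.5, b=0.5)

# Ability one unit above difficulty (illustrative values).
p_above = rasch_probability(theta=1.0, b=0.0)

print(round(p_matched, 3), round(p_above, 3))  # prints: 0.5 0.731
```

Only the difference \( \theta - b \) enters the calculation, so any student and item that are one unit apart give the same 0.731.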

How difficulty shifts the curve

Because \( b \) enters the formula only through the difference \( \theta - b \), raising \( b \) by some amount is exactly the same as shifting the entire ICC to the right by that amount along the ability axis. The shape of the curve does not change at all; it just slides.
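The sliding behaviour can be verified with a few lines of Python. This is a sketch under assumed values: the ability grid and the shift of 1.5 are arbitrary, and `rasch_probability` is an illustrative helper, not a library function.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL / Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]  # arbitrary ability grid
shift = 1.5                           # hypothetical increase in difficulty

# ICC for an item with b = 0, evaluated on the grid.
base_curve = [rasch_probability(t, 0.0) for t in thetas]

# ICC for an item with b = 1.5, evaluated on the grid moved right by 1.5.
shifted_curve = [rasch_probability(t + shift, 0.0 + shift) for t in thetas]

# Largest difference between the two sets of probabilities.
max_gap = max(abs(p - q) for p, q in zip(base_curve, shifted_curve))
print(max_gap)
```

The gap is zero up to floating-point error, which is the numerical counterpart of "the curve just slides".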

This sliding property has a beautiful consequence: you can compare items and students directly. A student at \( \theta = 1.5 \) is comfortably above an item at \( b = 0.5 \), roughly matched with an item at \( b = 1.5 \), and below an item at \( b = 2.5 \). Ability and difficulty share the same ruler.
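The comparison in the paragraph above can be turned into numbers. The sketch below reuses the student and item values from the text; the helper name `rasch_probability` is an assumption made for this example.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL / Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


student_theta = 1.5
item_difficulties = [0.5, 1.5, 2.5]

# One probability per item, all for the same student.
probabilities = [rasch_probability(student_theta, b) for b in item_difficulties]

for b, p in zip(item_difficulties, probabilities):
    print(f"b = {b:.1f}: P(correct) = {p:.3f}")
```

The three probabilities come out at roughly 0.73, 0.50, and 0.27: being one unit above an item and being one unit below it are mirror images around 0.5.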

Where you've met difficulty: in a familiar test

On the SAT, items are pre-calibrated in large-scale tryout studies. An item that 85% of students in the pilot sample answer correctly is easy and would have a low \( b \) on the IRT scale. An item that only 30% of students get right is hard and would have a high \( b \). What the Rasch model adds over this classical test theory (CTT) intuition is that the calibration is meant to hold across samples: if you give the hard item to a stronger cohort, the 30% figure rises, but the \( b \) estimate should stay stable (provided the item fits the model), because it is anchored to the shared \( \theta \) scale, not to a particular group.
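The contrast between a group-dependent percent-correct figure and a fixed \( b \) can be illustrated with simulated cohorts. Everything specific here is an assumption made for the sketch: the item difficulty of 1.0, the normal ability distributions, the cohort size, and the helper names. It shows only that the expected proportion correct moves with the cohort while \( b \) is held fixed; it does not estimate \( b \) from data.

```python
import math
import random


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL / Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_proportion_correct(abilities, b):
    """Average probability of success on one item across a cohort."""
    return sum(rasch_probability(t, b) for t in abilities) / len(abilities)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
b_item = 1.0            # hypothetical hard item; the same b in both cohorts

# Two hypothetical cohorts: abilities centered at 0 and at 1.
pilot_cohort = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stronger_cohort = [rng.gauss(1.0, 1.0) for _ in range(5000)]

p_pilot = expected_proportion_correct(pilot_cohort, b_item)
p_stronger = expected_proportion_correct(stronger_cohort, b_item)

print(f"pilot cohort:    {p_pilot:.2f}")
print(f"stronger cohort: {p_stronger:.2f}")
```

The item itself never changes, yet its proportion correct is noticeably higher in the stronger cohort.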

All items share the same slope

In the 1PL model every item has the same slope at its inflection point. Imagine overlaying many ICC curves — each is just a horizontal translation of the same S-shape. This uniformity of slope is a strong assumption: it says all items are equally good at separating students just above versus just below their difficulty. In the real world, some items are sharper discriminators than others, which is why Module 4 introduces a slope (discrimination) parameter. But for now, the equal-slope constraint is what gives the Rasch model its special elegance.
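The shared slope can be made concrete with a short derivation from the model formula (a worked sketch; the shorthand \( z = \theta - b \) is introduced here only for convenience). Differentiating with respect to ability gives

\[ \frac{dP}{d\theta} = \frac{e^{-z}}{\left(1 + e^{-z}\right)^2} = P(\theta)\,\bigl(1 - P(\theta)\bigr), \qquad z = \theta - b \]

At the inflection point \( \theta = b \) we have \( P = 0.5 \), so the slope is \( 0.5 \times 0.5 = 0.25 \) for every item, whatever its \( b \).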

The special magic of the Rasch model

The Rasch model is not just a simplified 2PL — it has a deep mathematical property called specific objectivity: the comparison between two students does not depend on which items they happened to see, and the comparison between two items does not depend on which students happened to take them. This means that if items fit the Rasch model, you can link test forms, equate scores, and build item banks without worrying about who took which form. It is the statistical foundation behind many large-scale educational assessments around the world.
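One way to see where this comes from is to rewrite the model on the log-odds scale. This is a sketch derived from the model formula; the labels \( \theta_A \), \( \theta_B \) for two students and \( b_1 \), \( b_2 \) for two items are introduced here for illustration.

\[ \ln \frac{P(\theta)}{1 - P(\theta)} = \theta - b \]

For two students answering the same item, the difference in log-odds is

\[ (\theta_A - b) - (\theta_B - b) = \theta_A - \theta_B \]

so the item's difficulty cancels. For one student answering two items, the difference is \( (\theta - b_1) - (\theta - b_2) = b_2 - b_1 \), and the student's ability cancels in the same way.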

A closely related property is that the raw score (the simple count of correct answers) is a sufficient statistic for ability under the Rasch model. Once you know how many items a student got right on a given set of items, you do not need to know which specific items those were to estimate their ability: the count alone contains all the relevant information. That is a remarkably clean result.
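Sufficiency can be checked by estimating ability from two different response patterns that share the same raw score. The sketch below is one possible approach, not a standard routine: it finds the maximum-likelihood ability by bisection on the derivative of the log-likelihood, and the item difficulties, response patterns, and function names are all hypothetical. The five difficulties are treated as already known.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL / Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def score_function(theta, responses, difficulties):
    """Derivative of the log-likelihood in theta, summed item by item."""
    return sum(x - rasch_probability(theta, b)
               for x, b in zip(responses, difficulties))


def ml_ability(responses, difficulties, lo=-6.0, hi=6.0, iterations=60):
    """Maximum-likelihood ability via bisection on the score function."""
    if sum(responses) in (0, len(responses)):
        # Zero and perfect scores have no finite maximum-likelihood estimate.
        raise ValueError("raw score must be strictly between 0 and the test length")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        # The score function decreases in theta, so a positive value
        # means the maximum lies to the right of mid.
        if score_function(mid, responses, difficulties) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical calibrated items
easy_items_right = [1, 1, 0, 0, 0]          # raw score 2
hard_items_right = [0, 0, 0, 1, 1]          # raw score 2, different items
three_right = [1, 1, 1, 0, 0]               # raw score 3

theta_a = ml_ability(easy_items_right, difficulties)
theta_b = ml_ability(hard_items_right, difficulties)
theta_c = ml_ability(three_right, difficulties)
print(theta_a, theta_b, theta_c)
```

The first two estimates agree up to numerical precision even though different items were answered correctly, while the third, with a higher raw score, is larger.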

Practical range of b

In practice, calibrated IRT item banks typically keep difficulties in the range \( -3 \leq b \leq 3 \). Items outside this window are either so easy that nearly everyone gets them right (below \( -3 \)) or so hard that almost no one does (above \( +3 \)). Neither extreme adds much information for most examinees: a very easy item tells you little about anyone except the lowest-ability students, and a very hard item tells you little about anyone except the highest. Well-designed tests concentrate difficulty where the examinees are, and adaptive tests dynamically choose items whose \( b \) is close to the current \( \theta \) estimate, maximizing the information gained from each question.
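"Information" has a concrete form under the Rasch model: for a single item it equals \( P(\theta)\,(1 - P(\theta)) \), which peaks at 0.25 when \( \theta = b \). The sketch below uses that formula to pick the next item in a simplified adaptive step. The item bank, the current ability estimate, and the function names are hypothetical, and real adaptive tests add further constraints (content balancing, exposure control) that are ignored here.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the 1PL / Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, b)
    return p * (1.0 - p)


def pick_next_item(theta_estimate, difficulties):
    """Index of the item with the most information at the current estimate."""
    return max(range(len(difficulties)),
               key=lambda i: item_information(theta_estimate, difficulties[i]))


item_bank = [-2.5, -1.0, 0.2, 1.1, 3.0]  # hypothetical difficulties
current_theta = 0.9                      # hypothetical ability estimate

chosen = pick_next_item(current_theta, item_bank)
print(item_bank[chosen])  # prints: 1.1
```

Because information falls off symmetrically as \( |\theta - b| \) grows, maximizing it under the Rasch model amounts to choosing the item whose difficulty is closest to the current estimate.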

Why this matters next

The Rasch / 1PL model is elegant, but it makes one strong assumption: all items have the same slope. In practice some items discriminate much better than others, meaning their ICCs are steeper near the difficulty point. Module 4 adds a second parameter, the discrimination \( a \), that controls this slope. Adding \( a \) gives the 2PL model, which usually fits real test data better.

One-sentence summary: in the Rasch / 1PL model, each item has a single difficulty parameter \( b \), the ability at which \( P(\theta) = 0.5 \); increasing \( b \) slides the entire ICC to the right (harder), while all items share the same slope, which gives the model its special property that raw scores are sufficient statistics for ability.