Decoding the Test Design
Understand the test structure and scoring mechanics.
Part 1/3 — Advanced Theory & Mechanics
This whitepaper examines the structural design of the Digital SAT (dSAT) through the lens of Item Response Theory (IRT) and Multistage Adaptive Testing (MST). By tracing the College Board’s move from a linear paper-based format to the current digital delivery system, we analyze the psychometric parameters that govern score normalization and the mathematical trade-offs of a two-stage adaptive model. The analysis focuses on the standard error of measurement (SEM), the logic of the three-parameter logistic (3PL) model, and the implications for candidate strategy of a scoring regime that does not penalize wrong answers.
Psychometric Foundations: Item Response Theory (IRT) and the 3PL Model
The scoring architecture of the SAT is built on Item Response Theory (IRT), specifically the Three-Parameter Logistic (3PL) model. Unlike Classical Test Theory (CTT), which assesses performance from raw totals, IRT models the relationship between a test-taker’s latent ability ($\theta$) and the probability of a correct response. The 3PL model describes each item with three parameters: item difficulty ($b$), item discrimination ($a$), and the pseudo-guessing parameter ($c$). In the context of the dSAT, the College Board uses these parameters to keep scores comparable across different test forms administered on various dates, such as the March, May, and June sittings. The discrimination parameter ($a$) is particularly important: it measures how effectively a single question distinguishes between high-ability and low-ability candidates. An item with a low $a$ value contributes little information to the final $\theta$ estimate, regardless of its difficulty.
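To make the role of the three parameters concrete, the following is a minimal sketch of the textbook 3PL response function, $P(\theta) = c + (1 - c) / (1 + e^{-a(\theta - b)})$, together with the standard item information function and the SEM it implies. The function names and the sample parameter values are illustrative assumptions, not College Board code or published item parameters, and the optional scaling constant $D = 1.7$ used in some formulations is omitted.

```python
import math


def p_correct(theta, a, b, c):
    """Textbook 3PL probability of a correct response.

    theta: latent ability, a: discrimination, b: difficulty,
    c: pseudo-guessing floor (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information one 3PL item contributes at ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2


def standard_error(theta, items):
    """SEM at theta for a list of (a, b, c) tuples: 1 / sqrt(total information)."""
    total = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical items: same difficulty and guessing floor, different discrimination.
    weak_item = (0.5, 0.0, 0.25)
    strong_item = (2.0, 0.0, 0.25)
    print(item_information(0.0, *weak_item))
    print(item_information(0.0, *strong_item))
    print(standard_error(0.0, [weak_item, strong_item]))
```

With difficulty and guessing held fixed, information scales with $a^2$, so in this sketch the item with $a = 2.0$ contributes sixteen times the information of the item with $a = 0.5$ at $\theta = b$. This is the sense in which a low-discrimination item adds little to the $\theta$ estimate.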
Multistage Adaptive Testing (MST) and Routing Logic
The dSAT uses a two-stage Multistage Adaptive Testing (MST) design, replacing the fixed-form linear delivery of the legacy SAT. Each of the two sections, Reading and Writing (RW) and Math, is divided into two modules. Module 1 is a "routing" module containing a representative mix of easy, medium, and hard questions. Performance on Module 1 determines the difficulty of Module 2, according to a predetermined cut score or routing threshold. If a student's Module 1 performance exceeds this $\theta$ threshold, they are routed to the "Hard" version of Module 2; falling below it leads to the "Easy" version. The technical trade-off here is the "ceiling effect" inherent in the Easy version of Module 2, where even a perfect score is mathematically capped at a lower scaled score (e.g., approximately 600–650) due to the lack of high-di