Precision is a measure of consistency or agreement between scores and concerns the degree to which errors of measurement affect test scores.
Measurement errors do not usually refer to inconsistencies in the aptitudes or behaviors being assessed; rather, these errors are related to factors that prevent an individual from achieving a score identical to their true latent ability or score.
Precision can be measured in many ways. In traditional test theory, precision is measured by a reliability coefficient, which is the ratio of true score variance to observed score variance (Lord & Novick, 1968; see Figure 1).
Figure 1

Because true score variance can be computed as the difference between observed score variance and error variance, classical reliability can be represented as seen in Figure 2.
Figure 2
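
As a minimal sketch of the two classical expressions described above, reliability can be computed either as true score variance divided by observed score variance, or as observed score variance minus error variance, divided by observed score variance. The function names and the variances used here are hypothetical and chosen only for illustration; they do not come from the ASVAB data.

```python
def reliability_from_true_variance(true_var: float, observed_var: float) -> float:
    """Classical reliability: true score variance / observed score variance."""
    if observed_var <= 0:
        raise ValueError("observed score variance must be positive")
    return true_var / observed_var


def reliability_from_error_variance(observed_var: float, error_var: float) -> float:
    """Equivalent form, using true variance = observed variance - error variance."""
    if observed_var <= 0:
        raise ValueError("observed score variance must be positive")
    return (observed_var - error_var) / observed_var


# Hypothetical variances, for illustration only.
observed_var, error_var = 100.0, 13.0
r_true = reliability_from_true_variance(observed_var - error_var, observed_var)
r_error = reliability_from_error_variance(observed_var, error_var)
print(round(r_true, 2), round(r_error, 2))
```

Both forms give the same value because they differ only in how the true score variance is obtained.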

Item Response Theory (IRT)
Item Response Theory (IRT) provides a means of estimating reliability that operates on the item characteristics and the individual pattern of responses given by examinees to items within a test. The IRT analogue to classical reliability is called marginal reliability, and operates on the variance of the theta scores and the average of the expected error variance (Sireci, Thissen, & Wainer, 1991) (Figure 3).
Figure 3

If it can be safely assumed that theta is distributed N(0,1), then marginal reliability can be measured as seen in Figure 4.
Figure 4

When sample sizes are large, the average of the expected error variance can be computed by averaging the variance of the estimated posterior distributions across individuals. In the reliabilities reported below, the posterior standard deviation (PSD) for individual i was estimated using the methodology given in Bock and Mislevy (1982) (Figure 5).
Figure 5

Figure 6
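
The marginal reliability computation described above can be sketched as follows: the average squared PSD stands in for the average expected error variance, and it is subtracted from the theta variance and divided by the theta variance. When theta is assumed to be N(0,1), this reduces to one minus the average squared PSD. The function name and the PSD values are hypothetical.

```python
from statistics import fmean


def marginal_reliability(psds, theta_var=1.0):
    """(theta variance - mean error variance) / theta variance.

    psds: posterior standard deviations, one per examinee.
    theta_var: variance of theta; 1.0 corresponds to the N(0,1) assumption.
    """
    if theta_var <= 0:
        raise ValueError("theta variance must be positive")
    mean_error_var = fmean([psd ** 2 for psd in psds])
    return (theta_var - mean_error_var) / theta_var


# Hypothetical PSDs for three examinees, for illustration only.
example_psds = [0.30, 0.35, 0.40]
print(round(marginal_reliability(example_psds), 3))
```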

ASVAB Reliabilities
For each ASVAB subtest, the equation shown in Figure 6 was used to compute expected a posteriori (EAP) ability estimates for applicants who completed the test during the 2009 fiscal year (FY2009; October 1, 2008 to September 30, 2009). The equation shown in Figure 5 was then used to compute PSDs (using the EAP ability estimates and assuming an N(0,1) population distribution). The squared PSDs were then averaged over applicants, and the average was substituted into the equation shown in Figure 4 to compute subtest reliability.
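
The steps in the preceding paragraph can be sketched with a simple quadrature approximation in the spirit of Bock and Mislevy (1982): the EAP estimate is the posterior mean of theta, and the PSD is the posterior standard deviation. This is an illustration under stated assumptions, not the operational ASVAB procedure. The three-parameter logistic item model, the item parameters, the response patterns, and the evenly spaced grid from -4 to 4 are all hypothetical choices made for the example.

```python
import math


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def eap_and_psd(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """Return (EAP estimate, posterior SD) under an N(0,1) prior.

    responses: list of 0/1 item scores.
    items: list of (a, b, c) tuples, one per response.
    """
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    # Unnormalised posterior at each node: N(0,1) density times the likelihood.
    posterior = []
    for q in nodes:
        weight = math.exp(-0.5 * q * q)
        for u, (a, b, c) in zip(responses, items):
            p = p_correct(q, a, b, c)
            weight *= p if u == 1 else 1.0 - p
        posterior.append(weight)
    total = sum(posterior)
    eap = sum(q * w for q, w in zip(nodes, posterior)) / total
    variance = sum((q - eap) ** 2 * w for q, w in zip(nodes, posterior)) / total
    return eap, math.sqrt(variance)


# Hypothetical item parameters and response patterns, for illustration only.
items = [(1.2, -0.5, 0.2), (1.0, 0.0, 0.2), (1.5, 0.5, 0.2), (0.8, 1.0, 0.2)]
patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1]]

theta_hat, psd = eap_and_psd(patterns[0], items)
print(round(theta_hat, 3), round(psd, 3))

# Average the squared PSDs over examinees, then subtract from 1 (N(0,1) case).
squared_psds = [eap_and_psd(u, items)[1] ** 2 for u in patterns]
reliability = 1.0 - sum(squared_psds) / len(squared_psds)
print(round(reliability, 3))
```

With only four items the resulting reliability is far lower than the values reported for the operational subtests; the example is meant to show the order of the computations, not to reproduce them.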
For AFQT scores, reliability was computed using the methodology for computing composite reliabilities reported in Gulliksen (1987, pp. 346-347, Equation 74).
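
For readers unfamiliar with composite reliabilities, the sketch below uses a widely used expression for the reliability of a weighted sum of components with uncorrelated errors: one minus the summed weighted error variances divided by the composite variance. Whether this matches Gulliksen's Equation 74 term for term is an assumption, and the weights, standard deviations, reliabilities, and correlations shown are hypothetical rather than the AFQT values.

```python
def composite_reliability(weights, sds, reliabilities, correlations):
    """Reliability of a weighted composite, assuming uncorrelated errors.

    weights, sds, reliabilities: one entry per component.
    correlations: full square matrix of observed-score correlations
                  (diagonal entries equal to 1.0).
    """
    n = len(weights)
    # Variance of the weighted sum of observed scores.
    composite_var = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * correlations[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Error variance contributed by each component: (w * sd)^2 * (1 - reliability).
    error_var = sum(
        (w * s) ** 2 * (1.0 - r) for w, s, r in zip(weights, sds, reliabilities)
    )
    return 1.0 - error_var / composite_var


# Hypothetical two-component composite, for illustration only.
example = composite_reliability(
    weights=[1.0, 1.0],
    sds=[10.0, 10.0],
    reliabilities=[0.80, 0.80],
    correlations=[[1.0, 0.8], [0.8, 1.0]],
)
print(round(example, 4))
```

In this example the two components behave like parallel tests, so the result agrees with the Spearman-Brown value for doubling test length, which is a convenient sanity check on the formula.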
Reliability estimates were computed over all FY2009 applicants, and by gender (Male, Female), ethnic group (Hispanic, Non-Hispanic), and race (American-Indian/Alaska Native, Asian, Black/African-American, Native Hawaiian/other Pacific Islander, White/Caucasian).
The sample sizes used to compute the reliability estimates across subtests and AFQT scores are given in the table below.
Sample Sizes Used to Compute ASVAB Reliability & SEM Estimates | ||
---|---|---|
Group | P&P | CAT |
All | 164,354 | 320,988 |
Male | 124,201 | 259,249 |
Female | 40,138 | 61,730 |
Hispanic | 12,649 | 51,173 |
Non-Hispanic | 108,496 | 221,274 |
American-Indian | 2,850 | 6,772 |
Asian | 2,423 | 8,943 |
Black | 31,290 | 40,590 |
Pacific Islander | 1,556 | 3,570 |
White | 82,084 | 208,814 |
The estimated reliabilities for AFQT scores and the subtests that comprise AFQT scores are reported in the table below.
Estimated Reliabilities for AFQT Scores & the AFQT Subtests | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Group | AFQT P&P | AFQT CAT | AR P&P | AR CAT | WK P&P | WK CAT | PC P&P | PC CAT | MK P&P | MK CAT |
All | 0.94 | 0.97 | 0.87 | 0.92 | 0.88 | 0.93 | 0.75 | 0.85 | 0.85 | 0.93 |
Male | 0.94 | 0.97 | 0.87 | 0.92 | 0.88 | 0.93 | 0.75 | 0.85 | 0.85 | 0.93 |
Female | 0.94 | 0.97 | 0.86 | 0.92 | 0.88 | 0.93 | 0.75 | 0.86 | 0.85 | 0.93 |
Hispanic | 0.94 | 0.97 | 0.87 | 0.92 | 0.88 | 0.92 | 0.76 | 0.86 | 0.85 | 0.92 |
Non-Hispanic | 0.94 | 0.97 | 0.87 | 0.92 | 0.88 | 0.93 | 0.75 | 0.85 | 0.85 | 0.93 |
American-Indian | 0.94 | 0.97 | 0.87 | 0.92 | 0.88 | 0.93 | 0.75 | 0.85 | 0.85 | 0.93 |
Asian | 0.93 | 0.96 | 0.87 | 0.91 | 0.87 | 0.91 | 0.75 | 0.86 | 0.85 | 0.93 |
Black | 0.94 | 0.97 | 0.85 | 0.91 | 0.89 | 0.92 | 0.76 | 0.86 | 0.84 | 0.92 |
Pacific Islander | 0.94 | 0.97 | 0.85 | 0.92 | 0.88 | 0.92 | 0.76 | 0.86 | 0.84 | 0.93 |
White | 0.94 | 0.97 | 0.88 | 0.92 | 0.88 | 0.93 | 0.74 | 0.85 | 0.85 | 0.93 |
The estimated reliabilities for the remaining ASVAB subtests are given in the table below. Note that AI and SI are administered as separate subtests in CAT-ASVAB, but their results are combined into a single score (labeled AS). In P&P-ASVAB, AI and SI are combined into a single subtest (AS). Scores on the combined subtest (AS) are reported for both CAT-ASVAB and P&P-ASVAB.
Estimated Reliabilities for the Non-AFQT Subtests | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Group | GS P&P | GS CAT | EI P&P | EI CAT | AI P&P | AI CAT | SI P&P | SI CAT | AS P&P | AS CAT | MC P&P | MC CAT | AO P&P | AO CAT |
All | 0.80 | 0.87 | 0.74 | 0.87 | n/a | 0.87 | n/a | 0.84 | 0.81 | n/a | 0.79 | 0.85 | 0.84 | 0.82 |
Male | 0.81 | 0.79 | 0.74 | 0.87 | n/a | 0.88 | n/a | 0.84 | 0.83 | n/a | 0.80 | 0.85 | 0.83 | 0.82 |
Female | 0.79 | 0.87 | 0.72 | 0.85 | n/a | 0.83 | n/a | 0.82 | 0.74 | n/a | 0.76 | 0.84 | 0.85 | 0.84 |
Hispanic | 0.79 | 0.86 | 0.73 | 0.86 | n/a | 0.85 | n/a | 0.82 | 0.78 | n/a | 0.78 | 0.84 | 0.84 | 0.83 |
Non-Hispanic | 0.80 | 0.88 | 0.74 | 0.87 | n/a | 0.87 | n/a | 0.84 | 0.81 | n/a | 0.79 | 0.85 | 0.84 | 0.82 |
American-Indian | 0.81 | 0.88 | 0.74 | 0.87 | n/a | 0.87 | n/a | 0.85 | 0.82 | n/a | 0.79 | 0.85 | 0.83 | 0.81 |
Asian | 0.77 | 0.86 | 0.71 | 0.86 | n/a | 0.85 | n/a | 0.82 | 0.76 | n/a | 0.76 | 0.84 | 0.83 | 0.81 |
Black | 0.78 | 0.86 | 0.71 | 0.85 | n/a | 0.84 | n/a | 0.82 | 0.74 | n/a | 0.76 | 0.83 | 0.86 | 0.85 |
Pacific Islander | 0.79 | 0.87 | 0.72 | 0.86 | n/a | 0.86 | n/a | 0.83 | 0.79 | n/a | 0.77 | 0.84 | 0.84 | 0.82 |
White | 0.81 | 0.88 | 0.75 | 0.87 | n/a | 0.87 | n/a | 0.84 | 0.84 | n/a | 0.80 | 0.85 | 0.83 | 0.82 |
ASVAB Standard Errors of Measurement
The standard error of measurement (SEM) provides an alternate way of summarizing the amount of error or inconsistency in test scores. Figure 7 shows how the SEM is computed, where σ_x is the observed score standard deviation for test x. If the measurement error is normally distributed and the reported scores are unbiased, then the true scores for approximately 68% of the applicants would fall in the interval created by adding and subtracting one SEM from their reported score.
Figure 7
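
As a minimal sketch, assuming Figure 7 shows the conventional form of the SEM (the observed score standard deviation multiplied by the square root of one minus the reliability), the computation and the one-SEM band described above look like this. The function names, standard deviation, reliability, and reported score are hypothetical.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Conventional SEM: observed score SD times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def one_sem_band(score: float, sem: float) -> tuple:
    """Interval formed by subtracting and adding one SEM to a reported score."""
    return score - sem, score + sem


# Hypothetical values, for illustration only.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
lower, upper = one_sem_band(score=50.0, sem=sem)
print(round(sem, 2), round(lower, 2), round(upper, 2))
```

A higher reliability or a smaller score spread both shrink the SEM, which is why the SEMs reported below tend to be smaller for CAT-ASVAB than for P&P-ASVAB on subtests where CAT reliability is higher.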

The SEM of each ASVAB subtest and AFQT score was computed over all FY2009 applicants, and by gender (Male, Female), ethnic group (Hispanic, Non-Hispanic), and race (American-Indian/Alaska Native, Asian, Black/African-American, Native Hawaiian/other Pacific Islander, White/Caucasian). The sample sizes are shown above.
The SEMs for AFQT scores and the subtests that comprise AFQT scores are reported in the table below.
Standard Errors of Measurement for AFQT Scores & AFQT Subtests | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Group | AFQT P&P | AFQT CAT | AR P&P | AR CAT | WK P&P | WK CAT | PC P&P | PC CAT | MK P&P | MK CAT |
All | 5.89 | 4.28 | 2.98 | 2.35 | 2.91 | 2.26 | 3.80 | 2.84 | 2.96 | 2.02 |
Male | 5.93 | 4.28 | 3.00 | 2.33 | 2.90 | 2.25 | 3.88 | 2.88 | 3.03 | 2.04 |
Female | 5.69 | 4.21 | 2.79 | 2.29 | 2.92 | 2.23 | 3.56 | 2.65 | 2.76 | 1.91 |
Hispanic | 5.72 | 4.43 | 2.80 | 2.34 | 2.99 | 2.49 | 3.69 | 2.96 | 2.87 | 2.15 |
Non-Hispanic | 5.91 | 4.17 | 3.01 | 2.31 | 2.90 | 2.13 | 3.82 | 2.76 | 2.99 | 1.96 |
American-Indian | 5.76 | 3.99 | 2.86 | 2.16 | 2.79 | 2.01 | 3.72 | 2.64 | 2.91 | 1.83 |
Asian | 6.58 | 4.87 | 3.14 | 2.63 | 3.78 | 2.95 | 4.49 | 3.11 | 3.07 | 2.15 |
Black | 5.39 | 4.17 | 2.87 | 2.33 | 2.70 | 2.18 | 3.56 | 2.60 | 2.89 | 2.04 |
Pacific Islander | 6.01 | 4.31 | 3.17 | 2.34 | 3.27 | 2.32 | 3.95 | 2.79 | 3.20 | 1.98 |
White | 5.81 | 4.18 | 2.85 | 2.28 | 2.80 | 2.18 | 3.76 | 2.82 | 2.96 | 1.98 |
The SEMs for the remaining ASVAB subtests are given in the table below. Note that the SEM computations for AI and SI are based on the observed standard deviation of the AS score, since separate scores are not reported for AI and SI.
Standard Errors of Measurement for the Non-AFQT Subtests | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Group | GS P&P | GS CAT | EI P&P | EI CAT | AI P&P | AI CAT | SI P&P | SI CAT | AS P&P | AS CAT | MC P&P | MC CAT | AO P&P | AO CAT |
All | 3.86 | 3.04 | 4.81 | 3.37 | n/a | 3.48 | n/a | 3.83 | 4.31 | n/a | 4.07 | 3.37 | 3.38 | 3.43 |
Male | 3.85 | 2.97 | 4.64 | 3.22 | n/a | 3.24 | n/a | 3.65 | 4.00 | n/a | 3.98 | 3.27 | 3.43 | 3.45 |
Female | 3.58 | 2.97 | 4.35 | 3.04 | n/a | 2.84 | n/a | 2.90 | 3.46 | n/a | 3.59 | 2.89 | 3.21 | 3.26 |
Hispanic | 3.75 | 3.19 | 4.63 | 3.55 | n/a | 3.28 | n/a | 3.60 | 4.02 | n/a | 3.79 | 3.25 | 3.21 | 3.26 |
Non-Hispanic | 3.86 | 2.92 | 4.82 | 3.25 | n/a | 3.42 | n/a | 3.77 | 4.30 | n/a | 4.10 | 3.32 | 3.41 | 3.45 |
American-Indian | 3.67 | 2.77 | 4.64 | 3.08 | n/a | 3.17 | n/a | 3.49 | 3.91 | n/a | 3.75 | 3.07 | 3.20 | 3.32 |
Asian | 4.63 | 3.66 | 5.54 | 3.96 | n/a | 3.41 | n/a | 3.69 | 3.89 | n/a | 4.31 | 3.49 | 3.48 | 3.54 |
Black | 3.52 | 2.97 | 4.48 | 3.36 | n/a | 3.06 | n/a | 3.24 | 3.68 | n/a | 3.61 | 3.06 | 3.19 | 3.29 |
Pacific Islander | 4.05 | 3.11 | 4.71 | 3.51 | n/a | 3.39 | n/a | 3.69 | 3.97 | n/a | 3.95 | 3.19 | 3.20 | 3.32 |
White | 3.61 | 2.92 | 4.46 | 3.22 | n/a | 3.32 | n/a | 3.68 | 3.81 | n/a | 3.77 | 3.27 | 3.29 | 3.37 |