ASVAB scoring is based on an Item Response Theory (IRT) model. IRT is a theory that enables test questions and examinee abilities to be placed on the same scale, thereby allowing tests to be tailored to the specific ability level of each examinee and scores to be expressed on the same scale regardless of the combination of items that are taken.
The IRT model underlying ASVAB scoring is the three-parameter logistic (3PL) model. The 3PL model represents the probability that an examinee at a given level of ability will respond correctly to an individual item with given characteristics. Specifically, the item characteristics represented in the 3PL model are difficulty, discrimination (i.e., how well the item discriminates among examinees of differing levels of ability), and guessing (i.e., the likelihood that a very low ability examinee would respond correctly simply by guessing).
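As a minimal sketch, the 3PL item response function can be written in Python. The parameter names `a` (discrimination), `b` (difficulty), and `c` (guessing) follow common IRT notation. The scaling constant `d_scale = 1.7` is a widely used convention that is assumed here, not taken from this page.

```python
import math


def p_correct_3pl(theta: float, a: float, b: float, c: float, d_scale: float = 1.7) -> float:
    """Probability of a correct response under the 3PL model.

    theta   -- examinee ability
    a       -- item discrimination (slope)
    b       -- item difficulty (location)
    c       -- guessing parameter (lower asymptote)
    d_scale -- logistic scaling constant; 1.7 is an assumed convention
    """
    logistic = 1.0 / (1.0 + math.exp(-d_scale * a * (theta - b)))
    # The guessing floor c lifts the curve, so a very low-ability examinee
    # still answers correctly with probability close to c.
    return c + (1.0 - c) * logistic
```

When ability equals item difficulty (`theta == b`), the logistic term is 0.5. With `c = 0.2`, for example, the probability of a correct response is therefore 0.2 + 0.8 × 0.5 = 0.6.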
Paper-and-Pencil & Computerized Adaptive Testing
For both the paper-and-pencil (P&P) and computerized adaptive testing (CAT) versions of the ASVAB, the 3PL model is used to compute final ability estimates for examinees.
For the CAT-ASVAB, the 3PL model is also used to select items. When a CAT-ASVAB session starts, every examinee is assigned an initial ability estimate of θ = 0.0, which is the mean of the expected distribution of examinee abilities. After each item is administered, the scored response is used to update the ability estimate through a sequential Bayesian procedure. When the test is completed (or the time limit is exceeded), a final ability estimate is computed as the mode of the posterior distribution (the Bayesian modal estimate).
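The loop below is a rough sketch of how these pieces could fit together, under several assumptions. Abilities are evaluated on a grid with a standard normal prior, whose mean of 0.0 matches the starting estimate. Interim estimates use the posterior mean. The next item is the unused one with maximum Fisher information at the current estimate. The grid posterior and the selection rule are illustrative stand-ins, not the operational CAT-ASVAB procedures, and `run_cat`, `pool`, and `answer` are hypothetical names.

```python
import math

D = 1.7  # logistic scaling constant (assumed convention)

# Ability grid from -4 to 4 in steps of 0.01 (an illustrative choice).
GRID = [i / 100.0 for i in range(-400, 401)]


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def normal_prior():
    """Unnormalized standard normal density on the grid (mean 0.0)."""
    return [math.exp(-0.5 * t * t) for t in GRID]


def update_posterior(posterior, item, correct):
    """Multiply by the likelihood of one scored response, then renormalize."""
    a, b, c = item
    new = []
    for t, w in zip(GRID, posterior):
        p = p3pl(t, a, b, c)
        new.append(w * (p if correct else 1.0 - p))
    total = sum(new)
    return [w / total for w in new]


def posterior_mean(posterior):
    return sum(t * w for t, w in zip(GRID, posterior)) / sum(posterior)


def posterior_mode(posterior):
    """Bayesian modal estimate: the grid point with the highest posterior density."""
    return max(zip(GRID, posterior), key=lambda tw: tw[1])[0]


def select_next_item(theta, pool, used):
    """Hypothetical rule: the unused item with maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))


def run_cat(pool, answer, n_items):
    """Administer n_items adaptively; return the modal estimate and item order."""
    posterior = normal_prior()
    theta = 0.0  # every examinee starts at the prior mean
    administered = []
    for _ in range(n_items):
        idx = select_next_item(theta, pool, administered)
        administered.append(idx)
        posterior = update_posterior(posterior, pool[idx], answer(idx))
        theta = posterior_mean(posterior)  # interim estimate (stand-in)
    return posterior_mode(posterior), administered
```

Given a small pool of `(a, b, c)` tuples, a callback that always answers correctly should yield a positive final estimate, and one that always answers incorrectly should yield a negative estimate.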
Incomplete tests are handled differently in the P&P and CAT versions of the ASVAB. For the P&P-ASVAB, any unanswered items are scored as incorrect. For CAT-ASVAB examinees who do not finish a test before the time limit expires, a penalty function is applied to the final ability estimate.
The penalty function has the following properties:
- The size of the penalty is related to the number of unfinished items.
- Examinees who answer the same number of items and have the same ability estimate receive the same penalty.
- The penalty eliminates the possibility of using “coachable” test-taking strategies to artificially increase test scores.
The final ability estimate computed using the penalty procedure is equivalent to the score that would be obtained if the examinee guessed at random on the unfinished items.
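The following Monte Carlo sketch illustrates the idea behind the penalty. It averages the Bayesian modal estimate over many simulated completions in which each unfinished item is answered by random guessing. For simplicity, it assumes that the parameters of the unfinished items are known in advance and that each item has `n_options = 4` choices. Both assumptions and all function names are hypothetical. This illustrates the stated property and is not the operational CAT-ASVAB penalty function.

```python
import math
import random

D = 1.7  # assumed logistic scaling constant
GRID = [i / 100.0 for i in range(-400, 401)]  # ability grid, -4 to 4


def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def modal_estimate(items, responses):
    """Bayesian modal estimate on a grid with a standard normal prior."""
    best_theta, best_logpost = None, -math.inf
    for t in GRID:
        logpost = -0.5 * t * t  # log of the unnormalized N(0, 1) prior
        for (a, b, c), correct in zip(items, responses):
            p = p3pl(t, a, b, c)
            logpost += math.log(p if correct else 1.0 - p)
        if logpost > best_logpost:
            best_theta, best_logpost = t, logpost
    return best_theta


def penalized_estimate(items, responses, unfinished, n_options=4, n_sims=200, seed=0):
    """Average modal estimate when unfinished items are answered by random guessing.

    n_options, n_sims, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        # Each random guess is correct with probability 1 / n_options.
        guesses = [rng.random() < 1.0 / n_options for _ in unfinished]
        total += modal_estimate(items + unfinished, responses + guesses)
    return total / n_sims
```

For an examinee who answered every completed item correctly, `penalized_estimate` should come out lower than `modal_estimate` on the completed items alone. Random guessing succeeds less often than such an examinee would be expected to.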
After the final ability estimate is computed, it is converted to a Standard Score on the ASVAB score scale, which has been statistically linked to the ability metric through a process called equating. Equating studies are conducted for every CAT-ASVAB item pool (and for every P&P-ASVAB form) to ensure that scores have the same meaning regardless of which item pool or test form the examinee receives.
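One common equating method is equipercentile equating. In this method, a score on a new form or pool is mapped to the reference-scale score that has the same percentile rank. The sketch below shows a simplified version of that method, using two hypothetical score samples. It is not necessarily the procedure used in ASVAB equating studies, and operational equipercentile equating typically adds smoothing and interpolation.

```python
from bisect import bisect_right


def equipercentile(x, new_scores, ref_scores):
    """Map score x on a new form/pool to the reference score with the same percentile rank.

    Simplified: uses empirical cumulative proportions without smoothing
    or interpolation.
    """
    new_sorted = sorted(new_scores)
    ref_sorted = sorted(ref_scores)
    n, m = len(new_sorted), len(ref_sorted)
    # Number of new-form scores at or below x (percentile rank = count / n).
    count = bisect_right(new_sorted, x)
    # Smallest reference score whose cumulative proportion is >= count / n,
    # using integer ceiling division to avoid floating-point error.
    k = max(0, -(-count * m // n) - 1)
    return ref_sorted[k]
```

For example, if every reference score is exactly twice the corresponding new-form score, a new-form score of 3 maps to 6.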
Standard Scores are scores that have a fixed mean and standard deviation in the population of examinees.
A Standard Score indicates how many standard deviations a particular score lies above or below the mean. For the ASVAB subtests, the mean is set to 50 and the standard deviation to 10. Thus, a Standard Score of 40 indicates that the examinee scored 1 standard deviation below the mean, and a Standard Score of 70 indicates that the examinee scored 2 standard deviations above the mean.
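The arithmetic that relates Standard Scores to standard deviations from the mean can be sketched in Python. The function `to_standard_score` assumes a hypothetical linear rescaling from some reference mean and standard deviation. It illustrates the definition and is not the ASVAB's own conversion.

```python
SS_MEAN = 50.0  # ASVAB subtest Standard Score mean
SS_SD = 10.0    # ASVAB subtest Standard Score standard deviation


def sds_from_mean(standard_score):
    """Standard deviations above (+) or below (-) the mean for a Standard Score."""
    return (standard_score - SS_MEAN) / SS_SD


def to_standard_score(value, ref_mean, ref_sd):
    """Hypothetical linear rescaling of a value to the 50/10 Standard Score metric."""
    z = (value - ref_mean) / ref_sd
    return SS_MEAN + SS_SD * z
```

For example, `sds_from_mean(40)` is -1.0 and `sds_from_mean(70)` is 2.0, matching the examples above.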
Armed Forces Qualification Test
Examinees also receive a score on what is called the Armed Forces Qualification Test (AFQT).
AFQT scores are computed using the Standard Scores from four ASVAB subtests:
- Arithmetic Reasoning (AR)
- Mathematics Knowledge (MK)
- Paragraph Comprehension (PC)
- Word Knowledge (WK)
AFQT scores are reported as percentiles from 1 to 99. An AFQT percentile score indicates the percentage of examinees in a reference group who scored at or below that score. For current AFQT scores, the reference group is a sample of 18- to 23-year-old youth who took the ASVAB as part of a national norming study conducted in 1997. Thus, an AFQT score of 90 indicates that the examinee scored as well as or better than 90% of this nationally representative sample, and an AFQT score of 50 indicates that the examinee scored as well as or better than 50% of it.
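The following sketch shows the "at or below" percentile definition, applied to a hypothetical list of reference-group scores. Rounding down and clamping to the 1–99 reporting range are assumptions made for illustration.

```python
def afqt_percentile(score, reference_scores):
    """Percentage of the reference group scoring at or below `score`, clamped to 1-99.

    Rounding down and clamping are illustrative assumptions.
    """
    at_or_below = sum(1 for s in reference_scores if s <= score)
    pct = 100 * at_or_below // len(reference_scores)  # integer floor division
    return min(99, max(1, pct))
```

With a reference group of the scores 1 through 100, a score of 90 is at or above 90% of the group and maps to a percentile of 90.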
AFQT scores are divided into categories, as shown in the table below.
| AFQT Category | Score Range |
|---|---|
| I | 93 – 99 |
| II | 65 – 92 |
| IIIA | 50 – 64 |
| IIIB | 31 – 49 |
| IVA | 21 – 30 |
| IVB | 16 – 20 |
| IVC | 10 – 15 |
| V | 1 – 9 |
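The category table maps directly to a lookup function. The name `afqt_category` is hypothetical, and the ranges are copied from the table.

```python
# (category, lowest percentile, highest percentile), copied from the table.
AFQT_CATEGORIES = [
    ("I", 93, 99),
    ("II", 65, 92),
    ("IIIA", 50, 64),
    ("IIIB", 31, 49),
    ("IVA", 21, 30),
    ("IVB", 16, 20),
    ("IVC", 10, 15),
    ("V", 1, 9),
]


def afqt_category(percentile: int) -> str:
    """Return the AFQT category for a percentile score from 1 to 99."""
    for name, low, high in AFQT_CATEGORIES:
        if low <= percentile <= high:
            return name
    raise ValueError(f"AFQT percentile must be between 1 and 99, got {percentile}")
```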
ASVAB scores are used primarily to determine enlistment eligibility, assign applicants to military jobs, and aid students in career exploration.