Rationale: There is uncertainty regarding how to interpret discordance between tests for latent tuberculosis infection.
Objectives: The objective of this study was to assess discordance between commercially available tests for latent tuberculosis in a low-prevalence population, including the impact of nontuberculous mycobacteria.
Methods: This was a cross-sectional comparison study among 2,017 military recruits at Fort Jackson, South Carolina, from April to June 2009. Several tests were performed simultaneously with a risk factor questionnaire, including (1) QuantiFERON-TB Gold In-Tube test, (2) T-SPOT.TB test, (3) tuberculin skin test, and (4) Battey skin test using purified protein derivative from the Battey bacillus.
Measurements and Main Results: In this low-prevalence population, the specificities of the three commercially available diagnostic tests were not significantly different. Of the 88 subjects with a positive test, only 10 (11.4%) were positive to all three tests; 20 (22.7%) were positive to at least two tests. Bacille Calmette-Guérin vaccination, tuberculosis prevalence in country of birth, and Battey skin test reaction size were associated with tuberculin skin test–positive, IFN-γ release assay–negative test discordance. Increasing agreement between the three tests was associated with epidemiologic criteria indicating risk of infection and with quantitative test results.
Conclusions: For most positive results the three tests identified different people, suggesting that in low-prevalence populations most discordant results are caused by false-positives. False-positive tuberculin skin test reactions associated with reactivity to nontuberculous mycobacteria and bacille Calmette-Guérin vaccination may account for a proportion of test discordance observed.
There is substantial discordance between the tuberculin skin test and IFN-γ release assays in populations with low prevalence of tuberculosis, and most positive results from the three tests identify different people.
This study suggests that most positives from any of these tests are false-positives in low-prevalence populations. To support the current recommendations to treat tuberculosis, targeted testing using risk-stratified interpretation should be used for the IFN-γ release assays as with the tuberculin skin test.
There is continued uncertainty as to which diagnostic test for latent tuberculosis infection (LTBI) is most accurate in the United States population: the tuberculin skin test (TST) or IFN-γ release assays (IGRAs), including the QuantiFERON-TB Gold In-Tube test (QFT-GIT) and T-SPOT.TB test (T-Spot). There is no gold diagnostic standard for evaluating the performance of the IGRAs compared with the TST other than the long-term progression to active TB in cohort studies (1). In the absence of a gold standard, IGRAs are routinely compared in practice with the TST in cross-sectional evaluation studies, using active TB cases to assess sensitivity and low-risk populations to assess specificity (2, 3). In these studies, significant discordance is often found between IGRA and TST results. In a study of Navy recruits, 11 (73%) of 15 of the highest-risk individuals (whose country of birth had a rate of active TB of >100 per 100,000 person-years and who had TST reactions of at least 15 mm) had negative QFT-Gold tests (4). There are several explanations for these discordant results, including the use of region of difference one antigens in the IGRAs, which might result in greater specificity. However, it is also possible that the TST may have greater sensitivity, that the IGRAs may detect only unresolved or more recent infections (5), or that TST and IGRAs provide complementary measures of immune response (6).
Nontuberculous mycobacteria (NTM) may be an important potential source of false-positive tests for Mycobacterium tuberculosis infection in areas where the likelihood of infection is very low (7), such as the southeastern United States (8). The late Dr. George Comstock remarked in 1975 that “the frequency of cross-reactions to tuberculin in this [Navy recruit] population is sufficiently great that the prevalence of true tuberculous infections among white recruits may already be approaching zero” (9). The prevalence of sensitization to NTM in the United States population increased from 11% in 1972 to 17% in 2000 (10). Military recruits are an excellent population in which to explore NTM sensitization as a potential source of TST/IGRA discordance, because bacille Calmette-Guérin (BCG) vaccination and age-related waning of TST sensitivity are uncommon and recruits originate from a wide geographic area.
The impact of cross-reactivity on TST results has been previously investigated by comparing results of skin tests performed with purified protein derivative (PPD) made from M. tuberculosis (PPD-Seibert) and several NTM, including Mycobacterium intracellulare. PPD-Battey (PPD-B) is a skin test antigen made from the Boone strain of M. intracellulare in a manner similar to how PPD-Seibert is made from M. tuberculosis. A skin test performed with PPD-B is referred to as a “Battey skin test” (BST). The BST has been used as an aid in the differentiation of reactivity to M. tuberculosis from reactivity to NTM in Navy recruit (8, 11) and National Health and Nutrition Examination Survey studies (10, 12, 13). It has also been used in many other smaller epidemiologic studies (14–19). The objectives of this study were to compare commercially available tests for LTBI in a heterogeneous, low-LTBI prevalence United States population and to assess the impact of NTM reactivity on test discordance.
After providing written informed consent, recruits originating from all areas of the United States, age 18 years or older, undergoing routine entry-level medical processing at Fort Jackson, South Carolina, were screened for participation in the study. Recruits were excluded from participating if they (1) had a history of severe reaction to the TST, (2) were pregnant by urine human chorionic gonadotropin testing, (3) had received a live virus vaccine within the past 30 days, or (4) had a major viral infection at the time of screening.
PPD-B was used as a skin test antigen under an Investigational New Drug Protocol sponsored by the Uniformed Services University in Bethesda, Maryland. The Infectious Diseases Institutional Review Board at Uniformed Services University provided approval and oversight of the study.
This cross-sectional comparison study among Army recruits at Fort Jackson consisted of five elements: (1) a TB risk factor questionnaire, (2) T-Spot, (3) QFT-GIT, (4) BST, and (5) TST.
The TB risk factor questionnaire contained questions about demographics, TB exposure, work history, location of residence, and other factors shown in Table 1. This questionnaire was developed from the risk factors previously identified in the military and nonmilitary literature (20–25), and other factors considered candidates for causal relationships with LTBI.
Characteristic | Number* | Percent (%) |
Sex | ||
Male | 1,294 | 65.5 |
Female | 681 | 34.5 |
Age (SD) | 1,974 | 21.8 yr (4.6 yr) |
Race or ethnic group† | ||
White | 1,298 | 65.6 |
Black | 459 | 23.2 |
Hispanic | 221 | 11.4 |
Asian/Pacific Islander | 117 | 5.7 |
Other | 63 | 3.2 |
Prevalence of TB in country of birth | ||
<20 per 100,000 | 1,873 | 94.7 |
20–100 per 100,000 | 35 | 1.8 |
>100 per 100,000 | 70 | 3.5 |
BCG vaccinated | 69 | 3.5 |
Greatest prevalence of TB among countries the subject lived in or traveled to for >1 mo | ||
<20 per 100,000 | 1,811 | 91.6 |
20–100 per 100,000 | 62 | 3.1 |
>100 per 100,000 | 105 | 5.3 |
Contact with someone with TB | ||
In same household | 24 | 1.2 |
Casual contact | 73 | 3.7 |
Healthcare work | 232 | 11.7 |
Lived or worked in congregate setting | 120 | 6.1 |
Farm work or residence | 383 | 19.4 |
Current residence | ||
Northeast United States | 337 | 17 |
Southeast United States | 657 | 33.2 |
Western United States | 706 | 35.7 |
Other | 278 | 14.1 |
Smoking | ||
Never | 1,480 | 75.1 |
<1 pack per day | 395 | 20 |
1+ pack per day | 97 | 4.9 |
Education, yr | ||
<12 | 257 | 13 |
12 | 1,095 | 55.4 |
13–15 | 468 | 23.7 |
16+ | 158 | 8 |
Prior TB treatment | 45 | 2.3 |
Prior TB skin test performed | 710 | 35.9 |
Prior positive skin test | 24 | 3.4 (of those with a prior test) |
Unknown result | 54 | 7.6 (of those with a prior test) |
Blood for QFT-GIT and T-Spot was collected at the time of routine phlebotomy for recruit in-processing. Personnel performing IGRAs were masked to all patient data. QFT-GIT was performed according to package insert instructions, including incubation and centrifugation of blood within the prescribed times at Fort Jackson, and completion of ELISAs at the US Air Force School of Aerospace Medicine, Brooks City-Base, Texas, and the Centers for Disease Control and Prevention (CDC), Atlanta, Georgia (26). ELISAs were performed with the aid of Triturus automated ELISA workstations (Grifols USA, Los Angeles, CA). T-Spot was performed per package insert instructions (27) at the Oxford Immunotec, Ltd. Laboratory, Marlborough, Massachusetts, with the addition of T cell Xtend (Oxford Immunotec, Ltd., Oxfordshire, UK) immediately before peripheral blood mononuclear cell recovery. IGRAs were interpreted according to published guidelines (28); however, in the analysis of quantitative responses, borderline T-Spot results (i.e., TB response of five, six, or seven spots) were coded as “negative.”
TST and BST were placed by study personnel after the blood draw. All personnel involved in placement and reading of the skin test were trained and monitored to strictly adhere to standard operating procedures based on published methods for skin test administration and interpretation (20, 29). The Mantoux technique was used to intradermally administer 0.1 ml (5 TU) of Tubersol tuberculin PPD (Sanofi Pasteur Ltd., Toronto, ON, Canada) and 0.1 ml (0.01 μg) of PPD-B at the same sitting. One skin test was placed on each forearm. A random number table for each recruitment day determined which PPD was placed on each arm. The transverse diameter of induration at each skin test site was measured 2 days after PPD injection. Participants and those administering and reading the skin tests were masked to which skin test antigen was administered on each arm.
Recruits were categorized using a risk-stratified interpretation (RSI), as previously described by the CDC (30). The only modifications to the CDC criteria were that no time limitations were placed on contact with an active TB case or immigration from a high-prevalence country. The TB prevalence reported by the World Health Organization in 1990 was used to estimate exposure risk by country using groups of (1) less than 20 per 100,000, (2) 20–100 per 100,000, and (3) greater than 100 per 100,000 (4, 31). BCG status was determined by self-report. There was a strong correlation between reported history of BCG vaccination, presence of BCG scar, and foreign birth in this population. There was no significant difference in the results when using history of BCG vaccination or BCG scar (data not shown). Test specificity was estimated by assuming that recruits with no risk factors for M. tuberculosis exposure were uninfected. An invalid test was defined as one with insufficient blood, misplaced or dislodged caps, an insufficient number of peripheral blood mononuclear cells recovered, or another laboratory error. Test discordance was categorized as “TST positive/IGRA negative” or “TST negative/IGRA positive” for the QFT-GIT and the T-Spot. BST induration size was categorized into four 5-mm intervals (0–4, 5–9, 10–14, and 15–19 mm) and one category of 20 mm or greater. A dominant BST reaction was defined as a BST reaction of at least 2 mm greater than the TST reaction.
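The categorization rules in this paragraph can be written down compactly. The following is a minimal sketch, not the authors' analysis code: the names `Risk`, `tst_positive`, `bst_category`, and `dominant_bst` are hypothetical, and the 5-, 10-, and 15-mm cutoffs are taken from the stratum labels used in the results tables. Assigning a recruit to a risk stratum from questionnaire answers is deliberately left out, because the full CDC criteria are not reproduced in this article.

```python
from enum import Enum


class Risk(Enum):
    """Risk strata, valued by their TST induration cutoff in millimetres."""
    HIGH = 5       # "5-mm criteria" stratum
    MODERATE = 10  # "10-mm criteria" stratum
    LOW = 15       # "15-mm criteria" stratum


def tst_positive(induration_mm: float, risk: Risk) -> bool:
    """Risk-stratified interpretation: positive if induration meets the stratum cutoff."""
    return induration_mm >= risk.value


def bst_category(induration_mm: float) -> str:
    """Bin a BST reaction into four 5-mm intervals plus an open-ended top bin."""
    if induration_mm >= 20:
        return "20+ mm"
    lower = int(induration_mm // 5) * 5
    return f"{lower}-{lower + 4} mm"


def dominant_bst(bst_mm: float, tst_mm: float) -> bool:
    """A BST reaction is dominant when it exceeds the TST reaction by at least 2 mm."""
    return bst_mm - tst_mm >= 2


# A 12-mm TST reaction is positive for a moderate-risk recruit but not a low-risk one.
print(tst_positive(12, Risk.MODERATE), tst_positive(12, Risk.LOW))
print(bst_category(17), dominant_bst(bst_mm=12, tst_mm=10))
```

Keeping the cutoff as the value of each stratum makes the same induration measurement yield different interpretations by stratum, which is the point of a risk-stratified reading.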
The proportions of recruits with a positive TST, T-Spot, and QFT-GIT were compared using the McNemar test for correlated proportions, as were specificity and the proportions of indeterminate and invalid results for each test. The proportions of discordant and concordant results were also measured, and test agreement was assessed using the kappa (κ) coefficient. Factors associated with discordance were evaluated using standard chi-square bivariate statistics, stratified analyses, and multivariate analysis. Prevalence ratios were directly estimated for both bivariate and multivariate analyses. Because the log-binomial model failed to converge owing to numerical instability, Poisson regression with robust variance estimation was used to calculate multivariate prevalence ratios (32). The variables evaluated are listed in Table 1.
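Because every recruit received all three tests, comparisons of positivity are paired, and the McNemar test uses only the discordant pairs. The sketch below is illustrative only: it assumes the exact (binomial) form of the test, whereas the article does not state whether the exact or the chi-square version was used, and the function name `mcnemar_exact` is hypothetical. The counts are the discordant cells of the T-Spot versus risk-stratified TST cross-tabulation reported in the results.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant cell counts.

    Under the null hypothesis the smaller discordant count follows
    Binomial(b + c, 0.5); concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 19 recruits were T-Spot positive/TST negative and 33 were
# TST positive/T-Spot negative (risk-stratified TST interpretation).
p_value = mcnemar_exact(19, 33)
print(f"exact McNemar p = {p_value:.3f}")
```

The 1,714 concordant negatives and 15 concordant positives play no role here, which is why a test pair can have nearly identical positivity rates yet identify largely different people.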
Discordance between TST and IGRA was further assessed using associations between demographic and exposure variables including category of BST induration. TST positive/IGRA negative discordance was assessed separately from TST negative/IGRA positive discordance. The comparison group used for both of these analyses was the group of concordant negatives.
Figure 1 depicts subject participation and follow-up in a flow chart. Of the 3,095 recruits approached from April 1 to June 11, 2009, a total of 2,697 were eligible to participate in the study, of which 2,017 subjects (75%) enrolled. Of the 39 recruits who withdrew before blood collection or completion of skin testing, 30 were for administrative reasons unrelated to the study. Characteristics of the remaining 1,978 study participants are shown in Table 1. TST results were available for all of the remaining 1,978 participants, and were read a mean of 45 hours after PPD injection (range, 40–50 h). TST induration was detected in 122 (6.2%) participants and ranged from 2–80 mm. No significant digit preference was identified on inspection of the histogram of reaction size (see online supplement). T-Spot and QFT-GIT results were available for 1,913 (96.7%) and 1,850 (93.5%), respectively. QFT-GIT was invalid for 128 (6.5%) subjects, and 17 (0.9%) of the valid QFT-GIT gave indeterminate results. T-Spot was invalid for 65 (3.3%) subjects, 6 (0.3%) of the valid T-Spots were indeterminate, and 23 (1.2%) had borderline results with a TB response between five and seven spots. The relatively high proportion of subjects with invalid tests was caused by a need for numerous tubes of blood for routine recruit inprocessing and investigational tests, and an institutional review board restriction against additional phlebotomy solely to collect blood for investigational tests.
Of the 1,803 subjects who had valid positive, negative, or borderline results for all three tests, 1,373 were classified as low-risk for M. tuberculosis infection based on history, but 19 of them had borderline T-Spot results. Among the 1,354 recruits without identifiable risks and with determinate results for all three tests, estimates of TST specificity were 99.3% (95% confidence interval [CI], 98.7–99.7) when using the 15-mm cutoff for positive recommended by the CDC for persons at low risk of exposure (30), or 98.6% (95% CI, 97.8–99.2) when using a 10-mm cutoff. The specificity of the IGRAs was 98.7% for the T-Spot (1,336 negatives among 1,354 low-risk recruits; 95% CI, 97.9–99.2), and 98.8% for the QFT-GIT (1,338 negatives among 1,354 low-risk recruits; 95% CI, 98.1–99.3). Estimates of specificity were unchanged when borderline T-Spot results were coded as negative and included in the analysis (data not shown). None of the differences were statistically significant.
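The specificity estimates above are binomial proportions of negative results among recruits assumed to be uninfected. As a worked sketch, the snippet below recomputes them with a Wilson score interval. The choice of interval is an assumption, since the article does not name the method used, and `wilson_interval` is a hypothetical helper rather than the authors' code.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Negative results among the 1,354 low-risk recruits, as reported in the text.
for test, negatives in {"T-Spot": 1336, "QFT-GIT": 1338}.items():
    lo, hi = wilson_interval(negatives, 1354)
    print(f"{test}: specificity {negatives / 1354:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Under this assumption the intervals agree with the reported ones to one decimal place, and they overlap almost entirely, consistent with the statement that none of the differences in specificity were statistically significant.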
There were 1,781 subjects who had valid positive or negative results, excluding subjects with indeterminate or borderline results by any test. Table 2 shows the number and proportion of positive tests by test type, and the prevalence of BST reactions among the positives. An analysis of risk factors for positive tests, such as BCG vaccination and foreign birth, is presented in another recent publication (33). The proportion of subjects with a 10-mm or greater TST reaction was significantly larger than with any other test or TST cutoff (P < 0.05), and the proportion of subjects with a 15-mm or greater TST reaction was significantly smaller than that found by RSI or a 10-mm cutoff (P < 0.0001). None of the other differences in proportions was statistically significant. A total of 19 (33%) of 57 recruits with 10 mm or greater TST reactions did not have identifiable risks for M. tuberculosis infection. When using RSI as suggested by the CDC (30), 2.7% were positive, a similar proportion of positive results as was observed for both the T-Spot (1.9%) and QFT-GIT (2%).
TB Test Type | Number Positive (% of total)* | Number (%) of Positives with BST ≥10 mm | Number (%) of Positives with Dominant BST ≥10 mm† |
TST | |||
≥10 mm | 57 (3.2)‡ | 33 (58) | 12 (21) |
≥15 mm | 25 (1.4)§ | 16 (64) | 3 (12) |
RSI|| | 48 (2.7) | 28 (58) | 9 (19) |
T-Spot | 34 (1.9) | 11 (32) | 4 (12) |
QFT-GIT | 36 (2) | 8 (22) | 3 (8) |
Using the RSI for TST, 88 (4.9%) had a positive result to at least one of the three tests. Of these, only 10 (11.4%) were positive to all three tests; 20 (22.7%) were positive to at least two of the tests. Modest agreement between TST and the two IGRAs was seen in Tables 3–5. In contrast, good agreement was seen with TST when using different blinded readers (kappa = 0.79; see online supplement).
TST Positive* | TST Negative | Total | |
T-Spot positive | 15 (0.8%) | 19 (1.1%) | 34 (1.9%) |
T-Spot negative | 33 (1.9%) | 1,714 (96.2%)† | 1,747 (98.1%) |
Total | 48 (2.7%) | 1,733 (97.3%) | 1,781 |
TST Positive* | TST Negative | Total | |
QFT-GIT positive | 11 (0.6%) | 25 (1.4%) | 36 (2%) |
QFT-GIT negative | 37 (2.1%) | 1,708 (95.9%) | 1,745 (98%) |
Total | 48 (2.7%) | 1,733 (97.3%) | 1,781 |
QFT-GIT Positive | QFT-GIT Negative | Total | |
T-Spot positive | 14 (0.8%) | 20 (1.1%) | 34 (1.9%) |
T-Spot negative | 22 (1.2%) | 1,725 (96.9%) | 1,747 (98.1%) |
Total | 36 (2%) | 1,745 (98%) | 1,781 |
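Agreement in the three cross-tabulations above can be summarized with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The following sketch recomputes kappa from the cell counts shown; `cohen_kappa` is a hypothetical helper, and the values it prints are illustrative recalculations that may differ slightly from the coefficients the authors obtained.

```python
def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for a 2x2 table.

    a = both tests positive, b = row test positive only,
    c = column test positive only, d = both tests negative.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)


# Cell counts (a, b, c, d) read from the three 2x2 tables.
tables = {
    "T-Spot vs TST": (15, 19, 33, 1714),
    "QFT-GIT vs TST": (11, 25, 37, 1708),
    "T-Spot vs QFT-GIT": (14, 20, 22, 1725),
}
for name, cells in tables.items():
    print(f"{name}: kappa = {cohen_kappa(*cells):.2f}")
```

Raw agreement exceeds 96% in every table because almost all recruits are negative on both tests. Kappa removes that chance agreement, which is why it is far lower than the raw percentages suggest.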
Of the 48 subjects with a positive TST, 9 (18.8%) had a dominant BST reaction, defined as a BST reaction of at least 2-mm greater than the TST, as shown in Table 2. Table 6 further examines the associations of potential risk factors for TST-positive, IGRA-negative discordance. Strong dose–response relationships were observed between discordance and BST reaction size, TB prevalence in country of birth, and BCG vaccination. No significant associations were seen between any variables and IGRA-positive/TST-negative discordance or T-Spot/QFT-GIT discordance (data not shown).
Recruits with a Negative QFT-GIT Result (n = 1,745) | Recruits with a Negative T-Spot Result (n = 1,747) | |||||||
Characteristic | N (of 1,745) | N with TST Positive* (n = 37) | Bivariate Prevalence Ratio (95% CI) | Multivariate Prevalence Ratio (95% CI) | N (of 1,747) | N with TST Positive* (n = 33) | Bivariate Prevalence Ratio (95% CI) | Multivariate Prevalence Ratio (95% CI) |
Age, yr† | — | — | 1.1 (1.1–1.2) | ‡ | — | — | 1.1 (1.1–1.2) | ‡ |
Sex | ‡ | ‡ | ||||||
Male | 1,136 | 21 | 1 (REF) | 1,132 | 19 | 1 (REF) | ||
Female | 607 | 16 | 1.4 (0.7–2.7) | 613 | 14 | 1.4 (0.7–2.7) | ||
Race and ethnic group | ‡ | ‡ | ||||||
White | 1,158 | 11 | 1 (REF) | 1,157 | 11 | 1 (REF) | ||
Black | 397 | 10 | 2.7 (1.1–6.2) | 400 | 10 | 2.6 (1.1–6.1) | ||
Asian/Pacific Islander | 60 | 12 | 21.1 (9.7–45.7) | 59 | 8 | 14.2 (6–34.1) | ||
Hispanic | 194 | 7 | 3.8 (1.5–9.7) | 197 | 7 | 3.7 (1.5–9.5) | ||
TB prevalence in country of birth or long-term residence | ||||||||
<20 per 100,000 | 1,659 | 19 | 1 (REF) | 1 (REF) | 1,662 | 18 | 1 (REF) | 1 (REF) |
20–100 per 100,000 | 30 | 3 | 8.7 (2.7–27.9) | 6.2 (2.1–18.7) | 30 | 3 | 9.2 (2.9–29.7) | 7.9 (2.8–22.5) |
>100 per 100,000 | 56 | 15 | 23.3 (12.6–43.6) | 7.7 (3–20.1) | 55 | 12 | 20.1 (10.2–39.7) | 9.1 (3.9–21.4) |
BCG vaccination | ||||||||
No | 1,690 | 20 | 1 (REF) | 1 (REF) | 1,695 | 19 | 1 (REF) | 1 (REF) |
Yes | 55 | 17 | 26.1 (14.5–47) | 4 (1.6–9.7) | 52 | 14 | 24 (12.8–45.2) | 4.4 (2.1–9.5) |
BST reaction | ||||||||
0–4 mm | 1,417 | 8 | 1 (REF) | 1 (REF) | 1,423 | 7 | 1 (REF) | 1 (REF) |
5–9 mm | 133 | 7 | 9.3 (3.4–25.3) | 5.5 (2.2–13.5) | 132 | 6 | 9.2 (3.2–27.1) | 5.4 (2–14.5) |
10–14 mm | 145 | 11 | 13.4 (5.5–32.9) | 6 (2.3–15.5) | 142 | 9 | 12.9 (4.9–34.1) | 5.2 (1.9–14.2) |
15–19 mm | 34 | 4 | 20.8 (6.6–65.9) | 15.6 (5.1–47.5) | 35 | 5 | 29 (9.7–87) | 14.5 (4.8–43.5) |
20+ mm | 16 | 7 | 77.5 (31.9–188) | 37.5 (11–128) | 15 | 6 | 81.3 (31–213) | 86.1 (32.6–227) |
Region of birth | ‡ | ‡ | ||||||
NE | 295 | 7 | 1 (REF) | 297 | 7 | 1 (REF) | ||
SE | 591 | 9 | 0.6 (0.2–1.7) | 590 | 9 | 0.6 (0.2–1.7) | ||
West | 613 | 10 | 0.7 (0.3–1.8) | 614 | 9 | 0.6 (0.2–1.7) | ||
Other | 246 | 11 | 1.9 (0.7–4.8) | 246 | 8 | 1.4 (0.5–3.8) | ||
Farm work | ‡ | ‡ | ||||||
No | 1,401 | 34 | 1 (REF) | 1,406 | 30 | 1 (REF) | ||
Yes | 344 | 3 | 0.4 (0.1–1.2) | 341 | 3 | 0.4 (0.1–1.3) |
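The bivariate prevalence ratios in this table can be checked directly from the counts in each row. The sketch below assumes a Wald confidence interval on the log scale, which the article does not state explicitly, and `prevalence_ratio` is a hypothetical helper. The multivariate ratios come from the Poisson model with robust variance and cannot be recovered from these counts alone.

```python
from math import exp, log, sqrt


def prevalence_ratio(a: int, n1: int, c: int, n0: int, z: float = 1.959964):
    """Prevalence ratio of exposed (a/n1) versus reference (c/n0), with a Wald CI on the log scale."""
    pr = (a / n1) / (c / n0)
    se_log = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return pr, exp(log(pr) - z * se_log), exp(log(pr) + z * se_log)


# BCG row, QFT-GIT-negative recruits: 17 of 55 vaccinated and
# 20 of 1,690 unvaccinated recruits were TST positive.
pr, lo, hi = prevalence_ratio(17, 55, 20, 1690)
print(f"PR = {pr:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

For this row the calculation gives a ratio and interval that round to the tabulated 26.1 (14.5–47), which suggests, but does not confirm, that the bivariate intervals were obtained this way.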
Among the 1,803 subjects with valid tests and determinate results, Table 7 shows the agreement between the three tests by quantitative result of each test. Subjects with borderline T-Spot results were included in this analysis to assess a continuum of TB responses including five to seven spots. Higher quantitative test results were associated with greater concordance between the tests. This dose–response relationship was highly significant for all three tests. Table 8 shows the quantitative test results for each test according to risk strata. The proportion of higher IGRA responses rose with increasing risk for infection with M. tuberculosis, suggesting that quantitative IGRA results relate to risk in much the same way as TST results do. The dose–response relationship between risk of infection with M. tuberculosis and quantitative test result was also highly significant for each test. Similarly, Table 9 shows the association of higher TB risk strata with greater test concordance; this dose–response relationship was also statistically significant.
Quantitative TST Result | Quantitative QFT-GIT Result | Quantitative T-Spot Result | |||||||||
Test Results | N | 0–4 mm | 5–9 mm | 10–14 mm | ≥15 mm* | <0.35 | 0.35–0.99 | ≥1† | ≤4 Spots | 5–7 Spots‡ | ≥8 Spots§ |
All tests negative | 1,713 | 1,676 (97.8%) | 30 (1.8%) | 7 (0.4%) | 0 | 1,713 (100%) | 0 | 0 | 1,693 (98.8%) | 20 (1.2%) | 0 |
One test positive | 69 | 35 (50.7%) | 1 (1.5%) | 19 (27.5%) | 14 (20.3%) | 48 (69.6%) | 16 (23.2%) | 5 (7.3%) | 53 (76.8%) | 1 (1.5%) | 15 (21.7%) |
TST only | 33 | 0 | 1 (3%) | 18 (54.6%) | 14 (42.4%) | 33 (100%) | 0 | 0 | 32 (97%) | 1 (3%) | 0 |
QFT-GIT only | 21 | 21 (100%) | 0 | 0 | 0 | 0 | 16 (76.2%) | 5 (23.8%) | 21 (100%) | 0 | 0 |
T-Spot only | 15 | 14 (93.3%) | 0 | 1 (6.7%) | 0 | 15 (100%) | 0 | 0 | 0 | 0 | 15 (100%) |
Two tests positive | 11 | 1 (9.1%) | 1 (9.1%) | 4 (36.4%) | 5 (45.5%) | 5 (45.5%) | 3 (27.3%) | 3 (27.3%) | 1 (9.1%) | 1 (9.1%) | 9 (81.8%) |
TST and QFT-GIT | 2 | 0 | 0 | 1 (50%) | 1 (50%) | 0 | 1 (50%) | 1 (50%) | 1 (50%) | 1 (50%) | 0 |
TST and T-Spot | 5 | 0 | 0 | 1 (20%) | 4 (80%) | 5 (100%) | 0 | 0 | 0 | 0 | 5 (100%) |
QFT-GIT and T-Spot | 4 | 1 (25%) | 1 (25%) | 2 (50%) | 0 | 0 | 2 (50%) | 2 (50%) | 0 | 0 | 4 (100%) |
All three tests positive | 10 | 0 | 0 | 2 (20%) | 8 (80%) | 0 | 3 (30%) | 7 (70%) | 0 | 0 | 10 (100%) |
Quantitative TST Result | Quantitative QFT-GIT Result | Quantitative T-Spot Result | |||||||||
Risk Stratification† | N | 0–4 mm | 5–9 mm | 10–14 mm | ≥15 mm‡ | <0.35 | 0.35–0.99 | ≥1§ | ≤4 Spots | 5–7 Spots | ≥8 Spots|| |
High risk (5-mm criteria) | 21 | 18 (85.7%) | 1 (4.8%) | 1 (4.8%) | 1 (4.8%) | 18 (85.7%) | 2 (9.5%) | 1 (4.8%) | 20 (95.2%) | 0 | 1 (4.8%) |
Moderate risk (10-mm criteria) | 409 | 362 (88.5%) | 10 (2.4%) | 21 (5.1%) | 16 (3.9%) | 392 (95.8%) | 7 (1.7%) | 10 (2.4%) | 391 (95.6%) | 3 (0.7%) | 15 (3.7%) |
Low risk (15-mm criteria) | 1,373 | 1,332 (97%) | 21 (1.5%) | 10 (0.7%) | 10 (0.7%) | 1,356 (98.8%) | 13 (1%) | 4 (0.3%) | 1,336 (97.3%) | 19 (1.4%) | 18 (1.3%) |
Test Results | N | High Risk (5-mm criteria) | Moderate Risk (10-mm criteria) | Low Risk (15-mm criteria)† |
All tests negative | 1,693 | 16 (1%) | 359 (21.2%) | 1,318 (77.9%) |
One test positive | 68 | 4 (5.9%) | 33 (48.5%) | 31 (45.6%) |
TST only | 32 | 2 (6.3%) | 23 (71.9%) | 7 (21.9%) |
QFT-GIT only | 21 | 2 (9.5%) | 8 (38.1%) | 11 (52.4%) |
T-Spot only | 15 | 0 (0%) | 2 (13.3%) | 13 (86.7%) |
Two tests positive | 10 | 0 (0%) | 7 (70%) | 3 (30%) |
TST and QFT-GIT | 1 | 0 (0%) | 1 (100%) | 0 (0%) |
TST and T-Spot | 5 | 0 (0%) | 5 (100%) | 0 (0%) |
QFT-GIT and T-Spot | 4 | 0 (0%) | 1 (25%) | 3 (75%) |
All three tests positive | 10 | 1 (10%) | 7 (70%) | 2 (20%) |
This study suggests that the three commercially available TB diagnostics have similar results in United States populations with low TB prevalence. IGRAs were designed to increase specificity, but in this study specificity for the IGRAs was no better than TST specificity among low-risk recruits when interpreted using a TST cutoff of 15 mm according to published guidelines. The prevalence of positive results and dose–response relationships with TB exposure were also similar for the three tests. Despite these areas of agreement, the three tests identified different people for most positive test results. In this study, TST-positive, IGRA-negative discordance was strongly associated with BST results, supporting other evidence that NTM sensitization can cause false-positive TST results. Conversely, the IGRAs showed little evidence of cross-reactivity to NTM by the BST. Although this suggests that NTM and BCG sensitization cause false-positive TST results and that this contributes to discordance, these factors do not explain the etiology of most of the discordance encountered.
Other aspects of test discordance examined in this study include the dose–response associations seen between the TB exposure risk, quantitative results of the TST and IGRA testing, and degree of concordance between the three tests. These data suggest that in low-prevalence populations, most positives resulting from any of the three commercially available diagnostic tests are false-positives because (1) 77% of subjects with positive test results were positive by only one test, (2) lower quantitative results were associated with smaller risk for TB exposure, (3) lower quantitative results were associated with single positive tests, and (4) lower risk for TB exposure was associated with decreasing test agreement.
The problem of low positive predictive value is well known and understood with the TST (34). Use of risk stratification is currently recommended to guide the interpretation of the TST as a way to increase positive predictive value and reduce false positivity (30); this is not used for the IGRAs. This study suggests that performance of the IGRAs would also benefit from the use of a risk-stratified interpretation, because it would increase positive predictive value and reduce the number of false-positives. These findings support the CDC's recommendation that people at minimal risk of infection (who are at greatest risk of a false-positive result) should not be targeted for LTBI testing, regardless of whether a TST or IGRA is used (35).
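The dependence of positive predictive value on prevalence follows from Bayes' rule and can be illustrated numerically. In the sketch below only the specificity comes from this study (the T-Spot estimate among low-risk recruits). The sensitivity of 80% and the prevalence values are assumptions chosen purely for illustration, and `positive_predictive_value` is a hypothetical helper.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: share of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


SPECIFICITY = 0.987  # T-Spot estimate among low-risk recruits in this study
SENSITIVITY = 0.80   # ASSUMED for illustration; not estimated in this study

# Hypothetical prevalences of true infection, from minimal to higher risk.
for prevalence in (0.005, 0.01, 0.05, 0.20):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.0%}")
```

Under these assumptions a test with specificity near 99% still yields mostly false-positives when true infection is present in about 1% of those tested, whereas the same test applied to a higher-risk group yields mostly true positives. This is the arithmetic behind targeted testing and risk-stratified interpretation.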
This study provides reliable estimates of specificity in a low-risk population. Although both IGRAs are generally reported to have specificity higher than the TST (2), there was surprisingly little difference in specificity between TST and either IGRA seen in this study. The specificity estimates for TST and IGRA found in this study are similar to those found in previous studies of Navy recruits (4). Although the specificity of QFT-GIT is sometimes thought to be higher than that of T-Spot (2, 3), the estimated specificities of the two tests were not different in this study. The strong dose–response relationships between TB exposure and positive TST and IGRA results were also similar to those reported previously (2, 3). These findings further support the CDC's recommendation that IGRAs may be used in place of the TST, but that testing should be targeted to avoid false-positive results (35).
Although IGRAs and TST may be used in the diagnosis of LTBI, they do not give equivalent information and often have discordant results. Several studies have compared results from different IGRAs and from TST “head-to-head” (28, 36–42), and although the agreement between QFT-GIT and T-Spot has generally been very good, discordant results between the IGRA and TST have been found in up to 20–30% of subjects (3). The magnitude of discordance is demonstrated in this study by the low kappa values and the high proportion of discordance seen among positives: 68 (77%) of 88 individuals with at least one positive test were positive to only one of the three tests. The frequency of test discordance has varied among studies, leading some authors to conclude that the IGRAs have lower sensitivity (36), whereas others have concluded that the IGRAs perform better because of less cross-reactivity with BCG vaccine and less waning of sensitivity with age (28). The differences may also be caused by differences in the populations studied.
A few studies have provided evidence that NTM contribute to discordance between the TST and IGRA (4, 39), but none have used the BST. In this study, the strong dose–response relationship between increasing BST reaction size and increasing prevalence of discordance provides additional evidence that false-positive TSTs contribute to this discordance. BCG vaccination was also strongly associated with discordance in this study. However, risk-stratified TST-positive, IGRA-negative discordance was also associated with TB prevalence in country of birth and being Asian or from the Pacific Islands, traditionally factors associated with high risk of developing disease if infected. Thus, some of the discordance also may be attributable to lower sensitivity of the IGRAs compared with TST, or a combination of these two factors.
A limitation of this study is the lack of a gold standard for determining the presence of M. tuberculosis infection, making it difficult to assess the true significance of discordance between TST and IGRAs. The significance of reactivity to BST also has some uncertainty. Although it has previously been shown to assist in differentiating between LTBI and cross-reactivity caused by NTM (8, 11), BST reactivity also may be caused by cross-reactivity after M. tuberculosis infection (16, 43). Furthermore, there are other mycobacteria that contain region of difference one antigens, such as M. kansasii, M. szulgai, or M. marinum; infection with these NTM may cause false-positive reactions to TST and IGRAs (2, 44). There is potential for misclassification of several variables, including the recall of BCG vaccination among recruits, history of prior TB or LTBI diagnosis or treatment, and contact with a TB case. Although samples were sent masked to all participating laboratories, the potential still exists for other residual sources of misclassification bias. Recruits are a low-risk population and may not represent the causes of test discordance in other higher-risk populations. Furthermore, because this research was performed in the high-throughput basic training setting, the administrative limitations imposed resulted in a larger proportion of inadequate blood draws and in TST reading times that were slightly shorter than optimal.
This study highlights the need for better understanding of the significance of test discordance, particularly the need for longitudinal data on progression to active TB among those with discordant test results. Applying the methodology used in this study to other populations (11, 12) may provide a more complete understanding of the test interpretation and test discordance. Finally, further research is needed to better characterize the most appropriate cutoffs to be used for the risk-stratified interpretation of the IGRAs, to maximize sensitivity and specificity in different risk groups and populations.
The authors thank Christine Anderson, Ph.D., (Food and Drug Administration) for graciously supplying her expertise in preparing the Battey antigen and testing it for human use. She also provided valuable comments on the manuscript during preparation. This study was greatly assisted by the incredible energy and expertise of Ms. Carey Schlett of the Infectious Disease Clinical Research Program. Her guidance and constant supervision were invaluable to the completion of the study. The authors also thank Dr. Richard Menzies, who provided invaluable advice and expertise in designing and setting up this study.
1. Pai M, Menzies D. The new IGRA and the old TST: making good use of disagreement. Am J Respir Crit Care Med 2007;175:529–531.
2. Pai M, Zwerling A, Menzies D. Systematic review: T-cell-based assays for the diagnosis of latent tuberculosis infection: an update. Ann Intern Med 2008;149:177–184.
3. Menzies D, Pai M, Comstock G. Meta-analysis: new tests for the diagnosis of latent tuberculosis infection: areas of uncertainty and recommendations for research. Ann Intern Med 2007;146:340–354.
4. Mazurek GH, Zajdowicz MJ, Hankinson AL, Costigan DJ, Toney SR, Rothel JS, Daniels LJ, Pascual FB, Shang N, Keep LW, et al. Detection of Mycobacterium tuberculosis infection in United States Navy recruits using the tuberculin skin test or whole-blood interferon-gamma release assays. Clin Infect Dis 2007;45:826–836.
5. Pai M, Kalantri S, Menzies D. Discordance between tuberculin skin test and interferon-gamma assays. Int J Tuberc Lung Dis 2006;10:942–943.
6. Gallant CJ, Cobat A, Simkin L, Black GF, Stanley K, Hughes J, et al. Tuberculin skin test and in-vitro assays provide complementary measures of anti-mycobacterial immunity in children and adolescents. Chest 2009;137:1071–1077.
7. Cobelens FG, Menzies D, Farhat M. False-positive tuberculin reactions due to non-tuberculous mycobacterial infections. Int J Tuberc Lung Dis 2007;11:934–935, author reply 5.
8. Edwards LB, Acquaviva FA, Livesay VT, Cross FW, Palmer CE. An atlas of sensitivity to tuberculin, PPD-B, and histoplasmin in the United States. Am Rev Respir Dis 1969;99(Suppl 4):1–132.
9. Comstock GW. Frost revisited: the modern epidemiology of tuberculosis. Am J Epidemiol 1975;101:363–382.
10. Khan K, Wang J, Marras TK. Nontuberculous mycobacterial sensitization in the United States: national trends over three decades. Am J Respir Crit Care Med 2007;176:306–313.
11. Edwards LB, Acquaviva FA, Livesay VT. Identification of tuberculous infected: dual tests and density of reaction. Am Rev Respir Dis 1973;108:1334–1339.
12. Bennett DE, Courval JM, Onorato I, Agerton T, Gibson JD, Lambert L, McQuillan GM, Lewis B, Navin TR, Castro KG. Prevalence of tuberculosis infection in the United States population: the National Health and Nutrition Examination Survey, 1999–2000. Am J Respir Crit Care Med 2008;177:348–355.
13. Engel A, Roberts J. Tuberculin skin test reactions among adults 25–74 years, United States, 1971–72. Washington, DC: US Department of Health, Education, and Welfare; 1977. Report No.: DHEW publication number (HRA) 77–1649.
14. Shah SS, McGowan JP, Klein RS, Converse PJ, Blum S, Gourevitch MN. Agreement between Mantoux skin testing and QuantiFERON-TB assay using dual mycobacterial antigens in current and former injection drug users. Med Sci Monit 2006;12:MT11–MT16.
15. Shigeto E, Tasaka H. Tuberculin sensitivity to purified protein derivatives (PPD) from M. intracellulare (PPD-B), M. kansasii (PPD-Y), M. fortuitum (PPD-Y) and M. tuberculosis (PPDs) among healthy volunteers. Kekkaku 1993;68:283–291.
16. Huebner RE, Schein MF, Cauthen GM, Geiter LJ, Selin MJ, Good RC, O'Brien RJ. Evaluation of the clinical usefulness of mycobacterial skin test antigens in adults with pulmonary mycobacterioses. Am Rev Respir Dis 1992;145:1160–1166.
17. Margileth AM, Longfield JN, Golden SM, Lazoritz S, Bohan JS. Tuberculin skin tests: atypical mycobacterial PPD-Battey skin test conversion following airborne training. Mil Med 1986;151:636–638.
18. Margileth AM. The use of purified protein derivative mycobacterial skin test antigens in children and adolescents: purified protein derivative skin test results correlated with mycobacterial isolates. Pediatr Infect Dis 1983;2:225–231.
19. Larrabee WF, Talarera R. Tuberculin dual testing in Panama. Tubercle 1980;61:239–243.
20. CDC. National Health and Nutrition Examination Survey: tuberculosis skin test procedures manual. c2000 [accessed 2008 November 4]. Available from: http://www.cdc.gov/nchs/data/nhanes/tb.pdf.
21. Edwards LB, Palmer CE. Part II. Tuberculous infection. In: Lowell AM, editor. Tuberculosis. Cambridge, MA: Harvard University Press; 1969. pp. 123–204.
22. Koppaka VR, Harvey E, Mertz B, Johnson BA. Risk factors associated with tuberculin skin test positivity among university students and the use of such factors in the development of a targeted screening program. Clin Infect Dis 2003;36:599–607.
23. Lobato MN, Hopewell PC. Mycobacterium tuberculosis infection after travel to or contact with visitors from countries with a high prevalence of tuberculosis. Am J Respir Crit Care Med 1998;158:1871–1875.
24. Froehlich H, Ackerson LM, Morozumi PA. Targeted testing of children for tuberculosis: validation of a risk assessment questionnaire. Pediatrics 2001;107:E54.
25. Ozuah PO, Ozuah TP, Stein RE, Burton W, Mulvihill M. Evaluation of a risk assessment questionnaire used to target tuberculin skin testing in children. JAMA 2001;285:451–453.
26. Cellestis. Quantiferon-TB Gold (in tube method) package insert. Valencia, CA; 2007.
27. Oxford Immunotec. T-SPOT.TB package insert. Marlborough, MA; 2008.
28. Nienhaus A, Schablon A, Diel R. Interferon-gamma release assay for the diagnosis of latent TB infection: analysis of discordant results, when compared to the tuberculin skin test. PLoS ONE 2008;3:e2665.
29. CDC. Mantoux tuberculin skin test: facilitator guide [accessed 2008 November 4]. Available from: http://www.cdc.gov/TB/pubs/Mantoux/images/Mantoux.pdf.
30. CDC. Targeted tuberculin testing and treatment of latent tuberculosis infection. American Thoracic Society. MMWR Recomm Rep 2000;49:1–51.
31. World Health Organization. Global Tuberculosis Database. Geneva, Switzerland [updated 2009 March 24; accessed 2009 December 27]. Available from: http://www.who.int/tb/country/global_tb_database/en/.
32. Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. Am J Epidemiol 2005;162:199–200.
33. Mancuso JD, Tribble D, Mazurek GH, Li Y, Olsen C, Aronson NE, Geiter L, Goodwin D, Keep LW. Impact of targeted testing for latent tuberculosis infection using commercially available diagnostics. Clin Infect Dis 2011;53:234–244.
34. Huebner RE, Schein MF, Bass JB. The tuberculin skin test [review]. Clin Infect Dis 1993;17:968–975.
35. Mazurek GH, Jereb J, Vernon A, LoBue P, Goldberg S, Castro K. Updated guidelines for using interferon gamma release assays to detect Mycobacterium tuberculosis infection—United States, 2010 [practice guideline]. MMWR Recomm Rep 2010;59:1–25.
36. Arend SM, Thijsen SF, Leyten EM, Bouwman JJ, Franken WP, Koster BF, Cobelens FG, van Houte AJ, Bossink AW. Comparison of two interferon-gamma assays and tuberculin skin test for tracing tuberculosis contacts. Am J Respir Crit Care Med 2007;175:618–627.
37. Leyten EM, Arend SM, Prins C, Cobelens FG, Ottenhoff TH, van Dissel JT. Discrepancy between Mycobacterium tuberculosis-specific gamma interferon release assays using short and prolonged in vitro incubation. Clin Vaccine Immunol 2007;14:880–885.
38. Ferrara G, Losi M, D'Amico R, Roversi P, Piro R, Meacci M, Meccugni B, Dori IM, Andreani A, Bergamini BM, et al. Use in routine clinical practice of two commercial blood tests for diagnosis of infection with Mycobacterium tuberculosis: a prospective study. Lancet 2006;367:1328–1334.
39. Detjen AK, Keil T, Roll S, Hauer B, Mauch H, Wahn U, Magdorf K. Interferon-gamma release assays improve the diagnosis of tuberculosis and nontuberculous mycobacterial disease in children in a country with a low incidence of tuberculosis. Clin Infect Dis 2007;45:322–328.
40. Adetifa IM, Lugos MD, Hammond A, Jeffries D, Donkor S, Adegbola RA, Hill PC. Comparison of two interferon gamma release assays in the diagnosis of Mycobacterium tuberculosis infection and disease in The Gambia. BMC Infect Dis 2007;7:122.
41. Connell TG, Ritz N, Paxton GA, Buttery JP, Curtis N, Ranganathan SC. A three-way comparison of tuberculin skin testing, QuantiFERON-TB gold and T-SPOT.TB in children. PLoS ONE 2008;3:e2624.
42. Lee JY, Choi HJ, Park IN, Hong SB, Oh YM, Lim CM, Lee SD, Koh Y, Kim WS, Kim DS, et al. Comparison of two commercial interferon-gamma assays for diagnosing Mycobacterium tuberculosis infection. Eur Respir J 2006;28:24–30.
43. Huebner RE, Schein MF, Cauthen GM, Geiter LJ, O'Brien RJ. Usefulness of skin testing with mycobacterial antigens in children with cervical lymphadenopathy. Pediatr Infect Dis J 1992;11:450–456.
44. Andersen P, Munk ME, Pollock JM, Doherty TM. Specific immune-based diagnosis of tuberculosis. Lancet 2000;356:1099–1104.
Supported by the Infectious Disease Clinical Research Program, a Department of Defense program executed through the Uniformed Services University of the Health Sciences. This project has been funded in whole, or in part, with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH), under Inter-Agency Agreement Y1-AI-5072. The content of this publication is the sole responsibility of the authors and does not necessarily reflect the views or policies of the NIH, the Department of Health and Human Services, the Centers for Disease Control and Prevention, the Department of Defense, or the Departments of the Army, Navy, or Air Force. Mention of trade names, commercial products, or organizations does not imply endorsement by the US Government. The Infectious Disease Clinical Research Program (Bethesda, MD) participated in all phases of the study, including design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. Oxford Immunotec (Marlborough, MA) performed T-SPOT.TB testing (masked) as an in-kind contribution, but played no other role in the design, conduct, collection, management, analysis, interpretation of the data, preparation, review, or approval of the manuscript. Laboratory and technical support was also provided by the US Air Force School of Aerospace Medicine and the Centers for Disease Control and Prevention's Division of TB Elimination.
Author Contributions: J.D.M., D.T., G.H.M., C.O., N.E.A., L.G., D.G., and L.W.K. all had substantial participation in conception and design of the study, acquisition or analysis of data, interpretation of the data, and revision of the article.
This article has an online supplement, which is accessible from this issue's table of contents at www.atsjournals.org
Originally Published in Press as DOI: 10.1164/rccm.201107-1244OC on December 8, 2011