Annals of the American Thoracic Society

Background: Usual interstitial pneumonia (UIP) is the histopathologic hallmark of idiopathic pulmonary fibrosis (IPF), the prototypical interstitial lung disease (ILD). Diagnosis of IPF requires that a typical UIP pattern be identified by using high-resolution chest computed tomography or lung sampling. A genomic classifier for UIP has been developed to predict histopathologic UIP by using lung samples obtained through bronchoscopy.

Objective: To perform a systematic review to evaluate genomic classifier testing in the detection of histopathologic UIP to inform new American Thoracic Society, European Respiratory Society, Japanese Respiratory Society, and Asociación Latinoamericana del Tórax guidelines.

Data Sources: Medline, Embase, and the Cochrane Central Register of Controlled Trials were searched through June 2020.

Data Extraction: Data were extracted from studies that enrolled patients with ILD and reported the use of genomic classifier testing.

Synthesis: Data were aggregated across studies via meta-analysis. The quality of the evidence was appraised by using the Grading of Recommendations, Assessment, Development, and Evaluation approach.

Results: Genomic classifier testing had a sensitivity of 68% (95% confidence interval [CI], 55–73%) and a specificity of 92% (95% CI, 81–95%) in predicting the UIP pattern in ILD. Confidence in an IPF diagnosis increased from 43% to 93% in one cohort and from 59% to 89% in another cohort. Agreement levels in categorical IPF and non-IPF diagnoses measured by using a concordance coefficient were 0.75 and 0.64 in the two cohorts. The quality of evidence was moderate for test characteristics and very low for both confidence and agreement.

Conclusions: Genomic classifier testing predicts histopathologic UIP in patients with ILD with a specificity of 92% and improves diagnostic confidence; however, sensitivity is only 68%, and testing is not widely available.

Diagnosis of interstitial lung disease (ILD) requires a multidisciplinary discussion (MDD) to reach a consensus diagnosis (1). Histopathology plays a major role in the MDD. For patients requiring biopsy, different modalities are available for obtaining histopathologic specimens, including surgical lung biopsy (SLB), transbronchial forceps biopsy, and transbronchial lung cryobiopsy (TBLC).

Genomic classifiers have been recently introduced for clinical use (2, 3). Lung tissue obtained by transbronchial forceps biopsy undergoes whole-transcriptome RNA sequencing (RNAseq) followed by gene expression analysis. The pattern of gene expression is subjected to machine learning, and then a diagnostic feature is categorized as present or absent. The Envisia classifier distinguishes usual interstitial pneumonia (UIP) from non-UIP histopathology, helping clinicians and multidisciplinary teams make a diagnosis of idiopathic pulmonary fibrosis (IPF) in patients without a definite UIP pattern at chest imaging.

The American Thoracic Society released clinical practice guidelines on the diagnosis of IPF in collaboration with the European Respiratory Society, Japanese Respiratory Society, and Asociación Latinoamericana del Tórax in 2018 (1). This systematic review was performed to summarize evidence related to the role of genomic classifier testing in the diagnosis of ILD to inform an update of these guidelines (4).

This review was performed in accordance with the guidance provided by the Cochrane Handbook for Systematic Reviews of Interventions (5). It was registered with the PROSPERO database (CRD 42020208985).

Research Question

The research question was formulated by using the population, intervention, comparator, and outcome format: “Should patients with newly detected ILD of unknown cause undergo transbronchial forceps biopsy for genomic classifier testing to make a molecular diagnosis of the UIP pattern?”

Literature Search

Potentially relevant systematic reviews were initially sought in the Cochrane Library and Medline. There were no systematic reviews addressing the proposed question. A search strategy was therefore developed to identify studies that enrolled patients with ILD and evaluated the use of a genomic classifier as a diagnostic tool.

Because of concerns about finding enough relevant direct evidence, the search strategy was designed to be broad enough to capture both direct and indirect evidence (Table E1). Medline and Embase were initially searched on the Ovid platform for studies between January 2016 and December 2020. The January 2016 start date was selected by the guideline chair as the date after which studies pertinent to contemporary genomic classifier testing began to be published. Supplementary searches were undertaken on, and guideline committee members were asked for potentially relevant studies. The search results were collected in a bibliographic database and then distributed to the methodology team.

Study Selection

The study selection criteria were prespecified. Studies were eligible if they 1) enrolled patients with ILD of an uncertain type, 2) evaluated the use of a genomic classifier, and 3) reported diagnostic test characteristics (sensitivity, specificity, positive predictive value, negative predictive value), agreement (percentage, kappa coefficient), and/or diagnostic confidence (before and after the genomic classifier is performed). Two methodologists (F.K. and J.P.U.B.) independently used a stepwise approach to screen the search results: they first screened the titles and abstracts, and any study found to be potentially relevant then underwent full-text review. In addition to the stepwise approach, bibliographies of selected studies, systematic reviews, and review articles were reviewed for relevant studies. Studies not fulfilling the inclusion criteria, such as case series with <10 patients, case reports, animal studies, and abstracts, were excluded. Disagreements were resolved by consensus with the input of a third reviewer (K.C.W.).

Data Extraction

Data from the selected studies were extracted into a Microsoft Excel spreadsheet developed specifically for this systematic review. The extracted information included the study setting, design and location, number of participants and their characteristics, intervention details, test characteristics, agreement measures, and diagnostic confidence. Information necessary to judge the risk of bias in each study was also extracted. This included whether true diagnostic uncertainty among the participants (selection bias), consecutive enrollment (selection bias), use of a valid reference standard (verification bias), and reporting of all outcomes (publication bias) were present (6).

Data Synthesis

Data amenable to weighted aggregation (i.e., meta-analysis) were analyzed by using a diagnostic test accuracy tool in the Cochrane Collaboration Review Manager, version 5.4.1, software. Meta-analyses were performed to estimate the sensitivity and specificity with which genomic classifier testing distinguished subjects with UIP from those without UIP, and both a forest plot and a summary receiver operating characteristic curve were created. A bivariate analysis of the summary receiver operating characteristic curve was performed by using Excel to derive the aggregate estimates of sensitivity and specificity. The 95% confidence interval (CI) was calculated for all summary estimates. Agreement measures (percentage and kappa coefficient) and diagnostic confidence were not amenable to meta-analysis and were reported as in the original manuscripts.
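As an illustration of the per-study quantities that feed a diagnostic accuracy meta-analysis, the following minimal sketch computes sensitivity and specificity with 95% CIs from a single 2×2 table. It is not the bivariate model used in this review; the Wilson score interval is one common choice and is an assumption here, the function names are hypothetical, and the counts are invented for illustration rather than taken from any included study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def accuracy_from_counts(tp, fn, tn, fp):
    """Sensitivity and specificity, each with a Wilson 95% CI, from one 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts for one study (NOT data from the review):
# 20 reference-standard UIP cases and 20 non-UIP cases.
example = accuracy_from_counts(tp=14, fn=6, tn=18, fp=2)
print(example)
```

A pooled estimate would additionally need to model between-study variation and the correlation between sensitivity and specificity, which this per-study sketch does not attempt.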

Quality of Evidence

The quality of evidence (confidence in estimated effects) was rated by using the Grading of Recommendations, Assessment, Development, and Evaluation approach (7). A baseline assumption of the quality of evidence was made (high quality for accuracy studies if they enrolled consecutive patients with true diagnostic uncertainty and used a valid reference standard, and low quality for observational studies). After the baseline assumption, five reasons to downgrade the quality of evidence were sought: risk of bias (internal validity), inconsistency of estimates across studies (heterogeneity), indirectness (external validity), imprecision of estimates (wide CIs), and likelihood of publication bias (6). Because the selected studies were diagnostic studies, the QUADAS-2 (University of Bristol) instrument was modified and used to assess the risk of bias for each study.
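The rating logic described above can be sketched as follows. This is a simplified illustration, assuming that each serious concern lowers the rating by one level and that the rating cannot fall below very low; the function and variable names are hypothetical, and real GRADE ratings rest on judgment rather than arithmetic.

```python
# Ordered from lowest to highest quality.
LEVELS = ["very low", "low", "moderate", "high"]

# Baseline assumptions stated in the text: accuracy studies start high
# (given consecutive enrollment, true diagnostic uncertainty, and a valid
# reference standard); observational studies start low.
BASELINE = {"accuracy": "high", "observational": "low"}


def grade_quality(design, serious_concerns):
    """Return the quality rating after downgrading once per serious concern."""
    start = LEVELS.index(BASELINE[design])
    # Assumption: one level per serious concern, with a floor at "very low".
    final = max(0, start - len(serious_concerns))
    return LEVELS[final]


# Concerns mirror the evidence profile: serious imprecision plus an
# "other" concern (funding and confirmation bias).
print(grade_quality("accuracy", ["imprecision", "other"]))
print(grade_quality("observational", ["imprecision", "other"]))
```

Under these assumptions the accuracy evidence ends at low and the observational evidence at very low, in line with the ratings reported in Table 2.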

The search of electronic bibliographic databases, combined with recommended studies from the guideline panel, yielded 371 potentially relevant studies for the use of a genomic classifier in the diagnosis of ILD. After the screening of titles and abstracts, the full texts of 20 articles were reviewed, from which three studies were selected. One additional article was added through manual searching, resulting in a total of four studies being included in the analysis (3, 8–10) (Figure 1).

The studies included a total of 200 patients. The majority of patients were male, were former or current smokers, and had a mean age in the early to mid-60s. All of the studies were accuracy studies that enrolled patients with ILD of an uncertain type, performed genomic classifier testing, and reported diagnostic test characteristics by using either histopathology alone or the composite of clinical, radiographic, and histopathologic data applied by MDD as the reference standard (3, 8–10). Two of the studies also measured agreement in the categorization of UIP and non-UIP when a genomic classifier was or was not used (3, 10). Those same two studies compared diagnostic confidence before and after use of a genomic classifier (Table 1).

Table 1. Characteristics of selected studies

| Author | Year | Location | Funding | Design | Population* | N | Intervention | Reference Standard | Outcomes | Risk of Bias |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Kheir | 2019 | U.S. | None | Accuracy | Patients with suspected ILD | 24 | Envisia Genomic Classifier | Multidisciplinary discussion | Test characteristics, agreement, and diagnostic confidence | None |
| Pankratz | 2017 | U.S. | Yes, Veracyte | Accuracy | Patients with suspected ILD | 31 | Envisia Genomic Classifier | Histopathology | Test characteristics | None |
| Raghu | 2019 | U.S. and Europe | Yes, Veracyte | Accuracy | Patients with suspected ILD | 237 (49 included in the analysis) | Envisia Genomic Classifier | Multidisciplinary discussion | Test characteristics, agreement, and diagnostic confidence | None |
| Richeldi | 2020 | U.S. and Europe | Yes, Veracyte | Accuracy | Patients with suspected ILD | 96 | Envisia Genomic Classifier | Multidisciplinary discussion | Test characteristics | None |

Definition of abbreviation: ILD = interstitial lung disease.

*See Table E2 for additional details about the patient populations.

Diagnostic Test Accuracy

Four studies (200 patients) reported diagnostic test characteristics and were included in the meta-analysis (3, 8–10). The individual studies reported a sensitivity ranging from 59% to 80% and a specificity ranging from 78% to 100% (Figure 2). Reference standards for three studies were the composite of clinical, radiographic, and histopathologic data applied by MDD (3, 9, 10), and for one study, the reference standard was histopathology alone (8). When aggregated by meta-analysis, genomic classifier testing identified the UIP pattern with a sensitivity and specificity of 68% (95% CI, 55–73%) and 92% (95% CI, 81–95%), respectively, in patients with ILD (Figure 3). Sensitivity and specificity did not vary in association with the reference standard. The quality of the evidence was rated as low. The evidence came from well-done accuracy studies but was downgraded because of imprecision and because the maker of the diagnostic test funded three of the studies and many of the individuals who developed the diagnostic test also conducted the studies (confirmation bias) (Table 2).

Agreement and Diagnostic Confidence

Two studies reported diagnostic agreement and confidence. In these studies, multidisciplinary teams evaluated anonymized clinical information, radiologic results, and either molecular classifier or histopathologic results to categorize patients as having a UIP pattern or a non-UIP pattern. The studies then measured agreement of the categorizations obtained with and without genomic classifier testing, as well as diagnostic confidence before and after the use of genomic classifier data (3, 10). Raghu and colleagues (3) reported agreement of 86% (95% CI, 78–92% [κ = 0.64; 95% CI, 0.46–0.82]) between categorical IPF or non-IPF clinical diagnoses made with the molecular classifier results and made with histopathology. There was an increase in diagnostic confidence from 56% to 89% after incorporation of genomic classifier data. Kheir and colleagues (10) reported an agreement level of 88% (95% CI, 67–97% [κ = 0.75; 95% CI, 0.48–1.00]) between categorical IPF or non-IPF clinical diagnoses made by MDD with and without genomic classifier results. There was an increase in the diagnostic confidence from 43% to 93% when genomic classifier results were considered. The quality of the evidence was rated as very low (i.e., observational studies downgraded because of imprecision) (Table 2).
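For readers less familiar with the kappa coefficient, the sketch below shows how Cohen's kappa is derived from a 2×2 table of paired categorical diagnoses and how a value maps onto the interpretation scale given in the Table 2 footnotes. The function names and the counts are hypothetical and are not data from either cohort; the label for values below 0.01 is an assumption, because the footnote scale starts at 0.01.

```python
def cohen_kappa(both_ipf, only_a_ipf, only_b_ipf, both_non_ipf):
    """Cohen's kappa for two sets of binary (IPF vs. non-IPF) diagnoses."""
    n = both_ipf + only_a_ipf + only_b_ipf + both_non_ipf
    if n == 0:
        raise ValueError("at least one paired diagnosis is required")
    observed = (both_ipf + both_non_ipf) / n
    a_ipf = (both_ipf + only_a_ipf) / n  # proportion called IPF by method A
    b_ipf = (both_ipf + only_b_ipf) / n  # proportion called IPF by method B
    expected = a_ipf * b_ipf + (1 - a_ipf) * (1 - b_ipf)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map kappa onto the scale in the Table 2 footnotes."""
    if kappa < 0.01:
        return "below the footnote scale"  # assumption: scale starts at 0.01
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical paired diagnoses (NOT data from the included studies).
k = cohen_kappa(both_ipf=20, only_a_ipf=2, only_b_ipf=3, both_non_ipf=15)
print(round(k, 2), interpret_kappa(k))

# The two reported coefficients fall in the same band of the scale.
print(interpret_kappa(0.64), interpret_kappa(0.75))
```

On that scale, both reported coefficients (0.64 and 0.75) correspond to substantial agreement, although the wide CIs around each estimate span more than one category.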

Table 2. Evidence profile: use of a genomic classifier versus not using a genomic classifier

| Quality Assessment | | | | | | | Summary of Findings | | Quality | Importance |
| No. of Studies | Design | Risk of Bias | Inconsistency | Indirectness | Imprecision | Other | No. of Patients | Effect (95% CI) | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accuracy (sensitivity and specificity) | | | | | | | | | | |
| 4* | Accuracy | None | None | None | Serious† | Serious‡ | 200 | Sensitivity = 68% (55–73%); specificity = 92% (81–95%) | ⊕⊕○○; Low | Critical |
| Agreement (kappa coefficient)¶ | | | | | | | | | | |
| 2§ | Observational | None | None | None | Serious† | Serious‡ | 73 | Cohort 1 = 0.64 (0.46–0.82)‖; cohort 2 = 0.75 (0.48–1.00)** | ⊕○○○; Very low | Important |
| Diagnostic confidence (%) | | | | | | | | | | |
| 2§ | Observational | None | None | None | Serious† | Serious‡ | 73 | Cohort 1 = 56% vs. 89%‖; cohort 2 = 43% vs. 93%** | ⊕○○○; Very low | Important |

Definition of abbreviation: CI = confidence interval.

*See References 3, 8, 9, and 10.

†Small sample size (i.e., “suboptimal information size”).

‡The studies were funded by the maker of the diagnostic test, and the individuals who developed the diagnostic test also conducted the study (confirmation bias).

§See References 3 and 10.

‖See Reference 3.

¶Slight agreement, 0.01–0.20; fair agreement, 0.21–0.40; moderate agreement, 0.41–0.60; substantial agreement, 0.61–0.80; almost perfect agreement, 0.81–1.00.

**See Reference 10.

This systematic review estimates that genomic classifier testing can differentiate UIP from non-UIP histopathology with a sensitivity of 68% (95% CI, 55–73%) and a specificity of 92% (95% CI, 81–95%). It also appears to increase diagnostic confidence when considered in the context of MDD, according to two studies that reported diagnostic confidence increasing from 56% to 89% (agreement, κ = 0.64) (3) and from 43% to 93% (agreement, κ = 0.75) (10).

A study by Kim and colleagues (11) was not selected for inclusion because it did not report data in a fashion that allowed true-positive, false-negative, true-negative, and false-positive test results to be determined. That study used SLB samples from patients with various ILD diagnoses whose pathology had been confirmed by an expert panel, and it demonstrated that it is feasible to develop a genomic classifier that predicts UIP. Microarray gene expression analysis was performed on all samples, and a subset of the samples underwent next-generation RNAseq, which generated expression levels for more than 57,000 transcripts. A classifier trained on the RNAseq data was then assessed by using cross-validation. The RNAseq classifier had a specificity of 95% (95% CI, 84–100%) and a sensitivity of 59% (95% CI, 35–82%), which is consistent with our meta-analysis.

These results lend themselves to two perspectives. On one hand, it may be argued that the high specificity of genomic classifier testing provides important diagnostic information that can be used in MDDs (i.e., the low false-positive rate suggests that it can confidently confirm UIP) and, therefore, may reduce the need for additional, more invasive sampling by SLB or TBLC. Genomic classifier testing may be particularly useful in settings without immediate access to an ILD expert center. On the other hand, it may be argued that the use of genomic classifier testing in clinical practice is premature because 1) sensitivity is suboptimal (i.e., false-negative results are common; therefore, many cases for which the genomic classifier predicts non-UIP histopathology will still require SLB or TBLC for a confident diagnosis), 2) additional studies are necessary to obtain more precise estimates of the sensitivity and specificity, 3) existing data incompletely address the incremental diagnostic value conferred by genomic classifier testing beyond what clinical and radiologic data already provide, and 4) such testing is not yet widely available.
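The two perspectives can be made concrete with likelihood ratios derived from the pooled estimates. The sketch below is illustrative only: the function names are hypothetical, the pretest probability of 50% is an arbitrary assumption rather than a value from the included studies, and the calculation ignores the uncertainty around the pooled sensitivity and specificity.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios for a binary test."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled estimates from the meta-analysis.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.68, specificity=0.92)

# Hypothetical pretest probability of UIP (an assumption for illustration).
pre = 0.50
after_positive = post_test_probability(pre, lr_pos)
after_negative = post_test_probability(pre, lr_neg)
print(round(lr_pos, 2), round(lr_neg, 2))
print(round(after_positive, 2), round(after_negative, 2))
```

With these inputs, the positive likelihood ratio is about 8.5 and the negative likelihood ratio about 0.35. At the assumed 50% pretest probability, a positive result raises the probability of UIP to roughly 89%, whereas a negative result lowers it only to roughly 26%, which illustrates why a result predicting non-UIP histopathology may still leave the diagnosis uncertain.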

It is noteworthy that agreement between genomic classifier testing and the MDD diagnosis was more likely in cases in which the high-resolution computed tomography (HRCT) scan was interpreted as probable for UIP than in cases in which the HRCT scan was interpreted as indeterminate and/or the MDD determined that the likelihood of UIP was low (10). In the study by Richeldi and colleagues (9), the genomic classifier identified UIP histopathology with a sensitivity of 60.3% and a specificity of 92.1% as compared with SLB or TBLC. The sensitivity increased from 34% to 79.2%, whereas specificity remained almost the same (90.6%), when genomic classifier data were combined with HRCT reports from community-based local radiologists, indicating that the integration of radiologic data markedly reduces false-negative results.

The major strength of this systematic review is that it was part of guideline development. A multidisciplinary international committee of experts ensured that the question and outcomes were relevant to practicing clinicians and used their experience to interpret the clinical significance of the estimated effects. In addition, the experts helped validate and evaluate the genomic classifier in daily clinical use.

The primary limitation of the systematic review is the limited size and number of studies that evaluated classifier clinical utility. Diagnostic test accuracy was estimated by aggregating four studies, but the diagnostic confidence and agreement reported by two studies could only be described in a narrative fashion. This highlights the need for more studies addressing the utility of genomic classifier testing in ILD. Additional limitations include the use of only English-language studies, the funding of three out of four studies by the maker of the diagnostic test, and the same individuals who helped develop the diagnostic test also participating in the studies.

In summary, genomic classifier testing predicting UIP histopathology provides supplemental information that may improve diagnostic confidence. Conversely, a result predicting non-UIP histopathology may require SLB or TBLC, given the relatively common occurrence of false-negative results. Genomic classifier results must be considered in the context of other clinical and radiologic information in an MDD to obtain a higher confidence in the diagnostic evaluation of ILD. Further investigation and clinical experience are required before the role of genomic classifier testing is established for routine clinical practice.

1. Raghu G, Remy-Jardin M, Myers JL, Richeldi L, Ryerson CJ, Lederer DJ, et al.; American Thoracic Society; European Respiratory Society; Japanese Respiratory Society; Latin American Thoracic Society. Diagnosis of idiopathic pulmonary fibrosis: an official ATS/ERS/JRS/ALAT clinical practice guideline. Am J Respir Crit Care Med 2018;198:e44–e68.
2. Silvestri GA, Vachani A, Whitney D, Elashoff M, Porta Smith K, Ferguson JS, et al.; AEGIS Study Team. A bronchial genomic classifier for the diagnostic evaluation of lung cancer. N Engl J Med 2015;373:243–251.
3. Raghu G, Flaherty KR, Lederer DJ, Lynch DA, Colby TV, Myers JL, et al. Use of a molecular classifier to identify usual interstitial pneumonia in conventional transbronchial lung biopsy samples: a prospective validation study. Lancet Respir Med 2019;7:487–496.
4. Raghu G, Remy-Jardin M, Richeldi L, Thomson CC, Inoue Y, Johkoh T, et al. Idiopathic pulmonary fibrosis (an update) and progressive pulmonary fibrosis in adults: an official ATS/ERS/JRS/ALAT clinical practice guideline. Am J Respir Crit Care Med 2022;205:e18–e47.
5. Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al., editors. Cochrane handbook for systematic reviews of interventions version 6.2. Cochrane, 2021 [updated February 2021; accessed 2021 Nov 24]. Available from
6. Schünemann H, Brożek J, Guyatt G, Oxman A; GRADE Working Group. GRADE handbook: handbook for grading the quality of evidence and the strength of recommendations using the GRADE approach. Hamilton, Ontario, Canada: Grade Working Group; 2013 [updated 2013 Oct; accessed 2021 Jul 29]. Available from:
7. Schünemann HJ, Jaeschke R, Cook DJ, Bria WF, El-Solh AA, Ernst A, et al.; ATS Documents Development and Implementation Committee. An official ATS statement: grading the quality of evidence and strength of recommendations in ATS guidelines and recommendations. Am J Respir Crit Care Med 2006;174:605–614.
8. Pankratz DG, Choi Y, Imtiaz U, Fedorowicz GM, Anderson JD, Colby TV, et al. Usual interstitial pneumonia can be detected in transbronchial biopsies using machine learning. Ann Am Thorac Soc 2017;14:1646–1654.
9. Richeldi L, Scholand MB, Lynch DA, Colby TV, Myers JL, Groshong SD, et al. Utility of a molecular classifier as a complement to HRCT to identify usual interstitial pneumonia. Am J Respir Crit Care Med 2021;203:211–220.
10. Kheir F, Alkhatib A, Berry GJ, Daroca P, Diethelm L, Rampolla R, et al. Using bronchoscopic lung cryobiopsy and a genomic classifier in the multidisciplinary diagnosis of diffuse interstitial lung diseases. Chest 2020;158:2015–2025.
11. Kim SY, Diggans J, Pankratz D, Huang J, Pagan M, Sindy N, et al. Classification of usual interstitial pneumonia in patients with interstitial lung disease: assessment of a machine learning approach using high-dimensional transcriptional data. Lancet Respir Med 2015;3:473–482.
Correspondence and requests for reprints should be addressed to Fayez Kheir, M.D., M.Sc., Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114. E-mail: .

Supported by the American Thoracic Society, European Respiratory Society, Japanese Respiratory Society, and Asociacion Latinoamericana del Torax.

This article has a related editorial.

This article has an online supplement, which is accessible from this issue’s table of contents at

Author disclosures are available with the text of this article at
