Historically, histopathologic evaluation has been viewed as the gold standard for diagnosis in diffuse lung disease. That perception is changing. An ATS/ERS committee that set out to reclassify the idiopathic interstitial pneumonias has recommended that final diagnoses be reached after interactions between clinicians, radiologists, and histopathologists (1). In this issue of the Journal (pp. 904–910), Flaherty and colleagues have examined the formulation of diagnoses in suspected idiopathic pulmonary fibrosis (IPF) against end-points that are key surrogates for diagnostic accuracy (interobserver variation, diagnostic confidence) (2). Improvements in both end-points were seen with the successive diagnostic integration of radiologic, clinical, and histologic information. This study provides several pragmatic insights, but also exposes the reader to a methodologic minefield.
In diffuse lung disease, the distinction between IPF and other entities is pivotal, both because IPF is relatively common and because it is associated with a substantially worse outcome. The current study highlights one context in which a single practitioner's diagnosis is largely accurate. A diagnosis of IPF from clinical and HRCT information, made by either a clinician or radiologist, was associated with a histopathologic diagnosis of usual interstitial pneumonia (UIP) in all but two cases. In total, 5 of 104 pooled diagnoses of IPF were discordant with the histologic diagnosis, with the positive predictive value of a diagnosis of IPF thus exceeding 95% (calculated from stated data). These findings mirror those previously reported in diagnostic studies of diffuse lung disease (3, 4): the positive predictive value of either a clinical or an HRCT diagnosis of IPF approximates 90%. The novel feature of the current study is the evaluation of a multidisciplinary approach, integrating clinical and radiological information with histologic data. It now appears increasingly difficult to justify a surgical biopsy when the clinical and imaging features are typical of IPF. The findings in the current study help to validate a growing consensus that a non-histologic diagnosis of IPF, based upon typical clinical and HRCT features (5), should now be accepted in clinical practice and in a high proportion of IPF cases in clinical series.
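The positive predictive value quoted above can be checked with a short calculation. The following is a minimal sketch, not part of the original study: it takes the histologic diagnosis as the reference for this one calculation, uses only the two pooled figures stated in the text (104 diagnoses of IPF, 5 discordant with histology), and the function name is a hypothetical one chosen for illustration.

```python
def positive_predictive_value(positive_calls: int, discordant_calls: int) -> float:
    """Fraction of positive calls that agree with the reference diagnosis.

    Hypothetical helper for illustration; the reference here is assumed to be
    the histologic diagnosis, as in the pooled figures quoted in the text.
    """
    if positive_calls <= 0:
        raise ValueError("positive_calls must be greater than zero")
    if not 0 <= discordant_calls <= positive_calls:
        raise ValueError("discordant_calls must lie between 0 and positive_calls")
    concordant_calls = positive_calls - discordant_calls
    return concordant_calls / positive_calls


# Pooled figures stated in the text: 104 diagnoses of IPF, 5 discordant.
ppv = positive_predictive_value(positive_calls=104, discordant_calls=5)
print(f"PPV = {ppv:.1%}")  # 99 of 104 concordant
```

With these inputs the result is 99/104, which is just above the 95% threshold cited in the text.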
By contrast, the addition of histologic information changed clinicians' and radiologists' diagnoses of disorders other than IPF very frequently. With the exclusion of cases diagnosed as IPF, clinicians and radiologists changed their diagnoses in 56 of 103 instances (54%), and in 60 of 85 instances (71%), respectively (calculated from stated data) (2). Although these findings underline the importance attached to histologic data by the practitioners participating in the study, the reported comparisons between clinicians, radiologists, and histopathologists should be applied to routine practice with caution. Surgical biopsy is increasingly reserved for cases in which clinical or HRCT information is inconclusive or discordant, and this must inevitably have applied to the present study, at least to some extent. Thus, the high diagnostic variation between the HRCT radiologists and the frequency with which radiologists changed their diagnoses should come as no surprise in this particular population of patients. Observer variation in HRCT diagnoses is strikingly higher in cases undergoing surgical biopsy than in cases diagnosed noninvasively (6). Moreover, the specialists taking part in the study (2) trained and practiced in an era in which surgical biopsy was regarded as the diagnostic gold standard. In patients in whom noninvasive evaluation is inconclusive, biopsy findings must necessarily be given the greatest diagnostic weighting.
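The two percentages quoted above follow directly from the stated counts. The sketch below simply reproduces that arithmetic; the function and variable names are hypothetical, and the counts are those given in the text for non-IPF diagnoses only.

```python
def change_rate(changed: int, total: int) -> float:
    """Proportion of diagnoses that changed once histologic data were added."""
    if total <= 0:
        raise ValueError("total must be greater than zero")
    if not 0 <= changed <= total:
        raise ValueError("changed must lie between 0 and total")
    return changed / total


# Counts stated in the text, excluding cases diagnosed as IPF.
clinician_rate = change_rate(changed=56, total=103)
radiologist_rate = change_rate(changed=60, total=85)

print(f"Clinicians:   {clinician_rate:.0%}")
print(f"Radiologists: {radiologist_rate:.0%}")
```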
The report challenges a traditional view that a reference standard (a “gold standard”) is an absolute requirement in any evaluation of diagnosis. The recent view that a final diagnosis should be made by consensus between histopathologist, radiologist, and clinician is a radical departure from the diagnostic thinking of the late twentieth century. The essential assumption underlying the present study is that there is no gold standard for diagnosis in diffuse lung disease, merely the silver standards of clinical, radiologic, and histopathologic evaluation, although histopathologic assessment emerged as the most argentiferous of the silver standards. It is interesting to reflect on the changes in perception that have led to the debasement of a gold standard. Three issues, in particular, have complicated diagnostic thinking.
The problem of “sampling error”—divergent histopathologic diagnoses in two or more biopsy sites—was recently documented by Flaherty and coworkers (7), and subsequently by Monaghan and colleagues (8). Specifically, if the “wrong area” is sampled, a histopathologic diagnosis of nonspecific interstitial pneumonia (NSIP) can sometimes be misleading because the disease course may reflect the presence of UIP in nonsampled areas. It is likely, although not proven, that sampling error will be minimized by using HRCT to select multiple biopsy sites representative of the full range of morphologic appearances.
A second crucial consideration is interobserver variation between histopathologists. In a recent study of 133 biopsies, very significant observer variation between 10 experienced specialist thoracic histopathologists was quantified, with observer agreement (kappa coefficient of agreement approximately 0.40) barely clinically acceptable (9). In the present study, observer agreement was considerably higher but the population contained a high proportion of patients with UIP, in which diagnostic concordance may be more likely. It is easy to be overcritical of observer disagreement between histopathologists: in reality, histopathologic appearances may be intermediate between two entities in a significant proportion of cases, and observer variation may be an appropriate and accurate reflection of this fact. It is especially in this scenario that clinical and imaging data are likely to be key determinants of a final consensus diagnosis.
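For readers less familiar with the kappa coefficient, the sketch below shows how a value of about 0.40 can arise. It is an illustration under stated assumptions only: it uses Cohen's kappa for two observers, whereas the cited study involved 10 histopathologists and its exact statistic is not described here, and the 20 paired readings are invented data constructed for the example, not results from any study.

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of 20 biopsies by two observers (invented data).
observer_1 = ["UIP"] * 10 + ["NSIP"] * 10
observer_2 = ["UIP"] * 7 + ["NSIP"] * 3 + ["UIP"] * 3 + ["NSIP"] * 7

kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the two observers agree on 14 of 20 cases (70%), but half of that agreement would be expected by chance alone, which is why the chance-corrected figure is so much lower than the raw agreement.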
If histopathologic evaluation is no longer considered to be the diagnostic gold standard, how is the accuracy of diagnosis to be validated in future clinical studies and, equally, in clinical practice? In this regard, it is important not to lose sight of the greatest benefit of diagnostic precision. A surgical biopsy first became diagnostically pivotal in diffuse lung disease because histopathologic assessment provided the most accurate prediction of the natural history and treated course of disease, an observation that stood the test of time. Many gold standard diagnostic tests owe their status as much to antiquity as to accuracy (10). HRCT evaluation may not have the same diagnostic pedigree as surgical biopsy, but the interpretative skills of histopathologists and radiologists alike amount to pattern recognition with regard to abnormal morphologic findings. In the combined patient subgroup with a histologic diagnosis of UIP or NSIP, the prediction of outcome is refined when HRCT and histopathologic data are integrated (11).
Thus, it is likely that the problem of diagnostic validation will be resolved by the evaluation of diagnosis against subsequent disease behavior, and this applies equally to the multidisciplinary diagnostic approach. The diagnostic problem of suspected IPF, as in the current study, is an excellent model because major differences in outcome do exist between IPF and NSIP. In diffuse lung diseases other than IPF, differences in outcome between disease entities are generally less clear-cut. Future studies of multidisciplinary diagnosis now need to focus upon disease outcomes in suspected IPF.
1. American Thoracic Society, European Respiratory Society. American Thoracic Society/European Respiratory Society international multidisciplinary consensus classification of the Idiopathic Interstitial Pneumonias. Am J Respir Crit Care Med 2002;165:277–304.
2. Flaherty KR, King TE Jr, Raghu G, Lynch JP III, Colby TV, Travis WD, Gross BH, Kazerooni EA, Toews GB, Long Q, et al. Idiopathic interstitial pneumonia: what is the effect of a multi-disciplinary approach to diagnosis? Am J Respir Crit Care Med 2004;170:904–910.
3. Raghu G, Mageto YN, Lockhart D, Schmidt RA, Wood DE, Godwin JD. The accuracy of the clinical diagnosis of new-onset idiopathic pulmonary fibrosis and other interstitial lung diseases. Chest 1999;116:1168–1174.
4. Hunninghake GW, Zimmerman MB, Schwartz DA, King TE Jr, Lynch J, Hegele R, Waldron J, Colby T, Muller N, Lynch D, et al. Utility of a lung biopsy for the diagnosis of idiopathic pulmonary fibrosis. Am J Respir Crit Care Med 2001;164:193–196.
5. American Thoracic Society. Idiopathic pulmonary fibrosis: diagnosis and treatment. International consensus statement. Am J Respir Crit Care Med 2000;161:646–664.
6. Aziz ZA, Wells AU, Hansell DM, Bain GA, Copley SJ, Desai SR, Ellis SM, Gleeson FV, Grubnic S, Nicholson AG, et al. HRCT diagnosis of diffuse parenchymal lung disease: inter-observer variation. Thorax 2004;59:506–511.
7. Flaherty KR, Travis WD, Colby TV, Toews GB, Kazerooni EA, Gross BH, Jain A, Strawderman RL III, Flint A, Lynch JP III, et al. Histologic variability in usual and nonspecific interstitial pneumonias. Am J Respir Crit Care Med 2001;164:1722–1727.
8. Monaghan H, Wells AU, Colby TV, du Bois RM, Hansell DM, Nicholson AG. Prognostic implications of histologic patterns in multiple surgical lung biopsies from patients with idiopathic interstitial pneumonia. Chest 2004;125:522–526.
9. Nicholson AG, Addis BJ, Bharucha H, Clelland CA, Corrin B, Gibbs AR, Hasleton PS, Kerr KM, Ibrahim NB, Stewart S, et al. Inter-observer variation between pathologists in diffuse parenchymal lung disease. Thorax 2004;59:500–505.
10. Hansell DM, Wells AU. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative [comment]. Clin Radiol 2003;58:573–574.
11. Flaherty KR, Thwaite EL, Kazerooni EA, Gross BH, Toews GB, Colby TV, Travis WD, Mumford JA, Murray S, Flint A, et al. Radiological versus histological diagnosis in UIP and NSIP: survival implications. Thorax 2003;58:143–148.