American Journal of Respiratory and Critical Care Medicine

Rationale: Estimates of idiopathic pulmonary fibrosis (IPF) incidence and prevalence from electronic databases without case validation may be inaccurate.

Objectives: Develop claims algorithms to identify IPF and assess their positive predictive value (PPV) to estimate incidence and prevalence in the United States.

Methods: We developed three algorithms to identify IPF cases in the HealthCore Integrated Research Database. Sensitive and specific algorithms were developed based on literature review and consultation with clinical experts. PPVs were assessed using medical records. A third algorithm used logistic regression modeling to generate an IPF score and was validated using a separate set of medical records. We estimated incidence and prevalence of IPF using the sensitive algorithm corrected for the PPV.

Measurements and Main Results: We identified 4,598 patients using the sensitive algorithm and 2,052 patients using the specific algorithm. After medical record review, the PPVs of these algorithms using the treating clinician’s diagnosis were 44.4 and 61.7%, respectively. For the IPF score, the PPV was 76.2%. Using the clinical adjudicator’s diagnosis, the PPVs were 54 and 57.6%, respectively, and for the IPF score, the PPV was 83.3%. The incidence and period prevalences of IPF, corrected for the PPV, were 14.6 per 100,000 person-years and 58.7 per 100,000 persons, respectively.

Conclusions: Sensitive algorithms without correction for false positive errors overestimated incidence and prevalence of IPF. An IPF score offered the greatest PPV, but it requires further validation.

Scientific Knowledge on the Subject

Idiopathic pulmonary fibrosis (IPF) is a progressive and irreversible interstitial lung disease (ILD) with limited treatment options and a dismal prognosis. Large studies are needed to understand the epidemiology of this rare condition. Claims databases contain information on the largest populations, but the ability to identify IPF cases in claims data is uncertain.

What This Study Adds to the Field

Studies using a sensitive algorithm without correction for false positive misclassification overestimate the incidence and prevalence of IPF. The age-standardized, positive predictive value–corrected incidence of IPF among U.S. adults older than 50 years of age (2006–2012) was 14.6 per 100,000 person-years, and the prevalence was 58.7 per 100,000 persons.

Idiopathic pulmonary fibrosis (IPF), the most common of the interstitial lung diseases (ILD), is a progressive and irreversible interstitial pneumonia with limited pharmacologic therapy options and a dismal prognosis (1, 2). In IPF, the differentiation and activation of fibroblasts lead to declining lung function, pulmonary failure, and death. Two drugs, nintedanib and pirfenidone, are available for the treatment of IPF (3). Lung transplantation alters the course of the disease, and ILD is the most common reason for lung transplantation in the United States (4). Without lung transplantation, the median survival time after diagnosis is 2.5 to 4 years (1).

Different data collection approaches have been applied in measuring the incidence and prevalence of IPF, and estimates vary widely (5). Common data sources include national or disease-specific registries (6–8), population-based studies (9, 10), questionnaires (11, 12), and analyses of existing data collected for administrative purposes (13–21). In a recent review of 34 studies from 21 countries from 1968 to 2012, Kaunisto and colleagues identified estimates of the incidence of IPF to be 3 to 9 cases per 100,000 per year in Europe and North America, with lower rates in East Asia and South America (5). Nalysnyk and colleagues identified prevalence estimates to be 14 to 63 cases per 100,000 persons (22). Incidence and prevalence appear to be increasing over time, and they are consistently higher in men and the elderly (5, 22, 23). In the United States, Raghu and colleagues estimated the incidence of IPF among Medicare beneficiaries to be 91 cases per 100,000 person-years and prevalence to be 495 cases per 100,000 in 2011, which appeared substantially higher than previously reported (16).

The 2011 consensus guidelines (2) regarding the diagnosis of IPF require exclusion of other known causes of ILD and either the presence of a usual interstitial pneumonia pattern on high-resolution computed tomography (HRCT) in patients not subjected to lung biopsy or specific combinations of HRCT and lung biopsy patterns. The diagnosing clinician must take care in excluding potential alternative causes (2, 4), and the diagnosis should ideally be made by consensus across a multidisciplinary panel of specialists with experience in the diagnosis and management of patients with ILD (2, 24).

Although some studies have sought to identify IPF in administrative claims databases, the performance of algorithms applied in previous studies has not been assessed. Raghu and colleagues first defined IPF using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis code 516.3 (other alveolar and parietoalveolar pneumonopathy) in the absence of excluded diagnoses, and then required additional evidence of either HRCT or lung biopsy in an effort to improve specificity (15). This algorithm has since been applied in other settings (14, 16). In this study, we developed and measured the positive predictive value (PPV) of three algorithms to identify IPF, estimated the incidence and prevalence of IPF using the sensitive case definition, and corrected these estimates for misclassification introduced by the claims algorithm.

Some of the results of this study have been previously reported in the form of abstracts (25, 26).

Population and Setting

This retrospective cohort study was conducted in the HealthCore Integrated Research Database (HIRD). This research environment contains longitudinal automated health insurance claims that include enrollment data, medical care, prescription drug use, and healthcare use from U.S. health plan members in a large, nationwide, commercially insured population supplemented with clinical data identified through medical record review. Diagnoses and procedures were identified by ICD-9-CM, Current Procedural Terminology, and HealthCare Common Procedure Coding System codes, for both outpatient visits and inpatient stays. Drug claims were captured by National Drug Codes, which were then translated to broader categories using Generic Product Identifier codes. Physician specialty was also obtained from the administrative data.

After approval of a Health Insurance Portability and Accountability waiver by the Quorum Institutional Review Board, we queried administrative data to identify patients 50 to 100 years of age who were continuously enrolled in a health plan for at least 6 months during the study period (January 1, 2006 to September 30, 2012).

IPF Case Identification Algorithms

IPF status was classified using three algorithms. Two hierarchical binary classification algorithms were prespecified; one used “broad” case-finding criteria to enhance sensitivity, and the other, “narrow” algorithm included additional criteria to increase specificity. An adjudication panel that included three expert pulmonologists reviewed medical records and confirmed the IPF diagnoses of a random sample of 100 cases identified by the first two algorithms (50 cases were selected for each of these two case definitions; however, 66 of the 100 total cases reviewed met the specific case definition and were used to evaluate its performance). We used the initial medical record validation of the first two algorithms as test data to create the third algorithm, in which we developed an IPF score using logistic regression modeling. We subsequently measured the PPV of the IPF score algorithm in a new sample of 50 adjudicated medical records.

The broad case algorithm required (1) at least one diagnosis of IPF (ICD-9-CM code 516.3) made by a physician, and (2) no alternative diagnoses recorded after the date of the last recorded diagnosis of IPF and within 6 months of the first physician-assigned diagnosis of IPF. This approach was similar to that used by Raghu and colleagues (15); however, ICD-9-CM diagnosis code 515 (postinflammatory pulmonary fibrosis) was not considered an excluding diagnosis in our algorithm because the code is believed to capture IPF cases that have not yet completed a diagnostic workup. ICD-9-CM diagnosis code 516.31, which is more specific to IPF, was introduced in October 2011, and its definition included patients who would formerly have been included in either code 515 or code 516.3. The narrow case algorithm required patients to (1) meet the broad case algorithm, (2) have at least one IPF diagnosis from a pulmonologist, (3) have at least one diagnosis of IPF at least 4 days after HRCT of the chest or at least 4 weeks after open lung biopsy (for those patients with at least 12 months of continuous health plan eligibility before the first recorded IPF diagnosis), and (4) meet at least two of the following criteria: age 65 years or older; diagnosis of IPF persisting for at least 3 months; pulmonary function tests performed; hospitalization with IPF as the principal discharge diagnosis; IPF diagnosis recorded after antinuclear antibody (ANA) or rheumatoid factor (RF) tests were performed; or lung transplantation. Patients with less than 12 months of continuous health plan eligibility before the first recorded IPF diagnosis were not required to have HRCT or lung biopsy because these procedures may have occurred before the start of the patient’s health plan eligibility. These additional criteria were added with the intention of improving the algorithm’s PPV, as would be desirable for a comparative study estimating rate ratios.
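The two prespecified definitions can be sketched as simple rule checks. The following Python sketch is illustrative only and is not the study's SAS implementation: the Patient record and all of its field names are hypothetical, the exclusion window is read as "an alternative diagnosis dated after the last IPF diagnosis and no later than 183 days after the first," and "persisting for at least 3 months" is approximated as 90 or more days between the first and last IPF diagnosis. Each of these readings is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Patient:
    """Hypothetical claims summary for one patient (field names are assumptions)."""
    ipf_dx_dates: List[date]            # physician-assigned ICD-9-CM 516.3 diagnoses
    age: int                            # age at first IPF diagnosis
    alt_dx_dates: List[date] = field(default_factory=list)  # excluding diagnoses
    pulmonologist_dx: bool = False      # at least one IPF diagnosis by a pulmonologist
    hrct_dates: List[date] = field(default_factory=list)
    biopsy_dates: List[date] = field(default_factory=list)
    months_enrolled_before_first_dx: float = 0.0
    pft_done: bool = False
    hospitalized_ipf_principal: bool = False
    dx_after_ana_rf: bool = False
    lung_transplant: bool = False


def meets_broad(p: Patient) -> bool:
    """Broad definition: an IPF diagnosis and no disqualifying alternative diagnosis."""
    if not p.ipf_dx_dates:
        return False
    first, last = min(p.ipf_dx_dates), max(p.ipf_dx_dates)
    window_end = first + timedelta(days=183)  # assumed reading of "within 6 months"
    return not any(last < d <= window_end for d in p.alt_dx_dates)


def _dx_follows(procedures: List[date], diagnoses: List[date], min_gap_days: int) -> bool:
    """True if any diagnosis falls at least min_gap_days after any procedure."""
    return any((dx - proc).days >= min_gap_days for proc in procedures for dx in diagnoses)


def meets_narrow(p: Patient) -> bool:
    """Narrow definition: broad definition plus the additional specificity criteria."""
    if not (meets_broad(p) and p.pulmonologist_dx):
        return False
    # HRCT or biopsy is required only with at least 12 months of prior eligibility.
    if p.months_enrolled_before_first_dx >= 12:
        if not (_dx_follows(p.hrct_dates, p.ipf_dx_dates, 4)
                or _dx_follows(p.biopsy_dates, p.ipf_dx_dates, 28)):
            return False
    first, last = min(p.ipf_dx_dates), max(p.ipf_dx_dates)
    supporting = [
        p.age >= 65,
        (last - first).days >= 90,      # assumed proxy for "persisting >= 3 months"
        p.pft_done,
        p.hospitalized_ipf_principal,
        p.dx_after_ana_rf,
        p.lung_transplant,
    ]
    return sum(supporting) >= 2
```

Under these assumptions, a 70-year-old patient with IPF diagnoses in January and June 2010, a pulmonologist diagnosis, and an HRCT 8 days before the first diagnosis would meet both definitions; removing the HRCT claim would leave only the broad definition satisfied unless prior eligibility was shorter than 12 months.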

Case Adjudication

Adjudication was based on unstructured review of medical records from either electronic or paper charts redacted of personal identifying information (e.g., name, date of birth, and so on). For each patient, we requested medical records from one facility that included provider and consultation notes, admission and discharge summaries, laboratory reports, and imaging reports. To prioritize facilities, ILD specialty centers were assigned the highest priority, followed by hospitalizations with IPF as a principal discharge diagnosis, lung transplantation hospitalizations, offices or facilities with diagnoses of IPF and HRCT or lung biopsy, and offices with the highest number of IPF diagnoses recorded. Because HRCT scans were systematically unavailable, adjudication had to rely on the radiology reports captured in the records.

The clinical adjudicators were first asked to review redacted medical records to determine whether the documentation indicated that the treating clinician believed the patient had IPF, referred to as the treating clinician’s diagnosis. The adjudicators were also asked to determine whether they believed the patient had IPF, and whether the medical record contained evidence inconsistent with an IPF diagnosis. When two primary adjudicators had different impressions following medical record review, cases were discussed. Although a third clinician was available as a tie breaker in the event of a discrepancy that was not resolved through discussion, there were no cases that were not resolved through discussion by the two primary reviewers.

Statistical Analysis

A logistic regression model was used to develop the IPF score algorithm. We examined univariate associations between confirmed IPF case status and various prespecified parameters, including diagnosis of IPF by a pulmonologist, age 65 years or older, sex, presence of at least two diagnoses of IPF at least 3 months apart, diagnosis of IPF after a pulmonary function test, diagnosis of IPF after ANA or RF test, diagnosis of IPF after a hypersensitivity panel, diagnosis of IPF after HRCT or lung biopsy, at least two diagnoses of IPF by at least two different pulmonologists, at least two diagnoses of IPF by at least one pulmonologist, and hospitalization with IPF as the principal discharge diagnosis. We also considered empirically derived covariates such as diagnosis of postinflammatory pulmonary fibrosis before the first IPF diagnosis, use of statins before the first IPF diagnosis, and absence of chronic obstructive pulmonary disease (COPD) before the first diagnosis of IPF. To create the final multivariate model, we used a combination of automated stepwise selection and manual review in an iterative process to select terms to predict confirmed IPF case status, ultimately selecting terms that appeared to be strongly associated with confirmed IPF case status in both univariate and multivariate models. We examined receiver operating characteristic curves and PPVs to select a score threshold to define IPF cases (27).
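Selecting a score threshold amounts to tabulating PPV and sensitivity at candidate cut points in the derivation data. The sketch below illustrates that tabulation in Python; the function names, scores, and confirmation labels are hypothetical and are not drawn from the study data.

```python
from typing import List, Sequence, Tuple


def ppv_and_sensitivity(scores: Sequence[float],
                        confirmed: Sequence[int],
                        threshold: float) -> Tuple[float, float]:
    """PPV and sensitivity when records with score >= threshold are called IPF.

    confirmed holds 1 for an adjudicated IPF case and 0 otherwise.
    """
    flagged = [c for s, c in zip(scores, confirmed) if s >= threshold]
    true_pos = sum(flagged)
    all_pos = sum(confirmed)
    ppv = true_pos / len(flagged) if flagged else float("nan")
    sensitivity = true_pos / all_pos if all_pos else float("nan")
    return ppv, sensitivity


def threshold_table(scores: Sequence[float],
                    confirmed: Sequence[int],
                    candidates: Sequence[float]) -> List[Tuple[float, float, float]]:
    """One (threshold, PPV, sensitivity) row per candidate cut point."""
    return [(t, *ppv_and_sensitivity(scores, confirmed, t)) for t in candidates]


# Hypothetical derivation sample: eight reviewed records.
example_scores = [2.0, 5.5, 7.9, 8.1, 9.0, 3.3, 7.6, 6.9]
example_confirmed = [0, 0, 1, 1, 1, 0, 0, 1]

for t, ppv, sens in threshold_table(example_scores, example_confirmed, [5.0, 7.5]):
    print(f"threshold {t}: PPV {ppv:.2f}, sensitivity {sens:.2f}")
```

In this toy sample, raising the threshold from 5.0 to 7.5 increases the PPV and lowers sensitivity, which is the trade-off weighed when a cut point is chosen.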

PPV was calculated as the number of IPF cases confirmed by the medical record review divided by the number of cases identified by the algorithm for which medical records were reviewed and deemed evaluable, and presented with binomial exact 95% confidence intervals (CIs). A record was considered to be nonevaluable at the discretion of the adjudicators in the event that information available in the record was insufficient to make a decision about diagnosis (e.g., the medical record did not include any data during the time period of interest or focused exclusively on a medical problem unrelated to respiratory concerns). Because the broad case algorithm was designed to be sensitive, we used it to estimate incidence and prevalence, and excluded false positive cases by correcting these estimates for the algorithm’s PPV.
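As an illustration of this calculation, the following Python sketch computes a PPV over evaluable records with an exact (Clopper-Pearson) binomial interval. It is a standard-library sketch written for this example, not the SAS code used in the study; the function names are hypothetical, and the interval is found by bisection on the binomial distribution rather than by a library call.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f: Callable[[float], float], iters: int = 100) -> float:
    """Root on [0, 1] of a function that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(confirmed: int, evaluable: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided binomial confidence limits for confirmed / evaluable."""
    if evaluable <= 0 or not 0 <= confirmed <= evaluable:
        raise ValueError("need 0 <= confirmed <= evaluable and evaluable > 0")
    # Lower limit solves P(X >= confirmed | p) = alpha / 2.
    lower = 0.0 if confirmed == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(confirmed - 1, evaluable, p)) - alpha / 2.0)
    # Upper limit solves P(X <= confirmed | p) = alpha / 2.
    upper = 1.0 if confirmed == evaluable else _bisect_increasing(
        lambda p: alpha / 2.0 - binom_cdf(confirmed, evaluable, p))
    return lower, upper


def ppv_with_exact_ci(confirmed: int, reviewed: int, nonevaluable: int = 0) -> Tuple[float, float, float]:
    """PPV among evaluable records, with exact 95% limits."""
    evaluable = reviewed - nonevaluable
    lower, upper = clopper_pearson(confirmed, evaluable)
    return confirmed / evaluable, lower, upper


# Example: 20 confirmed cases among 50 reviewed records, 5 of them nonevaluable.
ppv, lower, upper = ppv_with_exact_ci(20, 50, nonevaluable=5)
print(f"PPV {ppv:.1%} (95% CI, {lower:.1%} to {upper:.1%})")
```

With 20 confirmed cases among 45 evaluable records, the point estimate is 20/45, or 44.4%.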

The prevalence of IPF was calculated as the number of patients meeting the broad algorithm divided by the source population that included patients enrolled in a qualifying health plan for at least 6 months between January 1, 2006 and September 30, 2012. Six months of enrollment were required to ensure that patients were under observation for a sufficient interval of time during which an IPF patient would be expected to seek care.

For incidence calculations, 12 months of continuous enrollment before the first diagnosis of IPF were required to identify and exclude prevalent cases. The population at risk was the subset of patients enrolled in a qualifying health plan for at least 12 months during which no diagnosis of IPF was made. Person-time at risk was calculated from the 366th day of continuous health plan eligibility until the earlier of the first diagnosis of IPF, the end of the patient’s health plan eligibility, or the end of the study period. Incidence and prevalence of IPF were adjusted for false positive cases by multiplying the incidence and prevalence of IPF obtained using the broad algorithm by its PPV. The rates were standardized to the age distribution of the 2012 U.S. population. All analyses were conducted using SAS 9.4 (SAS Institute, Cary, NC).
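The person-time rule and the direct age standardization can be sketched as follows. This Python sketch is illustrative rather than the study's SAS code: the function names are hypothetical, dates are assumed to have been restricted to the study period by the caller, the end date is treated as exclusive, and a year is taken as 365.25 days. The age strata and weights in the usage example are invented and are not the 2012 U.S. population distribution.

```python
from datetime import date, timedelta
from typing import Dict, Optional

STUDY_END = date(2012, 9, 30)


def person_years_at_risk(eligibility_start: date,
                         eligibility_end: date,
                         first_ipf_dx: Optional[date] = None,
                         study_end: date = STUDY_END) -> float:
    """Person-years from the 366th day of eligibility to the earliest stop date."""
    at_risk_start = eligibility_start + timedelta(days=365)  # day 366 of eligibility
    stops = [eligibility_end, study_end]
    if first_ipf_dx is not None:
        stops.append(first_ipf_dx)
    days = (min(stops) - at_risk_start).days
    # A diagnosis or disenrollment before day 366 contributes no time at risk.
    return max(days, 0) / 365.25


def directly_standardized_rate(stratum_rates: Dict[str, float],
                               standard_weights: Dict[str, float]) -> float:
    """Weighted average of stratum-specific rates using standard population weights."""
    total = sum(standard_weights.values())
    return sum(stratum_rates[k] * w / total for k, w in standard_weights.items())


# Hypothetical usage: two age strata with rates per 100,000 person-years.
rates = {"50-59": 10.0, "60-69": 20.0}
weights = {"50-59": 3.0, "60-69": 1.0}   # invented standard population counts
print(directly_standardized_rate(rates, weights))
```

A patient first diagnosed before the 366th day of eligibility contributes zero person-years under this rule, which mirrors the exclusion of prevalent cases from the population at risk.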

We identified 3,672,370 adults 50 to 100 years of age who were enrolled in a qualifying health plan for at least 183 days between January 1, 2006 and September 30, 2012. Of these members, 6,782 had at least one physician diagnosis of IPF, 4,598 (68%) of whom met the broad case definition, 2,052 (30%) of whom met the narrow case definition, and 1,384 (20%) of whom met the IPF score case definition (Figure 1).

For the IPF score algorithm, clinically important parameters of diagnosis by a pulmonologist and diagnosis after HRCT or lung biopsy did not have an appreciable impact, because that specialty is imperfectly captured in claims data, and the ICD-9-CM coding system does not differentiate between high- and low-resolution CT. The final model yielded a c statistic of 0.906, which indicated good discrimination, and a Hosmer-Lemeshow goodness-of-fit test of 0.396. Model coefficients were used to create an IPF score as follows, where each term can be interpreted as a flag variable (1 = present, 0 = absent): 3.4870 × (IPF diagnosis after ANA or RF test) + 2.1783 × (no COPD at baseline) + 1.6326 × (postinflammatory pulmonary fibrosis at baseline) + 3.1390 × (female patient, age ≥ 75 yr) + 4.4301 × (male patient, age ≥ 65 yr) + 0.8267 × (at least two diagnoses of IPF ≥ 1 month apart) + 2.3790 × (hospitalization with IPF as the principal discharge diagnosis). We assessed multiple score thresholds using receiver operating characteristic curves and classified individuals as IPF patients if their scores were at least 7.5 based on the PPV associated with this cut point in the derivation data (PPV 84.2%, sensitivity 68.1%).
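The score is a weighted sum of binary flags compared with the 7.5 cut point. A minimal Python sketch using the reported coefficients follows; the dictionary keys and function names are hypothetical labels chosen for this example, and how each flag is derived from claims is left to the caller.

```python
from typing import Dict

# Reported model coefficients; the key names are illustrative labels only.
IPF_SCORE_WEIGHTS: Dict[str, float] = {
    "dx_after_ana_or_rf": 3.4870,
    "no_copd_at_baseline": 2.1783,
    "pipf_at_baseline": 1.6326,        # postinflammatory pulmonary fibrosis
    "female_age_75_plus": 3.1390,
    "male_age_65_plus": 4.4301,
    "two_dx_one_month_apart": 0.8267,
    "hospitalized_ipf_principal": 2.3790,
}
IPF_SCORE_THRESHOLD = 7.5


def ipf_score(flags: Dict[str, bool]) -> float:
    """Sum the coefficients of the flags that are present (missing flags count as absent)."""
    unknown = set(flags) - set(IPF_SCORE_WEIGHTS)
    if unknown:
        raise KeyError(f"unrecognized flags: {sorted(unknown)}")
    return sum(w for name, w in IPF_SCORE_WEIGHTS.items() if flags.get(name, False))


def is_ipf_case(flags: Dict[str, bool]) -> bool:
    """Classify as IPF when the score is at least the 7.5 threshold."""
    return ipf_score(flags) >= IPF_SCORE_THRESHOLD


# A man aged 65 or older with an IPF diagnosis after ANA or RF testing:
# 4.4301 + 3.4870 = 7.9171, which is above the threshold.
print(is_ipf_case({"male_age_65_plus": True, "dx_after_ana_or_rf": True}))
```

By contrast, a woman aged 75 or older with no COPD and postinflammatory pulmonary fibrosis at baseline, but none of the other flags, scores 3.1390 + 2.1783 + 1.6326 = 6.9499 and falls below the cut point.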

Patients who met the broad case definition had a mean age of 73.1 years (SD 10.9), and half were men. Sixty-three percent were incident cases, and mean follow-up after diagnosis was approximately 2 years. Comorbidities were common; at least one-fourth of patients were diagnosed with at least one of the following conditions before the first recorded diagnosis of IPF: coronary artery disease; gastroesophageal reflux; COPD; type 2 diabetes mellitus; and/or pneumonia. During the 3 months before the first IPF diagnosis, one-third of the patients visited a pulmonologist, and patients used a mean of 4.9 different medications. Cases meeting the narrow case definition were similar demographically to patients meeting the broad case definition. Cases meeting the IPF score case definition were more often men and had an older mean age (Table 1).

Table 1. Patient Characteristics

|  | Broad Case Definition | Narrow Case Definition | IPF Score Algorithm | Medical Record Confirmed: Treating Clinician Diagnosis | Medical Record Confirmed: Adjudicating Clinician Diagnosis |
| --- | --- | --- | --- | --- | --- |
| Total | 4,598 (100) | 2,052 (100) | 1,384 (100) | 79 (100) | 96 (100) |
| Male | 2,303 (50) | 1,095 (53) | 915 (66) | 49 (62) | 57 (59) |
| Female | 2,295 (50) | 957 (47) | 469 (34) | 30 (38) | 39 (41) |
| Age, mean ± SD (median) | 73.1 ± 10.93 (74.0) | 73.1 ± 9.87 (74.0) | 75.6 ± 9.00 (77.0) | 76.2 ± 8.8 (78.0) | 75.6 ± 8.56 (77.0) |
| Age group, yr |  |  |  |  |  |
| 50–59 | 609 (13) | 217 (11) | 74 (5) | 4 (5) | 4 (4) |
| 60–69 | 1,094 (24) | 497 (24) | 276 (20) | 13 (17) | 19 (20) |
| 70–79 | 1,459 (32) | 748 (37) | 537 (39) | 32 (41) | 39 (41) |
| ≥80 | 1,436 (31) | 590 (29) | 497 (36) | 30 (38) | 34 (35) |
| U.S. geographic region |  |  |  |  |  |
| Northeast | 856 (19) | 261 (13) | 205 (15) | 16 (20) | 24 (25) |
| South | 826 (18) | 385 (19) | 232 (17) | 14 (18) | 14 (15) |
| Central | 1,906 (42) | 966 (47) | 618 (45) | 37 (47) | 44 (46) |
| West | 921 (20) | 408 (20) | 310 (22) | 9 (11) | 12 (13) |
| Unknown | 89 (2) | 32 (2) | 19 (1) | 3 (4) | 2 (2) |
| Health plan enrollment |  |  |  |  |  |
| At least 12 mo before the first IPF diagnosis | 2,879 (63) | 1,125 (55) | 932 (67) | 43 (54) | 60 (63) |
| Preindex eligibility in months, mean ± SD (median) | 25.3 ± 21.57 (19.4) | 22.7 ± 21.03 (15.5) | 26.8 ± 21.31 (22.0) | 23.0 ± 22.70 (14.6) | 25.2 ± 22.42 (19.2) |
| Postindex eligibility in months, mean ± SD (median) | 24.7 ± 20.79 (19.1) | 28.3 ± 21.61 (23.0) | 24.6 ± 20.06 (19.7) | 21.1 ± 21.69 (13.8) | 20.4 ± 20.6 (13.8) |
| Pulmonary hypertension | 491 (11) | 223 (11) | 159 (12) | 3 (4) | 7 (7) |
| Pulmonary embolism | 186 (4) | 77 (4) | 58 (4) | 3 (4) | 3 (3) |
| Lung cancer | 136 (3) | 56 (3) | 21 (2) | 1 (1) | 1 (1) |
| Coronary artery disease | 1,629 (35) | 737 (36) | 580 (42) | 30 (38) | 38 (40) |
| Gastroesophageal reflux disease | 1,126 (25) | 508 (25) | 389 (28) | 20 (25) | 25 (26) |
| Chronic obstructive pulmonary disease | 1,218 (27) | 522 (25) | 138 (10) | 11 (14) | 15 (16) |
| Type 2 diabetes mellitus | 1,242 (27) | 521 (25) | 371 (27) | 27 (34) | 34 (35) |
| Pneumonia | 1,456 (32) | 624 (30) | 432 (31) | 20 (25) | 26 (27) |
| Lung infections | 136 (3) | 56 (3) | 33 (2) | 1 (1) | 3 (3) |

Definition of abbreviation: IPF = idiopathic pulmonary fibrosis.

Data are shown as n (%) unless otherwise indicated.

Medical records were obtained for 150 individuals. Based on the treating clinician diagnosis, 79 cases were confirmed as IPF, 54 cases were not IPF, and 17 cases were nonevaluable. Based on the adjudicating clinician judgment, 96 cases were confirmed as IPF, 50 cases were not IPF, and 4 cases were nonevaluable.

The broad case algorithm had a PPV of 44.4% (95% CI, 29.6–60.0%) based on the treating clinician’s diagnosis. The PPV was lower for patients younger than 65 years of age and higher for men. According to the clinical adjudicators’ judgment, the PPV was 54.0% (95% CI, 39.3–68.2%). The narrow case algorithm had a PPV of 61.7% (95% CI, 48.2–73.9%) based on the treating clinician diagnosis and a PPV of 57.6% (95% CI, 44.8–69.7%) based on the adjudicating clinician diagnosis. The IPF score algorithm had a PPV of 76.2% (95% CI, 60.5–87.9%) based on the treating clinician diagnosis and a PPV of 83.3% (95% CI, 69.8–92.5%) based on the adjudicating clinicians’ diagnosis (Table 2).

Table 2. Positive Predictive Value of Algorithms for Identification of IPF

|  | Confirmed IPF | Not IPF | Nonevaluable | Adjudicator Agreement (%) | PPV (%) | 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| Main analysis: treating clinician's diagnosis |  |  |  |  |  |  |
| Broad case definition (N = 50) | 20 | 25 | 5 | 62 | 44.4 | 29.6–60.0 |
| Narrow case definition (N = 66) | 37 | 23 | 6 | 61 | 61.7 | 48.2–73.9 |
| IPF score case definition (N = 50) | 32 | 10 | 8 | 72 | 76.2 | 60.5–87.9 |
| Secondary analysis: expert clinician's diagnosis |  |  |  |  |  |  |
| Broad case definition (N = 50) | 27 | 21 | 2 | 86 | 54.0 | 39.3–68.2 |
| Narrow case definition (N = 66) | 38 | 26 | 2 | 83 | 57.6 | 44.8–69.7 |
| IPF score case definition (N = 50) | 40 | 8 | 2 | 90 | 83.3 | 69.8–92.5 |

Definition of abbreviations: CI = confidence interval; IPF = idiopathic pulmonary fibrosis; PPV = positive predictive value.

Agreement between adjudicators was 61% for the narrow case definition, 62% for the broad case definition, and 72% for the IPF score case definition when adjudicators were asked to determine whether the treating clinician diagnosed the patient with IPF. For all case definitions, concordance in adjudicator reviews was higher for the adjudicating clinician’s diagnosis than for the adjudicating clinician’s perception of the treating clinician’s diagnosis (86% for the broad case definition, 83% for the narrow case definition, and 90% for the IPF score case definition) (Table 2).

The incidence of IPF using the broad case definition was 31.9 per 100,000 person-years (95% CI, 30.7–33.0). Correcting for the PPV reduced the estimate to 12.8 per 100,000 person-years. The incidence was almost twice as high in men versus women (20.2 per 100,000 person-years vs. 10.4 per 100,000 person-years), and increased dramatically with age. The HIRD population is slightly younger than the general U.S. population, and standardization to the age distribution of the U.S. population yielded a PPV-corrected incidence of 14.6 per 100,000 person-years (95% CI, 13.8–15.4) (Table 3).

Table 3. Incidence of IPF Identified by the Broad Case Algorithm by Age and Sex

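|  | Incident IPF Diagnoses | Person-time at Risk (yr) | Unadjusted for PPV: Incidence per 100,000 Person-Years | Unadjusted for PPV: 95% CI | Adjusted for PPV: Incidence per 100,000 Person-Years | Adjusted for PPV: 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| Overall, broad case algorithm | 2,879 | 9,031,165 | 31.9 | 30.7–33.1 | 12.8 | 12.0–13.5 |
| Standardized to the U.S. population |  |  | 36.6 | 35.2–37.9 | 14.6 | 13.8–15.4 |
| Age, yr |  |  |  |  |  |  |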

Definition of abbreviations: CI = confidence interval; IPF = idiopathic pulmonary fibrosis; PPV = positive predictive value.

The prevalence estimate using the broad case definition was 125.2 per 100,000 patients (95% CI, 121.6–128.8), and correction for the PPV reduced the estimate to 50.1 per 100,000 patients (Table 4). Standardization to the age distribution of the U.S. population yielded a prevalence of 58.7 per 100,000 patients. Similar to incidence, prevalence was higher in men and older patients.

Table 4. Prevalence of IPF Identified by the Broad Case Algorithm by Age and Sex

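|  | IPF Diagnoses | Population at Risk | Unadjusted for PPV: Prevalence per 100,000 Patients | Unadjusted for PPV: 95% CI | Adjusted for PPV: Prevalence per 100,000 Patients | Adjusted for PPV: 95% CI |
| --- | --- | --- | --- | --- | --- | --- |
| Overall, broad case algorithm | 4,598 | 3,672,370 | 125.2 | 121.6–128.8 | 50.1 | 47.8–52.4 |
| Standardized to the U.S. population |  |  | 146.7 | 142.4–150.9 | 58.7 | 56.3–61.2 |
| Age, yr |  |  |  |  |  |  |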

Definition of abbreviations: CI = confidence interval; IPF = idiopathic pulmonary fibrosis; PPV = positive predictive value.
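The crude estimates can be reproduced from the counts reported in Tables 3 and 4, and the ratio of corrected to uncorrected estimates gives the correction factor implied by the published figures. The Python sketch below is a back-calculation for illustration only. The variable and function names are hypothetical, and the multiplier of about 0.40 is inferred from the reported numbers rather than taken from a stated method; it is somewhat lower than the 44.4% PPV computed among evaluable records.

```python
def per_100k(events: int, denominator: int) -> float:
    """Events per 100,000 units of the denominator (persons or person-years)."""
    return events / denominator * 100_000


# Counts reported for the broad case algorithm.
crude_incidence = per_100k(2_879, 9_031_165)    # reported as 31.9
crude_prevalence = per_100k(4_598, 3_672_370)   # reported as 125.2

# Ratio of reported corrected to uncorrected estimates (a back-calculation).
implied_incidence_multiplier = 12.8 / 31.9
implied_prevalence_multiplier = 50.1 / 125.2


def ppv_corrected(estimate: float, multiplier: float) -> float:
    """Scale an estimate by the proportion of algorithm cases assumed to be true cases."""
    return estimate * multiplier


ASSUMED_MULTIPLIER = 0.40  # assumption inferred from the ratios above

print(round(crude_incidence, 1), round(crude_prevalence, 1))
print(round(implied_incidence_multiplier, 3), round(implied_prevalence_multiplier, 3))
print(round(ppv_corrected(36.6, ASSUMED_MULTIPLIER), 1),
      round(ppv_corrected(146.7, ASSUMED_MULTIPLIER), 1))
```

Applying the same assumed multiplier to the age-standardized estimates of 36.6 and 146.7 gives values that round to the reported 14.6 per 100,000 person-years and 58.7 per 100,000 persons.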

IPF is a difficult clinical diagnosis. A complex differential diagnosis takes time to unfold and may be applied differently by different physicians at different times. Furthermore, information emerging during the progression of a patient’s clinical course may reverse an earlier diagnosis. Establishing case status from medical records is more challenging, and translation to the automated claims environment adds yet another layer of uncertainty. Therefore, it is not surprising that diagnosis codes recorded on insurance claims do not always indicate the presence of IPF, and even an algorithm that seeks a high PPV is likely to capture false positive cases. The inclusion of HRCT and/or lung biopsy criteria among others in the prespecified narrow case algorithm significantly reduced the number of potential IPF cases, but still achieved a PPV of only approximately 60%. The poor PPV of the narrow case definition suggests that claims-based evidence of characteristics such as HRCT and/or lung biopsy does not adequately reduce the false positives generated by nonspecific ICD-9-CM codes.

A sensitive algorithm with a low PPV can be useful for estimating incidence and prevalence if the PPV is known and taken into account. In this study, adjusting incidence for the PPV reduced the incidence rate by approximately 60%. Unfortunately, estimating the PPV is not always possible. In two previous studies by Raghu and colleagues (15, 16), automated claims data were used without validation to describe the incidence and prevalence of IPF in U.S. commercially insured and Medicare patients. In both studies, the authors estimated incidence and prevalence rates that were greater than previously reported, but noted that “future studies are needed to validate the case definitions used in large database studies” (16). Because our validation results indicate that our algorithms identified many noncases as IPF patients, it seems likely that studies using similar codes without correcting for misclassification would overestimate incidence and prevalence.

In this study, the IPF score algorithm improved the PPV compared with binary classification algorithms. Distributions of demographic characteristics of cases identified using the IPF score algorithm also differed (e.g., the higher percentage of men and older patients) from those of cases that were identified using the prespecified algorithms. We believe these differences are largely due to the IPF score incorporating the strength of these risk factors in identifying fewer false positive cases and more accurately representing the characteristics of confirmed IPF cases. The IPF score can be applied in many administrative claims data environments because its parameters are widely captured through standardized billing codes. For studies seeking algorithms with high PPVs (e.g., studies estimating rate ratios), this approach may warrant exploration and validation.

The strengths of this study derive from the large size of the population and internal validation of cases using medical records. The adjudication process asked clinical experts about both the treating clinician’s and the adjudicating clinician’s diagnoses. Although we prespecified the treating physician’s diagnosis as the primary analysis, because we expected that the treating physician had access to additional information not contained in the medical record, the adjudicators experienced greater difficulty discerning the treating clinician’s diagnoses and expressed greater confidence in their own diagnoses.

With regard to limitations, the HIRD contains only commercially insured patients, which may limit generalizability. For example, patients with low socioeconomic status (e.g., insured by Medicaid) and non-white patients were underrepresented. Insofar as patterns of care and disease incidence and prevalence may differ from those seen in our population, results from this study may not apply to other populations. Black decedents are less likely and Hispanic decedents are more likely to have an IPF diagnosis at death (28), and worse IPF survival outcomes have been noted in non-white patients (29).

Medical record review was used as the gold standard to confirm case status, but could be inaccurate. Because HRCT scans and pathology specimens themselves were systematically unavailable, adjudication had to rely on reports captured in the records. Because agreement between community radiologists and academic radiologists with expertise in the diagnosis of ILD has been shown to be poor, and because ILD experts less often assign a final diagnosis of IPF than do community physicians (30), it is likely that fewer cases would have been confirmed if HRCT images had been available for expert review. If this is correct, the true PPV of the algorithms and the PPV-corrected incidence and prevalence rates may be lower than reported.

Finally, it was not feasible to estimate the sensitivity of the algorithms due to the large number of records that would require review for such a rare outcome, so the extent to which sensitivity is lost cannot be formally tested.

In conclusion, administrative database studies using sensitive algorithms to identify IPF based on ICD-9-CM diagnoses without case confirmation are likely to overestimate the incidence and prevalence of IPF. Our IPF score algorithm substantially improved the PPV compared with other tested algorithms and represents a promising tool for future research. Additional validation in other settings is needed to better establish its validity and generalizability.

The authors acknowledge Pei Lin for programming; Daniel Mines, Gaurav Deshpande, and Catrin Wessman for their participation in design of the study; and Maryl Kreider for her role in algorithm development.

1. Borchers AT, Chang C, Keen CL, Gershwin ME. Idiopathic pulmonary fibrosis-an epidemiological and pathological review. Clin Rev Allergy Immunol 2011;40:117134.
2. Raghu G, Collard HR, Egan JJ, Martinez FJ, Behr J, Brown KK, Colby TV, Cordier JF, Flaherty KR, Lasky JA, et al. An Official ATS/ERS/JRS/ALAT Statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. Am J Respir Crit Care Med 2011;183:788824.
3. Taniguchi H, Ebina M, Kondoh Y, Ogura T, Azuma A, Suga M, Taguchi Y, Takahashi H, Nakata K, Sato A, et al.; Pirfenidone Clinical Study Group in Japan. Pirfenidone in idiopathic pulmonary fibrosis. Eur Respir J 2010;35:821829.
4. Tzilas V, Koti A, Papandrinopoulou D, Tsoukalas G. Prognostic factors in idiopathic pulmonary fibrosis. Am J Med Sci 2009;338:481485.
5. Kaunisto J, Salomaa ER, Hodgson U, Kaarteenaho R, Myllärniemi M. Idiopathic pulmonary fibrosis--a systematic review on methodology for the collection of epidemiological data. BMC Pulm Med 2013;13:53.
6. Tinelli C, De Silvestri A, Richeldi L, Oggionni T. The Italian register for diffuse infiltrative lung disorders (RIPID): a four-year report. Sarcoidosis Vasc Diffuse Lung Dis 2005;22:S4S8.
7. Thomeer M, Demedts M, Vandeurzen K; VRGT Working Group on Interstitial Lung Diseases. Registration of interstitial lung diseases by 20 centres of respiratory medicine in Flanders. Acta Clin Belg 2001;56:163172.
8. Macansch S, Glaspole I, Hopkins P, Moodley Y, Reynolds P, Walters H, Zappala C, Chapman S, Cooper W, Darbishire W, et al. The Australian Idiopathic Pulmonary Fibrosis Registry–A National Collaboration Provides Epidemiological Insights and Research Opportunities [abstract]. Am J Respir Crit Care Med 2013;187:A1457.
9. Navaratnam V, Fleming KM, West J, Smith CJP, Jenkins RG, Fogarty A, Hubbard RB. The rising incidence of idiopathic pulmonary fibrosis in the U.K. Thorax 2011;66:462–467.
10. Fernández Pérez ER, Daniels CE, Schroeder DR, St Sauver J, Hartman TE, Bartholmai BJ, Yi ES, Ryu JH. Incidence, prevalence, and clinical course of idiopathic pulmonary fibrosis: a population-based study. Chest 2010;137:129–137.
11. Xaubet A, Ancochea J, Morell F, Rodriguez-Arias JM, Villena V, Blanquer R, Montero C, Sueiro A, Disdier C, Vendrell M; Spanish Group on Interstitial Lung Diseases, SEPAR. Report on the incidence of interstitial lung diseases in Spain. Sarcoidosis Vasc Diffuse Lung Dis 2004;21:64–70.
12. Karakatsani A, Papakosta D, Rapti A, Antoniou KM, Dimadi M, Markopoulou A, Latsi P, Polychronopoulos V, Birba G, Ch L, et al.; Hellenic Interstitial Lung Diseases Group. Epidemiology of interstitial lung diseases in Greece. Respir Med 2009;103:1122–1129.
13. Gribbin J, Hubbard RB, Le Jeune I, Smith CJ, West J, Tata LJ. Incidence and mortality of idiopathic pulmonary fibrosis and sarcoidosis in the UK. Thorax 2006;61:980–985.
14. Lai CC, Wang C-Y, Lu H-M, Chen L, Teng N-C, Yan Y-H, Wang J-Y, Chang Y-T, Chao T-T, Lin H-I, et al. Idiopathic pulmonary fibrosis in Taiwan - a population-based study. Respir Med 2012;106:1566–1574.
15. Raghu G, Weycker D, Edelsberg J, Bradford WZ, Oster G. Incidence and prevalence of idiopathic pulmonary fibrosis. Am J Respir Crit Care Med 2006;174:810–816.
16. Raghu G, Chen SY, Yeh WS, Maroni B, Li Q, Lee YC, Collard HR. Idiopathic pulmonary fibrosis in US Medicare beneficiaries aged 65 years and older: incidence, prevalence, and survival, 2001-11. Lancet Respir Med 2014;2:566–572.
17. Hodgson U, Laitinen T, Tukiainen P. Nationwide prevalence of sporadic and familial idiopathic pulmonary fibrosis: evidence of founder effect among multiplex families in Finland. Thorax 2002;57:338–342.
18. von Plessen C, Grinde O, Gulsvik A. Incidence and prevalence of cryptogenic fibrosing alveolitis in a Norwegian community. Respir Med 2003;97:428–435.
19. Olson AL, Swigris JJ, Lezotte DC, Norris JM, Wilson CG, Brown KK. Mortality from pulmonary fibrosis increased in the United States from 1992 to 2003. Am J Respir Crit Care Med 2007;176:277–284.
20. Ohno S, Nakaya T, Bando M, Sugiyama Y. Idiopathic pulmonary fibrosis--results from a Japanese nationwide epidemiological survey using individual clinical records. Respirology 2008;13:926–928.
21. Hyldgaard C, Hilberg O, Muller A, Bendstrup E. A cohort study of interstitial lung diseases in central Denmark. Respir Med 2014;108:793–799.
22. Nalysnyk L, Cid-Ruzafa J, Rotella P, Esser D. Incidence and prevalence of idiopathic pulmonary fibrosis: review of the literature. Eur Respir Rev 2012;21:355–361.
23. Hutchinson JP, McKeever TM, Fogarty AW, Navaratnam V, Hubbard RB. Increasing global mortality from idiopathic pulmonary fibrosis in the twenty-first century. Ann Am Thorac Soc 2014;11:1176–1185.
24. Flaherty KR, King TE Jr, Raghu G, Lynch JP III, Colby TV, Travis WD, Gross BH, Kazerooni EA, Toews GB, Long Q, et al. Idiopathic interstitial pneumonia: what is the effect of a multidisciplinary approach to diagnosis? Am J Respir Crit Care Med 2004;170:904–910.
25. Esposito DE, Lanes S, Deshpande G, Holick CN, Mines D, O’Quinn S, Tran T. Identification and Confirmation of Idiopathic Pulmonary Fibrosis Cases in an Electronic Insurance Claims Database. Presented at Observational Medical Outcomes Partnership-Innovation in Medical Evidence Development Symposium. November 5–6, 2013, Bethesda, MD.
26. Esposito DE, Lanes S, Donneyong M, Holick CN, Mines D, Lasky J, Lederer DL, Steven N, Tran T. Identification of Idiopathic Pulmonary Fibrosis in an Insurance Claims Database: Assessing Accuracy using Medical Records. Presented at European Respiratory Society Annual Meeting. September 6–10, 2014, Munich, Germany.
27. Cook NR. Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve. Clin Chem 2008;54:17–23.
28. Swigris JJ, Olson AL, Huie TJ, Fernandez-Perez ER, Solomon J, Sprunger D, Brown KK. Ethnic and racial differences in the presence of idiopathic pulmonary fibrosis at death. Respir Med 2012;106:588–593.
29. Lederer DJ, Arcasoy SM, Barr RG, Wilt JS, Bagiella E, D’Ovidio F, Sonett JR, Kawut SM. Racial and ethnic disparities in idiopathic pulmonary fibrosis: A UNOS/OPTN database analysis. Am J Transplant 2006;6:2436–2442.
30. Flaherty KR, Andrei A-C, King TE Jr, Raghu G, Colby TV, Wells A, Bassily N, Brown K, du Bois R, Flint A, et al. Idiopathic interstitial pneumonia: do community and academic physicians agree on diagnosis? Am J Respir Crit Care Med 2007;175:1054–1060.
Correspondence and requests for reprints should be addressed to Daina B. Esposito, M.P.H., 300 Brickstone Square, 8th Floor, Suite 801A, Andover, MA 01867. E-mail:

Supported by AstraZeneca.

Author Contributions: D.B.E., S.L., M.D., and C.N.H. made substantial contributions to the conception, design, and execution of the study, and interpretation of the findings; drafted the manuscript and revised it critically for important intellectual content; gave final approval of the version to be published; and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are addressed. T.N.T. and S.O'Q. made substantial contributions to the conception and design of the study as well as interpretation of the findings, revised the manuscript critically for important intellectual content, and gave final approval of the version to be published. J.A.L., D.L., S.D.N., and J.P. made substantial contributions to the design or execution of the study, reviewed the manuscript critically for important intellectual content, and gave final approval of the version to be published.

Originally Published in Press as DOI: 10.1164/rccm.201504-0818OC on August 4, 2015

Author disclosures are available with the text of this article at
