Rationale: Normal values for FEV1 and FVC are currently calculated using cross-sectional reference equations that include terms for race/ethnicity, an approach that may reinforce disparities and is of unclear clinical benefit.
Objectives: To determine whether race/ethnicity–based spirometry reference equations improve the prediction of incident chronic lower respiratory disease (CLRD) events and mortality compared with race/ethnicity–neutral equations.
Methods: The MESA Lung Study, a population-based, prospective cohort study of White, Black, Hispanic, and Asian adults, performed standardized spirometry from 2004 to 2006. Predicted values for spirometry were calculated using race/ethnicity–based equations following guidelines and, alternatively, race/ethnicity–neutral equations without terms for race/ethnicity. Participants were followed for events through 2019.
Measurements and Main Results: The mean age of 3,344 participants was 65 years, and self-reported race/ethnicity was 36% White, 25% Black, 23% Hispanic, and 17% Asian. There were 181 incident CLRD-related events and 547 deaths over a median of 11.6 years. There was no evidence that percentage predicted FEV1 or FVC calculated using race/ethnicity–based equations improved the prediction of CLRD-related events compared with those calculated using race/ethnicity–neutral equations (difference in C statistics for FEV1, −0.005; 95% confidence interval [CI], −0.013 to 0.003; difference in C statistics for FVC, −0.008; 95% CI, −0.016 to −0.0006). Findings were similar for mortality (difference in C statistics for FEV1, −0.002; 95% CI, −0.008 to 0.003; difference in C statistics for FVC, −0.004; 95% CI, −0.009 to 0.001).
Conclusions: There was no evidence that race/ethnicity–based spirometry reference equations improved the prediction of clinical events compared with race/ethnicity–neutral equations. The inclusion of race/ethnicity in spirometry reference equations should be reconsidered.
Several recent studies have highlighted that the use of race/ethnicity in clinical algorithms can create harmful disparities in health. Current approaches for the interpretation of spirometry for clinical decision-making use reference equations that account for (or correct for) race/ethnicity, mostly on the basis of cross-sectional studies. Whether the use of race/ethnicity in currently recommended spirometry reference equations provides a benefit of improvement in prediction of incident chronic lower respiratory disease events is unknown.
In a large, multiethnic, population-based, prospective cohort study, currently recommended spirometry reference equations that include race/ethnicity did not significantly improve discriminative accuracy for the prediction of incident chronic lower respiratory disease events or all-cause mortality compared with available reference equations that do not include race/ethnicity. These findings suggest that the inclusion of race/ethnicity in definitions of normal spirometry should be reconsidered, potentially using longitudinal, risk-based approaches.
In 1846, Hutchinson reported the population variation of VC in a cross-sectional study of 1,923 “healthy cases” to distinguish normal from abnormal results for his new invention, the spirometer (1). In 1869, a large survey of the American military used the same design to report that, among personnel “in usual vigor,” VC was lower in Black compared with White adults (2). Some authors in that era used such descriptions of cross-sectional differences in lung capacity by race to justify slavery (3).
Contemporary spirometry reference equations that are used to define abnormalities in lung function also use a cross-sectional design limited to a healthy sample (never smokers without pulmonary symptoms or disease) (4–6) and also describe lower FEV1 and FVC among Black compared with White adults (4). These equations include a negative coefficient for Black race such that two individuals with identical age, sex, and standing height and the same low value of FEV1 (or FVC) can be classified on the basis of the calculated predicted values as abnormal if the individual is White and normal if the individual is Black. This classification can have important implications for the prescription of medications (7), for further workup for lung disease (8, 9), and for the recognition of occupational lung disease (10).
Although mean differences in sitting or standing height might explain some of the observed cross-sectional mean difference in lung volumes between Black and White nonsmoking adults (11, 12), lifetime, socially determined environmental exposures also contribute to adult lung function and potentially to racial/ethnic differences in lung volumes in nonsmoking adults. Exposures such as low birth weight, lower respiratory tract infections, secondhand smoke, and air pollution are often higher among minority groups because of economic and racial inequalities and are associated with slower lung growth, lower adult lung function, and increased risk for clinical lung disease (13–18). Hence, the lower mean values of lung function observed and defined as normal by current race/ethnicity–based reference equations among Black compared with White adults may be acquired rather than physiologic and might lead to misclassification of risk among Black and other minority patients.
Whereas predicted values relevant to lung diseases are derived from cross-sectional studies, the thresholds for diagnosis and treatment for many other diseases, including type 2 diabetes and hyperlipidemia, are defined on the basis of risk for incident clinical events (19, 20) in prospective studies because a large component of contemporary medical practice involves the prevention of future clinical events. Given that percentage predicted values of FEV1 and FVC are similarly used to identify patients at risk for chronic obstructive pulmonary disease (COPD) hospitalization (21) and mortality (22), we applied a risk-based approach to test whether guideline-recommended, race/ethnicity–based reference equations used to calculate percentage predicted FEV1 and FVC improved the prediction of chronic lower respiratory disease (CLRD) events and all-cause mortality compared with race/ethnicity–neutral equations in a multiethnic, prospective cohort study.
Some of the results of this study have been previously reported in the form of an abstract (23).
MESA is a prospective, population-based cohort study that enrolled 6,814 White, Black, Hispanic, and Asian participants ages 45–84 years without clinical cardiovascular disease or weight > 300 lb from 2000 to 2002. Participants were recruited from six U.S. communities with a minimum of two racial/ethnic groups recruited at each site to limit site-by-race/ethnicity confounding, and all methods were standardized across the six sites (24). The MESA Lung Study recruited a random sample of 3,965 MESA participants at all six sites from 2004 to 2006 who consented to genetic analyses and completed endothelial function measures, with oversampling of Asian participants (25). For this report on risk related to spirometry, we included participants with valid lung function measures and excluded those with physician diagnosis of CLRD or CLRD hospitalizations prior to spirometry (see Figure E1 in the online supplement). CLRD includes COPD, chronic bronchitis, emphysema, and asthma (26).
The institutional review boards of all collaborating institutions approved this study, and all participants provided informed consent.
Race was self-reported according to 2000 U.S. census criteria as one of African American/Black (hereafter Black), Asian, or Caucasian/White (hereafter White) race; ethnicity was self-reported as Spanish/Hispanic/Latino (hereafter Hispanic). Participants who reported Hispanic ethnicity were coded as Hispanic regardless of self-reported race.
Prebronchodilator lung function was measured in accordance with the American Thoracic Society/European Respiratory Society–recommended guidelines from 2004 to 2006, with all participants attempting at least three acceptable maneuvers (27). All spirometry tests were reviewed at the Spirometry Reading Center by one author (J.L.H.), and each test was graded for quality on a five-point scale; valid spirometry was defined by at least two acceptable curves with both FVC and FEV1 values repeatable within 200 ml (28). Airflow limitation was defined as FEV1/FVC < 0.70; moderate to severe airflow limitation was defined as FEV1/FVC < 0.70 and percentage predicted FEV1 < 80%.
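The two airflow limitation definitions above can be expressed as a short rule. The following is a minimal sketch, not the study's code; the names `AirflowStatus` and `classify_airflow` and the input values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AirflowStatus:
    """Result of applying the two definitions stated in the text."""
    airflow_limitation: bool   # FEV1/FVC < 0.70
    moderate_to_severe: bool   # FEV1/FVC < 0.70 and % predicted FEV1 < 80%


def classify_airflow(fev1_l: float, fvc_l: float, fev1_predicted_l: float) -> AirflowStatus:
    """Classify one spirometry result (hypothetical helper).

    fev1_l, fvc_l: observed values in liters.
    fev1_predicted_l: predicted FEV1 in liters from whichever reference
    equation is being evaluated (race/ethnicity-based or -neutral).
    """
    ratio = fev1_l / fvc_l
    pct_pred_fev1 = 100.0 * fev1_l / fev1_predicted_l
    limited = ratio < 0.70
    return AirflowStatus(
        airflow_limitation=limited,
        moderate_to_severe=limited and pct_pred_fev1 < 80.0,
    )


# Illustrative inputs only; these are not study data.
print(classify_airflow(1.8, 3.0, 3.0))
print(classify_airflow(2.6, 4.0, 3.0))
```

Because the predicted FEV1 enters only the second criterion, switching reference equations can change who is classified as moderate to severe but not who has airflow limitation.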
Race/ethnicity–based predicted values of the FEV1 and FVC were calculated using the Global Lung Function Initiative (GLI) reference equations for White, Black, Hispanic, and Asian groups following current American Thoracic Society/European Respiratory Society guidelines (4, 5). GLI equations have the following terms for race/ethnicity, stratified by sex, and take the general form (4)
The North East Asian category was used because most Asian participants in MESA originated from that region; GLI equations for White participants were used for Hispanic participants, as recommended by the GLI and prior work (29).
Race/ethnicity–neutral predicted values were calculated using the GLI equations for “Other,” which average the above groups (4), do not include terms for race/ethnicity, and take the general form
Sensitivity analyses were performed using U.S. National Health and Nutrition Examination Survey (NHANES) III reference equations, which have separate equations for White, Black, and Hispanic groups (30). NHANES III did not derive reference equations for Asian individuals, so equations for White individuals with a correction factor of 0.88 were used (28). NHANES III equations for White and, separately, Black groups were applied to all racial/ethnic groups as comparators.
CLRD-related events and all-cause mortality were examined because they are relatively specific for pulmonary diseases and precisely measured, respectively. They were ascertained via regular follow-up calls through 2019, medical records, and National Death Index (31) searches. CLRD-related events were defined as hospitalizations or deaths for which CLRD was classified as a primary, an underlying, or a contributing cause following a previously validated protocol using International Classification of Diseases, Ninth Revision, and International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, diagnosis codes for CLRD (asthma [493 and J45 and J46, respectively], COPD [496 and J44], chronic bronchitis [490 and 491 and J40–J42], and emphysema [492 and J43]) (32–34). In previous work, 82% of events meeting this definition were confirmed by two-physician review of medical records as CLRD (33, 34). Dates of hospitalizations and deaths were obtained from medical records, death certificates, and National Death Index reports.
Follow-up for CLRD-related events and mortality was 82% and 92% at 10 years, respectively.
Sex and educational attainment were self-reported. Ever smokers were defined as participants reporting at least 100 lifetime cigarettes and current smokers as those smoking a cigarette in the past 30 days or positive cotinine levels (25). Pack-years were calculated as (cigarettes per day × years smoked)/20. Height, weight, and blood pressure measurements were performed using standardized methods (24). Total cholesterol, high-density lipoprotein, and low-density lipoprotein were measured from fasting blood samples. History of hypertension and diabetes were self-reported; use of medications was ascertained by medication inventory. Electronic cigarette use was assessed 10 years after baseline and was very rare in this older cohort (<1%).
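As a small illustration of the smoking definitions above, the pack-year formula and the 100-cigarette threshold can be written as follows. This is a sketch under the stated definitions; the function names are hypothetical, and the cotinine criterion for current smoking is not modeled here.

```python
def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = (cigarettes per day x years smoked) / 20."""
    return cigarettes_per_day * years_smoked / 20.0


def is_ever_smoker(lifetime_cigarettes: int) -> bool:
    """Ever smoker: at least 100 lifetime cigarettes."""
    return lifetime_cigarettes >= 100


# Illustrative values only: one pack (20 cigarettes) per day for 15 years.
print(pack_years(20, 15))
print(is_ever_smoker(99))
```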
The percentage of the predicted value of lung function was calculated as the observed value divided by the predicted value times 100, the latter calculated using race/ethnicity–based as well as race/ethnicity–neutral equations. Associations between percentage predicted FEV1 and FVC and the incidence rate of CLRD-related events were tested using proportional hazards regression. The same approach was used for all-cause mortality and dichotomized analyses. Time-to-event data were defined as time since spirometry for each individual. The Harrell C statistic was used to determine the discriminative accuracy of each model (35, 36). Formal statistical comparisons of C statistics were performed using a nonparametric approach to compare the predictive performance of each model using a custom-written SAS macro that has been used in other studies (21) on the basis of a prior approach (37). Primarily, unadjusted analyses were performed using only percentage predicted FEV1 (or FVC) as the predictors in the models. Analyses were also performed by percentage predicted lung function and the clinically relevant threshold of 80%. Secondary analyses were stratified by race/ethnicity and also adjusted for the following covariates that predict either CLRD events or all-cause mortality in MESA: body mass index, educational attainment, smoking status, pack-years, blood pressure, high-density lipoprotein, low-density lipoprotein, total cholesterol, and history of hypertension and diabetes. Effect modification by sex of percentage predicted values and incident events was tested using interaction terms between sex and percentage predicted values. Sensitivity analyses were performed including participants with prebaseline CLRD, restricted to healthy nonsmokers as previously defined (28) in this sample, and for NHANES III reference equations.
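To make the two central quantities concrete, the sketch below computes percentage predicted lung function and a simplified Harrell C statistic for right-censored data. It is not the SAS macro used in the study: the function names are hypothetical, the data are invented, the negated percentage predicted value is used directly as the risk score instead of a fitted proportional hazards model, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations
from typing import Sequence


def percent_predicted(observed_l: float, predicted_l: float) -> float:
    """Observed value divided by predicted value, times 100."""
    return 100.0 * observed_l / predicted_l


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk_scores: Sequence[float]) -> float:
    """Simplified Harrell C: fraction of comparable pairs ranked correctly.

    A pair is comparable when the participant with the shorter follow-up
    had an event. The pair is concordant when that participant also has
    the higher risk score; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied follow-up times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter follow-up was censored: not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: years of follow-up, event indicator, % predicted FEV1.
times = [2, 5, 7, 10]
events = [1, 1, 0, 0]
pct_pred = [55.0, 90.0, 70.0, 100.0]
# Lower percentage predicted implies higher risk, so negate it.
print(harrell_c(times, events, [-p for p in pct_pred]))
```

Comparing two reference equations then amounts to computing this statistic once per equation on the same participants and examining the difference, which the study tested formally with a nonparametric method.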
In addition, we explored the relationships of percentage predicted lung function and the 10-year cumulative incidence of events in descriptive analyses using generalized additive models with splines to generate plots and 95% confidence intervals. Evidence for nonlinearity was tested in nested models with and without splines.
Analyses were performed using SAS version 9.4 (SAS Institute) and R version 1.3 (R Foundation for Statistical Computing). A two-sided P value < 0.05 was considered to indicate statistical significance.
The mean age of the 3,344 participants was 65.3 ± 9.6 years, 50% were female, and the racial/ethnic distribution was 36% White, 25% Black, 23% Hispanic, and 17% Asian. The baseline characteristics of the study sample stratified by race/ethnicity are shown in Table 1. Age and sex were similar across racial/ethnic groups, mean height was lower among Hispanic and Asian participants, mean body mass index was higher among Black and Hispanic participants and lower among Asian compared with White participants, educational attainment varied, and smoking history was greater among White and Black participants compared with other racial/ethnic groups.
Characteristic | White | Black | Hispanic | Asian |
---|---|---|---|---|
n (%) | 1,187 (35.5) | 844 (25.2) | 755 (22.6) | 558 (16.7) |
Age, yr, mean (SD) | 65.9 (9.8) | 65.4 (9.4) | 64.4 (9.7) | 65.0 (8.7) |
Female, n (%) | 599 (50.5) | 423 (50.1) | 385 (51.0) | 271 (48.6) |
Height, cm, mean (SD) | 168.7 (9.7) | 168.7 (9.7) | 162.2 (9.3) | 161.4 (8.7) |
BMI, kg/m2, mean (SD) | 27.8 (5.1) | 29.7 (5.5) | 29.3 (5.1) | 23.9 (3.3) |
Educational attainment, n (%) | ||||
Less than high school | 30 (2.5) | 54 (6.4) | 113 (15.0) | 52 (9.3) |
High school | 197 (16.6) | 174 (20.6) | 168 (22.3) | 86 (15.4) |
Some college | 297 (25.0) | 308 (36.5) | 197 (26.1) | 127 (22.8) |
College or more | 663 (55.9) | 307 (36.4) | 277 (36.7) | 293 (52.5) |
Smoking status, n (%) | ||||
Never | 480 (40.4) | 339 (40.2) | 352 (46.6) | 396 (71.0) |
Former | 606 (51.1) | 386 (45.7) | 330 (43.7) | 136 (24.4) |
Current | 96 (8.1) | 111 (13.2) | 69 (9.1) | 25 (4.5) |
Missing | 5 (0.4) | 8 (0.9) | 4 (0.5) | 1 (0.2) |
Smoking pack-years, median (IQR) | 15.0 (3.3–33.0) | 12.5 (3.5–26.3) | 6.5 (1.1–18.5) | 9.0 (0.5–23.8) |
Lung function, mean (SD) | ||||
FEV1, L | 2.6 (0.8) | 2.3 (0.7) | 2.5 (0.7) | 2.2 (0.6) |
FVC, L | 3.6 (1.0) | 3.0 (0.8) | 3.2 (0.9) | 2.9 (0.8) |
FEV1/FVC | 0.74 (0.08) | 0.76 (0.09) | 0.77 (0.07) | 0.76 (0.07) |
Percentage predicted lung function using race/ethnicity–based equations, mean (SD)* | ||||
FEV1 | 93.6 (15.5) | 93.7 (17.9) | 95.2 (15.5) | 88.7 (13.8) |
FVC | 97.9 (14.9) | 96.5 (17.3) | 96.5 (15.0) | 92.1 (13.8) |
Percentage predicted lung function using race/ethnicity–neutral equations, mean (SD)† | ||||
FEV1 | 100.4 (16.7) | 86.2 (16.5) | 102.2 (16.6) | 92.8 (14.4) |
FVC | 106.4 (16.2) | 89.2 (16.0) | 104.8 (16.3) | 96.8 (14.4) |
Percentage predicted FEV1 < 80% | ||||
Race/ethnicity–based equations, n (%)* | 204 (17) | 153 (18) | 118 (16) | 143 (26) |
Race/ethnicity–neutral equations, n (%)† | 132 (11) | 278 (33) | 61 (8) | 100 (18) |
Percentage predicted FVC ⩽ 80% | ||||
Race/ethnicity–based equations, n (%)* | 139 (12) | 122 (14) | 83 (11) | 106 (19) |
Race/ethnicity–neutral equations, n (%)† | 67 (6) | 224 (27) | 45 (6) | 71 (13) |
Moderate to severe airflow limitation‡ | ||||
Race/ethnicity–based equations, n (%)* | 116 (10) | 66 (8) | 42 (6) | 46 (8) |
Race/ethnicity–neutral equations, n (%)† | 80 (7) | 100 (12) | 29 (4) | 37 (7) |
Chronic lower respiratory disease–related events§ | ||||
n (%) | 66 (5.6) | 57 (6.8) | 36 (4.8) | 22 (3.9) |
Person-years to last follow-up | 12,397 | 8,636 | 7,884 | 6,070 |
Event rate, per 10,000 person-years | 53 | 66 | 46 | 36 |
All-cause mortality | ||||
n (%) | 203 (17.1) | 156 (18.5) | 111 (14.7) | 77 (13.8) |
Person-years to last follow-up | 12,665 | 8,852 | 7,987 | 6,151 |
Event rate, per 10,000 person-years | 160 | 176 | 139 | 125 |
The mean FEV1 and FVC were lower among minority compared with White participants (Table 1). Distributions of percentage predicted FEV1 calculated using race/ethnicity–based equations were approximately similar for White, Black, and Hispanic participants and lower for Asian participants (Table 1 and Figure 1A). Distributions of percentage predicted FEV1 calculated using the race/ethnicity–neutral equations were higher for White and Hispanic participants relative to Black and Asian participants. Moderate to severe airflow limitation was most frequent among White participants when defined by race/ethnicity–based equations and most frequent among Black participants when defined by race/ethnicity–neutral equations (Table 1). Findings were similar for percentage predicted FVC (Table 1 and Figure 1B).
Over a median follow-up period of 11.6 years, there were 181 incident CLRD-related events (52 per 10,000 person-years) and 547 deaths (153 per 10,000 person-years). Among the four racial/ethnic groups, the incidence rates for both CLRD-related events and mortality were highest among Black participants (Table 1).
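The incidence rates quoted here follow from the event counts and person-years reported in Tables 1 and 2. A minimal sketch of that arithmetic is shown below; the function name is hypothetical, and the inputs are the published counts.

```python
def rate_per_10k(events: int, person_years: float) -> float:
    """Incidence rate expressed per 10,000 person-years."""
    return 10_000 * events / person_years


# Overall counts and person-years as reported in Table 2.
print(round(rate_per_10k(181, 34_987)))  # CLRD-related events
print(round(rate_per_10k(547, 35_655)))  # all-cause mortality

# CLRD-related events by group, counts and person-years from Table 1.
clrd = {
    "White": (66, 12_397),
    "Black": (57, 8_636),
    "Hispanic": (36, 7_884),
    "Asian": (22, 6_070),
}
for group, (n_events, py) in clrd.items():
    print(group, round(rate_per_10k(n_events, py)))
```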
There was no consistent evidence that percentage predicted FEV1 and FVC calculated using race/ethnicity–based equations improved the prediction of CLRD-related events compared with those calculated using race/ethnicity–neutral equations (Table 2). Findings were similar for all-cause mortality. The 95% confidence intervals for the differences in the C statistics between the two equations were narrow. Across four comparisons, there was one statistically significant result (P = 0.03), which favored the race/ethnicity–neutral equations. Multivariable-adjusted models yielded no statistically significant differences (Table E1).
Outcome | Events/Person-Years of Follow-Up | Incidence Rate per 10,000 Person-Years | Percentage Predicted Lung Function | Harrell C Statistic (95% CI): Race/Ethnicity–based Equations* | Harrell C Statistic (95% CI): Race/Ethnicity–Neutral Equations† | Harrell C Statistic (95% CI): Difference | P Value |
---|---|---|---|---|---|---|---|
Chronic lower respiratory disease–related events‡ | 181/34,987 | 52 | FEV1 | 0.71 (0.67 to 0.75) | 0.72 (0.67 to 0.76) | −0.005 (−0.013 to 0.003) | 0.22 |
Chronic lower respiratory disease–related events‡ | | | FVC | 0.61 (0.56 to 0.65) | 0.62 (0.57 to 0.66) | −0.008 (−0.016 to −0.0006) | 0.03 |
All-cause mortality | 547/35,655 | 153 | FEV1 | 0.57 (0.54 to 0.59) | 0.57 (0.54 to 0.59) | −0.002 (−0.008 to 0.003) | 0.44 |
All-cause mortality | | | FVC | 0.53 (0.51 to 0.56) | 0.54 (0.51 to 0.56) | −0.004 (−0.009 to 0.001) | 0.14 |
Stratification of the study sample by race/ethnicity did not yield any consistent evidence that race/ethnicity–based equations improved the prediction of CLRD-related events in any racial/ethnic group, again with lower bounds of 95% confidence intervals for the differences in the C statistics at the third decimal place or smaller for all comparisons except among Asians (Table E2). Stratified analyses for all-cause mortality showed similar results except for statistically significantly higher discriminative accuracy with the race/ethnicity–neutral equations for the FVC among White participants and the FEV1 and the FVC among Black participants (Table E3).
Figure 2A shows the relationship for each racial/ethnic group of percentage predicted FEV1 and 10-year cumulative incidence of CLRD-related events along with 95% confidence intervals. There was evidence for a threshold effect overall (P for nonlinearity = 0.01) such that the observed risk for incident CLRD-related events was much greater for participants with FEV1 below than above approximately 80% predicted.
The observed 10-year risk among White participants was somewhat higher for a given percentage predicted FEV1 using race/ethnicity–neutral compared with race/ethnicity–based equations below approximately 80% predicted, although 95% confidence intervals overlapped substantially (Figure 2A, blue and red lines, respectively). The observed 10-year risk among Black participants showed somewhat higher risk with race/ethnicity–based equations and, again, overlapping 95% confidence intervals (Figure 2A, red line). Comparison of these two panels suggested higher observed risk for Black than White participants in the approximately 70–100% predicted range, in which 54% of events occurred, whereas the opposite was true for percentage predicted FEV1 below 70% predicted. For example, a patient with an FEV1 of 80% predicted had an observed 10-year risk for a CLRD-related event of 6% if he or she was White and 8% if he or she was Black; a patient with an FEV1 of 50% predicted had an observed 10-year risk of 56% if he or she was White and 27% if he or she was Black. No similar underestimation was evident and 95% confidence intervals were also largely superimposed for Hispanic participants. Few events in Asian participants made those estimates unstable. Findings were approximately similar for the percentage predicted FVC and CLRD events (Figure 2B) and with multivariable adjustment (Figure E2).
Figures 2C and 2D show the relationship between percentage predicted FEV1 and FVC and the 10-year cumulative incidence of all-cause mortality, respectively. In both cases, regression lines were close to each other, and 95% confidence intervals were also largely superimposed.
For dichotomized analyses at the clinical threshold of 80% predicted FEV1 and FVC and moderate to severe airflow limitation, hazard ratios were of larger magnitude for the prediction of CLRD-related events when predicted values were defined by race/ethnicity–neutral equations compared with race/ethnicity–based equations (Table 3). Results were generally similar for all-cause mortality (Table E4). Stratification by race/ethnicity showed similar results among White, Hispanic, and Asian participants but opposing results for Black participants (Tables E5–E7). Results were similar for all-cause mortality (Tables E8–E10).
Equations | Threshold | At Risk | Chronic Lower Respiratory Disease–related Events* | Person-Years of Follow-Up | Rate per 10,000 Person-Years | Hazard Ratio (95% CI) |
---|---|---|---|---|---|---|
Full sample (N = 3,344) | ||||||
Race/ethnicity based† | FEV1 < 80% | 619 | 80 | 6,077 | 132 | 3.83 (2.86–5.14) |
Race/ethnicity based† | FEV1 ⩾ 80% | 2,725 | 101 | 28,910 | 35 | |
Race/ethnicity neutral‡ | FEV1 < 80% | 571 | 84 | 5,544 | 152 | 4.71 (3.51–6.30) |
Race/ethnicity neutral‡ | FEV1 ⩾ 80% | 2,773 | 97 | 29,443 | 33 | |
Full sample (N = 3,344) | ||||||
Race/ethnicity based† | FVC ⩽ 80% | 450 | 48 | 4,482 | 107 | 2.48 (1.79–3.45) |
Race/ethnicity based† | FVC > 80% | 2,894 | 133 | 30,505 | 44 | |
Race/ethnicity neutral‡ | FVC ⩽ 80% | 407 | 40 | 4,019 | 100 | 2.21 (1.56–3.14) |
Race/ethnicity neutral‡ | FVC > 80% | 2,937 | 141 | 30,967 | 46 | |
Airflow limitation (n = 670)§ | ||||||
Race/ethnicity based† | FEV1 < 80% | 270 | 58 | 2,435 | 238 | 3.67 (2.31–5.83) |
Race/ethnicity based† | FEV1 ⩾ 80% | 400 | 26 | 3,985 | 65 | |
Race/ethnicity neutral‡ | FEV1 < 80% | 246 | 59 | 2,183 | 270 | 4.61 (2.89–7.37) |
Race/ethnicity neutral‡ | FEV1 ⩾ 80% | 424 | 25 | 4,236 | 59 | |
There was no evidence of effect modification by sex between percentage predicted values and CLRD-related events and all-cause mortality with race/ethnicity–based equations or race/ethnicity–neutral equations (Table E11).
Sensitivity analyses including participants with prebaseline CLRD (28) and restricted to healthy nonsmokers showed no evidence that race/ethnicity–based equations improved the prediction of CLRD-related events or all-cause mortality compared with race/ethnicity–neutral equations (Tables E12 and E13).
Calculation of predicted values using NHANES III reference equations rather than GLI equations also showed no evidence that race/ethnicity–based equations improved the prediction of CLRD-related events or all-cause mortality compared with race/ethnicity–neutral equations in the whole sample or within strata of race/ethnicity (Tables E14–E16).
In this prospective cohort study with standardized measures across four racial/ethnic groups, there was no evidence that recommended spirometry reference equations that include terms for race/ethnicity yielded percentage predicted values that improved the prediction of CLRD-related events or mortality compared with race/ethnicity–neutral equations. These longitudinal findings suggest that the inclusion of race/ethnicity in spirometry reference equations warrants reexamination.
Normal variation in lung function has been defined on the basis of cross-sectional studies in “healthy” samples since 1846 (1, 4, 38). Yet the main purpose of calculating percentage predicted FEV1 and FVC in the modern era is to inform clinical decision-making regarding treatments aimed, in part, at reducing clinical events in chronic lung diseases (7, 9, 39, 40) and regarding diagnosis of restrictive lung disease (8, 41). In this context, an approach to define the normal distribution of percentage predicted FEV1 and FVC on the basis of risk for incident disease in prospective cohort studies, similar to the approach used for hyperlipidemia and diabetes (19, 20), may be more appropriate than the historical approach of calculating percentage predicted lung function. Indeed, a risk-based approach recently demonstrated that an FEV1/FVC ratio < 0.70 was preferred for the diagnosis of COPD (21), and studies also suggest that both earlier U.S. (30) and current race/ethnicity–based spirometry equations may underestimate risk for all-cause mortality in Black compared with White groups (42–44), although one study was limited by potential site-by-race confounding and restricted to two racial/ethnic groups. Our present findings in this multicenter study designed for multiethnic comparisons and using standardized spirometry with validated clinical endpoints suggest that inclusion of terms for race/ethnicity in contemporary (GLI and NHANES III) spirometry reference equations provides no benefit for risk prediction or discriminative accuracy of CLRD-related events and mortality.
Literature on racism in medicine suggests that inclusion of race/ethnicity in spirometry reference equations is problematic for several reasons and may cause harm. Race-based clinical algorithms can further exacerbate racial inequities by directing more resources toward White patients (45). They discount the role of social determinants of health and early environmental factors in health outcomes (45–47), and they reinforce implicit racial bias, which can unknowingly alter treatment decisions for Black patients (48). Race/ethnicity can also be challenging to apply: classifications can be subject to inaccuracy of perceived, nonself-reported, and other definitions of race/ethnicity; many countries are increasingly multiethnic, with large numbers of patients identifying with more than one race (49), and race/ethnicity is largely socially constructed and regionally defined, such that constructs in one country do not apply to another country, and additional differences might exist within countries (50).
More directly, the negative coefficient for Black race in currently recommended race/ethnicity–based equations (4) increased the percentage predicted values for Black compared with White race by an average of 17 percentage points of the percentage predicted FEV1 and 18 percentage points of the percentage predicted FVC. For example, a 65-year-old man with height of 168 cm, FEV1 of 2.1 L, and FVC of 2.8 L has a percentage predicted FEV1 of 70% if White and 82% if Black and has a percentage predicted FVC of 72% if White and 85% if Black, such that the Black individual may be less likely to be treated with long-acting therapies that reduce COPD hospitalizations, referred for further evaluation for pulmonary fibrosis, or diagnosed with an occupational lung disease. Our results suggest that these disparities may be true at a population level, and the literature reports that, for example, Black patients with COPD are less likely to be on pharmacologic treatment (51) and to receive guideline-recommended pulmonary rehabilitation programs (52), both of which are indicated partly on the basis of percentage predicted FEV1 criteria.
Although this study is unique, to our knowledge, as a multiethnic prospective cohort study designed to minimize site-by-race confounding, it has several limitations. Sample sizes were modest within strata of race/ethnicity, and estimates for Asian participants from generalized additive models were unreliable because of small numbers. Nonetheless, 95% confidence intervals were extremely tight for all major comparisons, effectively ruling out a clinical benefit for discriminative accuracy of race/ethnicity–based equations compared with race/ethnicity–neutral equations and demonstrating adequate statistical power for all the primary comparisons, including those within strata of racial/ethnic groups. We used self-reported race and ethnicity, allowing for possible discrepancies in classification in a multiethnic society. Genetic ancestry might improve lung function estimates for Black Americans (53) and may modify risk for lung diseases (54), but the overall contribution of genetic ancestry to lung function compared with environmental exposures is modest. CLRD-related events included only patients requiring hospitalization or who died, limiting our ability to capture milder exacerbations and symptoms or those not coded according to CLRD-related International Classification of Diseases codes or keywords (33) and may be subject to treatment effects. Only prebronchodilator lung function was available from 2004 to 2006, but current reference equations were also defined on the basis of prebronchodilator spirometry, and secular changes are unlikely to alter the findings. We did not examine diffusing capacity or lung volumes, given the focus on spirometry reference equations, and it is unknown if our results would apply to these measures. Finally, all-cause mortality is nonspecific, but the CLRD-based endpoint is similar to that used in clinical trials in COPD.
In conclusion, we found no evidence that currently recommended race/ethnicity–based reference equations for percentage predicted FEV1 and FVC improved the prediction of CLRD-related events or mortality compared with race/ethnicity–neutral equations. The inclusion of race/ethnicity in spirometry reference equations should be reconsidered, potentially with the application of contemporary, prospective, risk-based approaches to spirometry, as is standard for other major chronic diseases.
The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutions can be found at http://www.mesa-nhlbi.org.
1. | Hutchinson J. On the capacity of the lungs, and on the respiratory functions, with a view of establishing a precise and easy method of detecting disease by the spirometer. Med Chir Trans 1846;29:137–252. |
2. | Gould BA. Investigations in the military and anthropological statistics of American soldiers. New York: Hurd & Houghton; 1869. |
3. | Braun L. Breathing race into the machine: the surprising career of the spirometer from plantation to genetics. Minneapolis: University of Minnesota Press; 2014. |
4. | Quanjer PH, Stanojevic S, Cole TJ, Baur X, Hall GL, Culver BH, et al.; ERS Global Lung Function Initiative. Multi-ethnic reference values for spirometry for the 3-95-yr age range: the global lung function 2012 equations. Eur Respir J 2012;40:1324–1343. |
5. | Graham BL, Steenbruggen I, Miller MR, Barjaktarevic IZ, Cooper BG, Hall GL, et al. Standardization of spirometry 2019 update. An official American Thoracic Society and European Respiratory Society technical statement. Am J Respir Crit Care Med 2019;200:e70–e88. |
6. | Culver BH, Graham BL, Coates AL, Wanger J, Berry CE, Clarke PK, et al.; ATS Committee on Proficiency Standards for Pulmonary Function Laboratories. Recommendations for a standardized pulmonary function report. An official American Thoracic Society technical statement. Am J Respir Crit Care Med 2017;196:1463–1472. |
7. | Global Initiative for Chronic Obstructive Lung Disease. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease. Fontana, WI: Global Initiative for Chronic Obstructive Lung Disease; 2021. |
8. | Raghu G, Collard HR, Egan JJ, Martinez FJ, Behr J, Brown KK, et al.; ATS/ERS/JRS/ALAT Committee on Idiopathic Pulmonary Fibrosis. An official ATS/ERS/JRS/ALAT statement: idiopathic pulmonary fibrosis: evidence-based guidelines for diagnosis and management. Am J Respir Crit Care Med 2011;183:788–824. |
9. | Bateman ED, Hurd SS, Barnes PJ, Bousquet J, Drazen JM, FitzGerald JM, et al. Global strategy for asthma management and prevention: GINA executive summary. Eur Respir J 2008;31:143–178. |
10. | Townsend MC. Spirometry in occupational health—2020. J Occup Environ Med 2020;62:e208–e230. |
11. | Harik-Khan RI, Fleg JL, Muller DC, Wise RA. The effect of anthropometric and socioeconomic factors on the racial difference in lung function. Am J Respir Crit Care Med 2001;164:1647–1654. |
12. | Hsi BP, Hsu KH, Jenkins DE. Ventilatory functions of normal children and young adults: Mexican-American, White, and Black. III. Sitting height as a predictor. J Pediatr 1983;102:860–865. |
13. | Barker DJ, Godfrey KM, Fall C, Osmond C, Winter PD, Shaheen SO. Relation of birth weight and childhood respiratory infection to adult lung function and death from chronic obstructive airways disease. BMJ 1991;303:671–675. |
14. | Ejike CO, Dransfield MT, Hansel NN, Putcha N, Raju S, Martinez CH, et al. Chronic obstructive pulmonary disease in America’s Black population. Am J Respir Crit Care Med 2019;200:423–430. |
15. | Lovasi GS, Diez Roux AV, Hoffman EA, Kawut SM, Jacobs DR Jr, Barr RG. Association of environmental tobacco smoke exposure in childhood with early emphysema in adulthood among nonsmokers: the MESA-lung study. Am J Epidemiol 2010;171:54–62. |
16. | Martinez FD. Early-life origins of chronic obstructive pulmonary disease. N Engl J Med 2016;375:871–878. |
17. | Nardone A, Casey JA, Morello-Frosch R, Mujahid M, Balmes JR, Thakur N. Associations between historical residential redlining and current age-adjusted rates of emergency department visits due to asthma across eight cities in California: an ecological study. Lancet Planet Health 2020;4:e24–e31. |
18. | Smith BM, Kirby M, Hoffman EA, Kronmal RA, Aaron SD, Allen NB, et al.; MESA Lung, CanCOLD, and SPIROMICS Investigators. Association of dysanapsis with chronic obstructive pulmonary disease among older adults. JAMA 2020;323:2268–2280. |
19. | American Diabetes Association. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes—2018. Diabetes Care 2018;41:S13–S27. |
20. | Stone NJ, Robinson JG, Lichtenstein AH, Bairey Merz CN, Blum CB, Eckel RH, et al.; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 2014;129:S1–S45. |
21. | Bhatt SP, Balte PP, Schwartz JE, Cassano PA, Couper D, Jacobs DR Jr, et al. Discriminative accuracy of FEV1:FVC thresholds for COPD-related hospitalization and mortality. JAMA 2019;321:2438–2447. |
22. | Burney PG, Hooper R. Forced vital capacity, airway obstruction and survival in a general population sample from the USA. Thorax 2011;66:49–54. |
23. | Elmaleh-Sachs A, Balte PP, Oelsner E, Allen NB, Baugh A, Bertoni A, et al. Are race and ethnicity necessary to define ‘normal’ lung function? The Multi-Ethnic Study of Atherosclerosis (MESA) Lung Study [abstract]. Am J Respir Crit Care Med 2021;203:A1028. |
24. | Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, et al. Multi-Ethnic Study of Atherosclerosis: objectives and design. Am J Epidemiol 2002;156:871–881. |
25. | Rodriguez J, Jiang R, Johnson WC, MacKenzie BA, Smith LJ, Barr RG. The association of pipe and cigar use with cotinine levels, lung function, and airflow obstruction: a cross-sectional study. Ann Intern Med 2010;152:201–210. |
26. | Centers for Disease Control and Prevention. Chronic Obstructive Pulmonary Disease (COPD) includes: chronic bronchitis and emphysema [updated April 2021; accessed 2021 Sept 13]. Available from: https://www.cdc.gov/nchs/fastats/copd.htm. |
27. | Miller MR, Hankinson J, Brusasco V, Burgos F, Casaburi R, Coates A, et al.; ATS/ERS Task Force. Standardisation of spirometry. Eur Respir J 2005;26:319–338. |
28. | Hankinson JL, Kawut SM, Shahar E, Smith LJ, Stukovsky KH, Barr RG. Performance of American Thoracic Society-recommended spirometry reference values in a multiethnic sample of adults: the Multi-Ethnic Study of Atherosclerosis (MESA) lung study. Chest 2010;137:138–145. |
29. | Kiefer EM, Hankinson JL, Barr RG. Similar relation of age and height to lung function among Whites, African Americans, and Hispanics. Am J Epidemiol 2011;173:376–387. |
30. | Hankinson JL, Odencrantz JR, Fedan KB. Spirometric reference values from a sample of the general U.S. population. Am J Respir Crit Care Med 1999;159:179–187. |
31. | Stampfer MJ, Willett WC, Speizer FE, Dysert DC, Lipnick R, Rosner B, et al. Test of the National Death Index. Am J Epidemiol 1984;119:837–839. |
32. | Oelsner EC, Carr JJ, Enright PL, Hoffman EA, Folsom AR, Kawut SM, et al. Per cent emphysema is associated with respiratory and lung cancer mortality in the general population: a cohort study. Thorax 2016;71:624–632. |
33. | Oelsner EC, Loehr LR, Henderson AG, Donohue KM, Enright PL, Kalhan R, et al. Classifying chronic lower respiratory disease events in epidemiologic cohort studies. Ann Am Thorac Soc 2016;13:1057–1066. |
34. | World Health Organization. International Classification of Diseases, 10th ed. Geneva, Switzerland: World Health Organization; 2010. |
35. | Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA 1982;247:2543–2546. |
36. | Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361–387. |
37. | Kang L, Chen W, Petrick NA, Gallas BD. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach. Stat Med 2015;34:685–703. |
38. | Kouri A, Dandurand RJ, Usmani OS, Chow CW. Exploring the 175-year history of spirometry and the vital lessons it can teach us today. Eur Respir Rev 2021;30:210081. |
39. | Raghu G, Rochwerg B, Zhang Y, Garcia CA, Azuma A, Behr J, et al.; American Thoracic Society; European Respiratory Society; Japanese Respiratory Society; Latin American Thoracic Association. An official ATS/ERS/JRS/ALAT clinical practice guideline: treatment of idiopathic pulmonary fibrosis. An update of the 2011 clinical practice guideline. Am J Respir Crit Care Med 2015;192:e3–e19. |
40. | Weill D, Benden C, Corris PA, Dark JH, Davis RD, Keshavjee S, et al. A consensus document for the selection of lung transplant candidates: 2014—an update from the Pulmonary Transplantation Council of the International Society for Heart and Lung Transplantation. J Heart Lung Transplant 2015;34:1–15. |
41. | Aaron SD, Dales RE, Cardinal P. How accurate is spirometry at predicting restrictive pulmonary impairment? Chest 1999;115:869–873. |
42. | Burney PG, Hooper RL. The use of ethnically specific norms for ventilatory function in African-American and White populations. Int J Epidemiol 2012;41:782–790. |
43. | Gaffney AW, McCormick D, Woolhandler S, Christiani DC, Himmelstein DU. Prognostic implications of differences in forced vital capacity in Black and White US adults: findings from NHANES III with long-term mortality follow-up. EClinicalMedicine 2021;39:101073. |
44. | McCormack MC, Balasubramanian A, Matsui EC, Peng R, Wise RA, Keet CA. Race, lung function and long-term mortality in the National Health and Examination Survey III. Am J Respir Crit Care Med [online ahead of print] 2021 Oct 1; DOI: 10.1164/rccm.202104-0822LE. |
45. | Vyas DA, Eisenstein LG, Jones DS. Hidden in plain sight—reconsidering the use of race correction in clinical algorithms. N Engl J Med 2020;383:874–882. |
46. | Baugh ABI, Barr RG, Criner GJ, Comellas AP, Cooper CB, Couper D, et al.; SPIROMICS Investigators. Does race adjustment in pulmonary function testing obscure important disparities? [abstract]. Am J Respir Crit Care Med 2020;201:A7573. |
47. | Bhakta NR, Kaminsky DA, Bime C, Thakur N, Hall GL, McCormack MC, Stanojevic S. Addressing race in pulmonary function testing by aligning intent and evidence with practice and perception. Chest [online ahead of print] 2021 Aug 24; DOI: 10.1016/j.chest.2021.08.053. |
48. | Chapman EN, Kaatz A, Carnes M. Physicians and implicit bias: how doctors may unwittingly perpetuate health care disparities. J Gen Intern Med 2013;28:1504–1510. |
49. | Liebler CA, Porter SR, Fernandez LE, Noon JM, Ennis SR. America’s churning races: race and ethnicity response changes between census 2000 and the 2010 census. Demography 2017;54:259–284. |
50. | LaVange L, Davis SM, Hankinson J, Enright P, Wilson R, Barr RG, et al. Spirometry reference equations from the HCHS/SOL (Hispanic Community Health Study/Study of Latinos). Am J Respir Crit Care Med 2017;196:993–1003. |
51. | Martin A, Badrick E, Mathur R, Hull S. Effect of ethnicity on the prevalence, severity, and management of COPD in general practice. Br J Gen Pract 2012;62:e76–e81. |
52. | Spitzer KA, Stefan MS, Priya A, Pack QR, Pekow PS, Lagu T, et al. A geographic analysis of racial disparities in use of pulmonary rehabilitation after hospitalization for COPD exacerbation. Chest 2020;157:1130–1137. |
53. | Kumar R, Seibold MA, Aldrich MC, Williams LK, Reiner AP, Colangelo L, et al. Genetic ancestry in lung-function predictions. N Engl J Med 2010;363:321–330. |
54. | Grossman NL, Ortega VE, King TS, Bleecker ER, Ampleford EA, Bacharier LB, et al. Exacerbation-prone asthma in the context of race and ancestry in Asthma Clinical Research Network trials. J Allergy Clin Immunol 2019;144:1524–1533. |
Supported by NHLBI grants R01-HL077612, R01-HL093081, and R01-HL130506. Additional support was provided by Health Resources and Services Administration grant T32HP10260. MESA was supported by NHLBI contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D0006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, and N01-HC-95169 and by National Center for Advancing Translational Sciences grants UL1-TR-000040, UL1-TR-001079, and UL1-TR-001420.
Author Contributions: A.E.-S.: literature search, data analysis, data interpretation, drafting of manuscript, and preparation of tables; P.B.: study design, data analysis, data interpretation, drafting of manuscript, and preparation of tables and figures; E.C.O. and N.B.A.: study design, data collection, data interpretation, and manuscript review; A.B.: data interpretation and manuscript review; A.G.B.: study design, data collection, data interpretation, and manuscript review; J.L.H.: study design, data interpretation, and manuscript review; J.P., W.S.P., J.E.S., B.M.S., and K.W.: study design, data collection, data interpretation, and manuscript review; R.G.B.: study design, funding, data collection, data interpretation, and drafting and review of the manuscript.
This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org.
Originally Published in Press as DOI: 10.1164/rccm.202107-1612OC on December 16, 2021
Author disclosures are available with the text of this article at www.atsjournals.org