The objective of this study was to determine the limits for repeatability of FEV1, FVC, and PEF during spirometry test sessions in adult outpatients. A retrospective chart review was performed of 18,000 consecutive patients, aged 20 to 90 years, who were referred to a large outpatient pulmonary function laboratory for testing. Measurements included anthropometric factors and the differences between the highest and second-highest FVC (dFVC), FEV1 (dFEV1), and PEF (dPEF) from prebronchodilator spirometry. Ninety percent of the patients were able to reproduce FEV1 within 120 ml (6.1%), FVC within 150 ml (5.3%), and PEF within 0.80 L/second (12%). Patient characteristics, such as sex, age, height, smoking status, and FEV1 (% predicted), had very little effect on repeatability, explaining only 2 to 4% of the variation in repeatability (expressed in milliliters). We conclude that the ability of patients to meet or exceed spirometry repeatability goals does not depend on patient characteristics when testing is performed by experienced personnel. The current American Thoracic Society repeatability goal of 200 ml for FEV1 and FVC may be too lenient.
It is important to try to achieve good repeatability (reproducibility) of FEV1 and FVC within a spirometry test session, because poor repeatability reduces confidence in the interpretation of short-term changes in lung function (bronchodilator or methacholine response) and of long-term (month-to-month or year-to-year) changes. For this reason, the American Thoracic Society (ATS) recommends that procedural sources of variation in lung function be minimized (1). On the other hand, trying to meet overly stringent criteria for within–test session repeatability frustrates technicians and patients alike and lengthens test time.
Guidelines for the performance of spirometry have been based on published analyses of thousands of spirometry tests performed by experienced technicians, to ensure that the repeatability goals are practical. The current ATS criteria for satisfactory spirometry (2) are based on the 90th percentile values obtained during a large, population-based survey, the Third National Health and Nutrition Examination Survey (3). The ATS standard states that after three acceptable maneuvers are performed, the two largest FEV1s should match within 200 ml (the difference between the highest and second-highest FEV1 within the prebronchodilator spirometry test session [dFEV1]), and the two largest FVCs should also match within 200 ml (the difference between the highest and second-highest FVC within the prebronchodilator spirometry test session [dFVC]). If either of these repeatability criteria is not met, additional maneuvers should be performed in an attempt to achieve better repeatability (up to a total of eight maneuvers).
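The stopping rule just described can be written as a small decision function. This is a minimal sketch, not spirometer software. It assumes three things: values are in milliliters, the lists contain only acceptable maneuvers, and a difference of exactly 200 ml counts as a match (the wording "within 200 ml" leaves that boundary open). The names `top_two_difference` and `ats1995_session_done` are hypothetical.

```python
from typing import Sequence

ATS_1995_LIMIT_ML = 200.0  # two largest FEV1s and two largest FVCs should match within 200 ml
MAX_MANEUVERS = 8          # upper limit on maneuvers in one session


def top_two_difference(values_ml: Sequence[float]) -> float:
    """Highest minus second-highest value (e.g. dFEV1 or dFVC)."""
    if len(values_ml) < 2:
        raise ValueError("need at least two maneuvers")
    best, second = sorted(values_ml, reverse=True)[:2]
    return best - second


def ats1995_session_done(fev1_ml: Sequence[float], fvc_ml: Sequence[float],
                         total_maneuvers: int) -> bool:
    """Return True when testing may stop under this sketch of the 1995 ATS rule.

    fev1_ml / fvc_ml: values from acceptable maneuvers only.
    total_maneuvers: every attempt so far, acceptable or not.
    """
    if total_maneuvers >= MAX_MANEUVERS:
        return True   # stop at the maneuver limit regardless of repeatability
    if len(fev1_ml) < 3 or len(fvc_ml) < 3:
        return False  # three acceptable maneuvers are required first
    # Assumption: "within 200 ml" is treated as <= 200 ml.
    return (top_two_difference(fev1_ml) <= ATS_1995_LIMIT_ML
            and top_two_difference(fvc_ml) <= ATS_1995_LIMIT_ML)
```

Consider a session with FEV1 values of 3,100, 3,020, and 2,950 ml and FVC values of 4,200, 4,150, and 4,010 ml. Here dFEV1 is 80 ml and dFVC is 50 ml, so testing may stop after the third maneuver.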
The current European Respiratory Society criteria for satisfactory spirometry (4) state that the chosen (largest) values should not exceed the next highest by more than 5% or 100 ml, whichever is greater. These criteria were chosen to agree with the earlier 1987 ATS spirometry standards (5). The European Respiratory Society also suggests that, "as a useful criterion," the two highest PEFs should match within 10% (the difference between the highest and second-highest PEF within the prebronchodilator spirometry test session [dPEF]).
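The fixed 200 ml ATS limit and the "5% or 100 ml, whichever is greater" European limit behave differently depending on lung size. The sketch below makes that comparison concrete. It assumes the 5% (and the 10% for PEF) is taken of the largest value; the function names are hypothetical.

```python
from typing import Sequence


def _top_two(values: Sequence[float]) -> tuple:
    """Return (highest, second-highest)."""
    best, second = sorted(values, reverse=True)[:2]
    return best, second


def passes_ats1995(values_ml: Sequence[float]) -> bool:
    """Fixed absolute limit of 200 ml."""
    best, second = _top_two(values_ml)
    return best - second <= 200.0


def passes_ers(values_ml: Sequence[float]) -> bool:
    """5% or 100 ml, whichever is greater (assumption: 5% of the largest value)."""
    best, second = _top_two(values_ml)
    return best - second <= max(100.0, 0.05 * best)


def passes_ers_pef(pef_l_s: Sequence[float]) -> bool:
    """Suggested PEF criterion: two highest PEFs within 10% (of the highest, assumed)."""
    best, second = _top_two(pef_l_s)
    return best - second <= 0.10 * best
```

The same 150 ml difference passes both rules when the FEV1 is 3,400 ml, because the European limit is then 170 ml. It fails the European rule when the FEV1 is 1,200 ml, because the limit falls back to 100 ml. Conversely, a 220 ml difference on a 5,000 ml value fails the ATS rule but passes the European one, whose limit is then 250 ml.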
Most spirometry testing is done for patients with pulmonary problems and not for general population samples, so the current study was performed to determine the within–test session spirometry repeatability of adult patients (many sick and elderly), who were referred to a large pulmonary function laboratory for testing.
Since 1992, all results from pulmonary function testing done at the outpatient pulmonary function laboratory of the Mayo Clinic in Rochester, MN have been stored in a database. The analyses for this manuscript were performed on a subset of that database, obtained by a search for all spirometry results performed on patients aged 20 to 90 years from January 1, 1996 to December 31, 2000.
Spirometry was performed by 16 technicians with considerable full-time pulmonary function testing experience, who were certified by the American Association for Respiratory Care. Nine spirometers were used, all of the same model (Medical Graphics 1085 desktop system; St Paul, MN). This spirometer uses a screen pneumotach (Hans Rudolph model 3813; Kansas City, MO), which is heated to 37°C to prevent condensation from forming on the screen. The pneumotachometer is located at the end of a 3-foot-long breathing tube. Spirometry test procedures conformed to the 1995 ATS standards (2).
The accuracy of each spirometer was checked daily using a 3-liter calibration syringe emptied at three different speeds. Testing was allowed on a given spirometer only after the measured volume errors were less than 3%. The patients were vigorously coached by a technician to perform forced expirations until three acceptable maneuvers were recorded (up to a maximum of eight maneuvers). Acceptability was determined according to ATS recommendations. A color VGA monitor displayed a real-time tracing of exhaled flow versus volume, which was viewed by both the patient and the technician.
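The daily calibration check amounts to a simple percentage-error test. The text does not say whether the 3% limit applies to each syringe stroke or to their average. This sketch assumes it applies to each of the three strokes separately, and the name `calibration_ok` is hypothetical.

```python
from typing import Sequence

SYRINGE_VOLUME_L = 3.0     # nominal volume of the calibration syringe
MAX_ERROR_FRACTION = 0.03  # measured volume errors must be less than 3%


def calibration_ok(measured_l: Sequence[float]) -> bool:
    """measured_l: volumes recorded when the syringe is emptied at three different speeds.

    Assumption: every individual stroke must be within the 3% limit.
    """
    if len(measured_l) != 3:
        raise ValueError("expected one measurement per emptying speed")
    return all(abs(v - SYRINGE_VOLUME_L) / SYRINGE_VOLUME_L < MAX_ERROR_FRACTION
               for v in measured_l)
```

Readings of 3.02, 2.97, and 3.05 L (errors of about 0.7%, 1.0%, and 1.7%) would allow testing. A single stroke reading 2.88 L (a 4% error) would not.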
Descriptive statistics were calculated for results from the three best maneuvers from each prebronchodilator test session, and for dFVC, dFEV1, and dPEF.
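The key summary statistic is the 90th percentile of each within-session difference across all test sessions. The paper does not state how the percentile was computed. This sketch assumes linear interpolation between order statistics, which is also NumPy's default for `percentile`. The function names and the session dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence


def top_two_difference(values: Sequence[float]) -> float:
    """Highest minus second-highest value within one session."""
    best, second = sorted(values, reverse=True)[:2]
    return best - second


def percentile(data: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(data)
    if not xs:
        raise ValueError("no data")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(sessions: List[Dict[str, Sequence[float]]]) -> Dict[str, float]:
    """90th percentile of dFVC, dFEV1 and dPEF across sessions.

    Each session maps "fvc", "fev1" and "pef" to the values from its
    three best prebronchodilator maneuvers (hypothetical layout).
    """
    result = {}
    for key in ("fvc", "fev1", "pef"):
        diffs = [top_two_difference(s[key]) for s in sessions]
        result["d" + key.upper()] = percentile(diffs, 90)
    return result
```

The 90th percentile of the values 1 through 10 is 9.1 under this definition. Other definitions, such as nearest rank, can give slightly different cutoffs in small samples. With roughly 18,000 sessions, the choice matters very little.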
To identify significant influences on performance, multiple regression analyses were performed on each repeatability variable (dFVC, dFEV1, and dPEF). The initial regressions included continuous independent variables for age, height, and percent-predicted FEV1, as well as dichotomous independent variables for male sex and smoking status (ever vs. never).
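The regression step can be sketched with ordinary least squares, solved through the normal equations, together with the R² used to judge how much variance the predictors explain. This is a minimal, dependency-free illustration and not the authors' analysis; the paper does not name its statistical software. It assumes the dichotomous variables are coded 0/1 (male = 1, ever-smoker = 1). A real analysis would use a statistics package that also reports standard errors and p values.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] minimizing squared error."""
    rows = [[1.0, *map(float, r)] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Normal equations: (X'X) b = X'y, stored as an augmented matrix
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        if abs(piv) < 1e-12:
            raise ValueError("singular design matrix")
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def r_squared(X: Sequence[Sequence[float]], y: Sequence[float],
              coef: Sequence[float]) -> float:
    """Fraction of the variance in y explained by the fitted model."""
    preds = [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

In use, each row of `X` would hold age, height, percent-predicted FEV1, male (0/1), and ever-smoker (0/1), and `y` would hold dFEV1 in milliliters. A low R² such as those reported in Table 3 means the patient characteristics explain little of the spread in repeatability. The test data below are synthetic and are not study results.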
Of all 18,526 patients tested, 52% were male, 54.8% reported ever smoking, 22.8% reported episodes of shortness of breath with wheezing during the prior 12 months, 11.0% gave a history of physician-diagnosed asthma, and 9.5% reported having emphysema or chronic obstructive pulmonary disease. The ranges of height, age, and impairment of lung function were very wide (see Table 1).
The ability of men and women to obtain reproducible FEV1s, FVCs, and peak flows was almost identical when expressed as percent difference. Only 5% of the patients were unable to match their highest FEV1 within 150 ml (see Table 2).
Age did not affect repeatability in any of the models. Sex, height, smoking status, and the degree of lung function impairment (percent-predicted FEV1) explained less than 10% of the variance in the ability of patients to reproduce their spirometry values (see the R² values in Table 3).
In general, the spirometry quality of the adult patients in our study compared favorably with results reported by other investigators. We believe that relatively stringent within–test session repeatability goals for the key spirometry variables FEV1 and FVC are important, because they improve confidence in the diagnostic discrimination of the test and the confidence with which the physician who ordered the test can interpret changes in lung function.
We believe that the "gold standard" for setting repeatability goals should be the ability of highly experienced technicians, using optimal-quality instruments, to meet the goals in 9 of every 10 patients when testing a wide variety of patients referred for pulmonary function testing. Most of the technicians who tested the patients in this study had performed spirometry as their primary responsibility for many years, and the spirometers were very well maintained. The very large number of patients, with widely varying age, height, smoking status, and degree of lung disease, makes the results of this study highly generalizable.
The amount of variance explained by the models (Table 3) was always higher when repeatability was expressed as a percentage. Short and female patients have smaller lung volumes, so the same absolute difference is a larger percentage of their values. A percentage goal is therefore more difficult for these patients to meet (see Table 4).
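A short calculation shows why a percentage goal penalizes patients with small lung volumes. This sketch assumes the percentage is taken relative to the highest value. The names `percent_difference` and `meets_percent_goal` are hypothetical, and the 5% goal is only an example.

```python
def percent_difference(best_ml: float, second_ml: float) -> float:
    """Top-two gap as a percentage of the highest value (assumed denominator)."""
    return 100.0 * (best_ml - second_ml) / best_ml


def meets_percent_goal(best_ml: float, second_ml: float, goal_pct: float = 5.0) -> bool:
    """True if the gap is at or below the percentage goal (example goal: 5%)."""
    return percent_difference(best_ml, second_ml) <= goal_pct
```

A 100 ml gap is 5.0% of a 2,000 ml FEV1 but only 2.5% of a 4,000 ml FEV1. Under a 5% goal, a small patient with a 110 ml gap fails (5.5%), while a larger patient passes with a gap as large as 200 ml.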
Nine of every 10 patients could match their highest FEV1 within 120 ml (see Table 2, the 90th percentile for dFEV1), within 150 ml for FVC, and within 0.80 L/second for peak flow. These results suggest that the 1995 ATS recommendations (2) for spirometry repeatability goals (dFEV1 and dFVC < 200 ml) are too lenient for adults.
The Third National Health and Nutrition Examination Survey was a large study of a general population sample (3), with a mean dFEV1 of 56 ml for women and 65 ml for men, almost identical to the results from our patients. The investigators recommended a goal of less than 200 ml, which was met by 94% of their subjects, regardless of height or sex. Although our results closely match those of this study of mostly healthy individuals, we suggest that the goal be set so that 90% of patients, rather than 95%, will pass when tested by an experienced technician.
Patients with moderate to severe impairment of lung function (as measured by FEV1 percent predicted) exhibited more within–test session variability, when expressed as a percentage, than those with normal lung function (see Table 4 for examples). On the other hand, when variability (dFEV1) was expressed as an absolute value (in milliliters), those with low lung function showed slightly less variability (for example, a mean of 42 ml vs. 58 ml for those with an FEV1 of 100% predicted).
A large registry of patients with severe alpha-1 antitrypsin deficiency at 37 sites (6) demonstrated that only 2% of the patients (mean FEV1 of 42% predicted) failed the earlier ATS goal of a dFEV1 less than 100 ml or 5% (whichever was greater). A logistic regression model predicting failure to meet this goal showed that age, sex, and pack-years of smoking were not independent predictors, but that site (technician) and percent-predicted FEV1 were highly significant predictors. Patients with mild or very severe airway obstruction (FEV1 < 20% predicted) were more likely to achieve reproducible FEV1s than those with moderate obstruction.
A very large study of young men in Norway found that 9.5% failed the dFEV1 goal of less than 5% or less than 100 ml; failure was more common in shorter men, older men, never smokers, and those with respiratory symptoms (7). A population-based study of 416 young adults (8) noted that young men with bronchial hyperresponsiveness and young women who smoked cigarettes were more likely than others to fail a dFEV1 goal of less than 100 ml. About 12% of their subjects failed that goal, almost the same percentage as the patients in our study. Our model for dFEV1 (in milliliters) shows that smokers were only slightly more likely to have a larger dFEV1 (by an average of 3.5 ml compared with never smokers).
A study of 864 employees (mean age 45) found that workers with lower lung function, as well as older workers, were less able to obtain FEV1s matching within 5% (9), but there was no association with smoking status. We found no significant effect of age on any of the repeatability variables, but current and former smokers had a slightly higher mean dFEV1 and a slightly higher mean dFVC.
It has been suggested that children often find it more difficult than a population-based sample of generally healthy adults to perform reproducible spirometry maneuvers. However, a recent study of more than 4,000 children and adolescents (ages 9–18), tested by experienced technicians (10), showed that 90% of the children had a dFEV1 less than 117 ml and a dFVC less than 98 ml, well within the current ATS repeatability criterion of 200 ml.
Despite the association between cognitive function and the ability to perform reproducible spirometry in elderly persons (11), a population-based study of 5,201 persons aged 65 years or older, tested by 16 different technicians, demonstrated that only 3% could not match FEV1s within 200 ml (12). However, repeatability was not as good in a subsequent population-based study of elderly black subjects (13).
PEF repeatability criteria are less important during spirometry tests than criteria for FEV1 and FVC. PEF and time to peak flow are used primarily as indices of the patient's effort to blast out the air quickly during the first 100 to 200 milliseconds of the maneuver, not for detecting airway obstruction. The 1995 ATS recommendations stated that "Although there may be some benefit from using PEF repeatability to improve subject effort, no specific [PEF] repeatability criterion is recommended at this time." Coates and coworkers studied children with asthma or cystic fibrosis tested by hospital-based pulmonary function laboratory technicians (14). They found that dFEV1 was much more closely associated with variation in the FVC, caused by variation in the depth of inhalation preceding the FVC maneuver, than with variation in peak flow (dPEF). However, if PEF repeatability is used, a dPEF goal of either 0.8 L/second or 16% is reasonable, because the influence of patient characteristics is nearly the same for the absolute difference and the percent difference.
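The two alternative dPEF goals can be checked with one small function, in which a flag selects the absolute or the percentage form. The function name and the `use_percent` parameter are hypothetical. The sketch assumes the percentage is taken of the highest PEF.

```python
from typing import Sequence

PEF_ABS_LIMIT_L_S = 0.8  # absolute dPEF goal, L/second
PEF_PCT_LIMIT = 16.0     # alternative percentage dPEF goal


def dpef_ok(pef_l_s: Sequence[float], use_percent: bool = False) -> bool:
    """Check dPEF against either the absolute or the percentage goal."""
    best, second = sorted(pef_l_s, reverse=True)[:2]
    d = best - second
    if use_percent:
        return 100.0 * d / best <= PEF_PCT_LIMIT  # assumption: % of highest PEF
    return d <= PEF_ABS_LIMIT_L_S
```

PEFs of 9.0 and 8.3 L/second pass either goal. Values of 4.0 and 3.1 L/second fail both, since the gap is 0.9 L/second, or about 22%. Values of 9.0 and 7.9 L/second fail the absolute goal but pass the percentage goal (about 12%). So the choice between the two forms mainly affects patients with high peak flows.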
Technicians should first use repeatability criteria as a goal while performing spirometry, performing additional maneuvers (up to a total of 8) in an attempt to obtain a good match between the highest and second-highest values obtained. After the second and each subsequent maneuver, the spirometer's software should display the repeatability of acceptable maneuvers (or a quality grade, from A to F, based on acceptability and repeatability). After testing is completed, the degree of repeatability is valuable for grading the quality of the test session, as done in research studies (15–17) and recommended for office spirometry (18).
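The real-time feedback described above could work as follows. After each acceptable maneuver, the software updates dFEV1 and dFVC; at the end, it assigns a letter grade. The grade thresholds here are hypothetical, chosen only to illustrate the idea. They are not the schemes used in the cited studies or the office spirometry recommendations, and all function names are hypothetical too.

```python
from typing import List, Optional, Sequence, Tuple


def top_two_gap(values_ml: Sequence[float]) -> float:
    """Highest minus second-highest value."""
    best, second = sorted(values_ml, reverse=True)[:2]
    return best - second


def running_repeatability(fev1_ml: Sequence[float],
                          fvc_ml: Sequence[float]) -> Optional[Tuple[float, float]]:
    """(dFEV1, dFVC) over acceptable maneuvers so far; None until two exist."""
    if len(fev1_ml) < 2 or len(fvc_ml) < 2:
        return None
    return top_two_gap(fev1_ml), top_two_gap(fvc_ml)


def session_feedback(maneuvers: Sequence[Tuple[float, float]]) -> List[str]:
    """Display lines after the second and each later acceptable (FEV1, FVC) maneuver."""
    fev1: List[float] = []
    fvc: List[float] = []
    lines = []
    for f1, fv in maneuvers:
        fev1.append(f1)
        fvc.append(fv)
        rep = running_repeatability(fev1, fvc)
        if rep is not None:
            lines.append(f"after maneuver {len(fev1)}: "
                         f"dFEV1={rep[0]:.0f} ml, dFVC={rep[1]:.0f} ml")
    return lines


def quality_grade(n_acceptable: int, dfev1_ml: Optional[float],
                  dfvc_ml: Optional[float]) -> str:
    """HYPOTHETICAL grading thresholds, for illustration only."""
    if n_acceptable < 2 or dfev1_ml is None or dfvc_ml is None:
        return "F"
    worst = max(dfev1_ml, dfvc_ml)
    if n_acceptable >= 3:
        if worst <= 100:
            return "A"
        if worst <= 150:
            return "B"
        if worst <= 200:
            return "C"
    return "D"
```

Take maneuvers of (3,000, 4,000), (2,920, 3,900), and (2,990, 3,950) ml. The display reads dFEV1 = 80 ml and dFVC = 100 ml after the second maneuver, then 10 ml and 50 ml after the third, which earns grade A under these illustrative thresholds.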
This study used spirometers with flow sensors, so flow was integrated digitally to obtain FEV1 and FVC. The majority of spirometers now sold also use flow sensors such as pneumotachographs rather than accumulating volume directly, so our results should apply to them. The screen pneumotach has a very low thermal mass, so successive exhalation maneuvers are highly unlikely to change the temperature of the air flowing through it. Volume spirometers, and some other types of flow sensors, change temperature throughout a test session, so they should use temperature sensors to automate the volume corrections for each maneuver (19). Otherwise, meeting within–test session repeatability goals is likely to be more difficult.
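Digital integration of flow to obtain FEV1 and FVC can be sketched with the trapezoidal rule. This is a simplified illustration, not the algorithm of the spirometer used in the study. It assumes the recording starts at time zero of the maneuver; in practice, time zero is set by back-extrapolation, which is omitted here. It also assumes flows are already BTPS-corrected. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def integrate_flow(times_s: Sequence[float], flows_l_s: Sequence[float]) -> List[float]:
    """Cumulative exhaled volume (L) at each sample, by the trapezoidal rule."""
    vol = [0.0]
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        vol.append(vol[-1] + 0.5 * (flows_l_s[i] + flows_l_s[i - 1]) * dt)
    return vol


def fev1_fvc(times_s: Sequence[float], flows_l_s: Sequence[float]) -> Tuple[float, float]:
    """FEV1 = volume at t = 1 s (linear interpolation); FVC = final volume.

    Assumes times_s[0] is time zero of the maneuver (no back-extrapolation).
    """
    vol = integrate_flow(times_s, flows_l_s)
    for i in range(1, len(times_s)):
        if times_s[i] >= 1.0:
            t0, t1 = times_s[i - 1], times_s[i]
            frac = (1.0 - t0) / (t1 - t0)
            fev1 = vol[i - 1] + frac * (vol[i] - vol[i - 1])
            return fev1, vol[-1]
    raise ValueError("recording shorter than 1 second")
```

As a check, a flow falling linearly from 4 L/second to zero over 2 seconds yields a volume of 4t − t² liters. That gives an FEV1 of 3 L and an FVC of 4 L, both of which the trapezoidal rule reproduces exactly for a linear flow.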
Nine of every 10 adult outpatients can successfully be coached by an experienced spirometry technician to obtain FEV1s and FVCs that are reproducible within 120 to 150 ml. The presence or severity of lung disease, young or old age, sex, or smoking status should not be used as an excuse for inability to obtain reproducible spirometry maneuvers. We believe that the ability to meet or exceed repeatability goals depends largely on the skill and perseverance of the technician and very little on patient characteristics.
These analyses could not have been performed without the pulmonary function database system established with considerable foresight by Drs. Joseph Rodarte and Robert Hyatt. The high quality of the tests is due to the experience, patience, and skills of the pulmonary function technicians of the Mayo Clinic in Rochester, Minnesota.
1. Becklake M, Crapo RO, Buist AS. Lung function testing: selection of reference values and interpretative strategies: an official statement of the American Thoracic Society. Am Rev Respir Dis 1991;144:1202–1218.
2. Standardization of spirometry, 1994 update: American Thoracic Society. Am J Respir Crit Care Med 1995;152:1107–1136.
3. Hankinson JL, Bang KM. Acceptability and repeatability criteria of the ATS for spirometry as observed in a sample of the general population. Am Rev Respir Dis 1991;143:516–521.
4. Quanjer PH, Tammeling GJ, Cotes JE, Pedersen OF, Peslin R, Yernault JC. Lung volumes and forced ventilatory flows: report of working party, standardization of lung function tests. Eur Respir J 1993;6:5–40.
5. Standardization of spirometry, 1987 update: American Thoracic Society. Am Rev Respir Dis 1987;136:1285–1293.
6. Stoller J, Buist A, Burrows B, Crystal RG, Fallat RJ, McCarthy K, Schluchter MD, Soskel NT, Zhang R. Quality control of spirometry testing in the registry for patients with severe alpha-1-antitrypsin deficiency. Chest 1997;111:899–909.
7. Humerfelt S, Eide GE, Kvale G, Gulsvik A. Predictors of spirometric test failure: a comparison of the 1983 and 1993 acceptability criteria from the ECCS. Occup Environ Med 1995;52:547–553.
8. Ng'ang'a LW, Ernst P, Jaakkola MS, Gerardi G, Hanley JH, Becklake MR. Spirometric lung function: distribution and determinants of test failure in a young adult population. Am Rev Respir Dis 1992;145:48–52.
9. Neale AV, Demers RY. Significance of the inability to reproduce pulmonary function test results. J Occup Med 1994;36:660–666.
10. Enright PL, Linn WS, Avol EL, Margolis H, Gong H, Peters JM. Spirometry quality in children and adolescents: experience in a large field study. Chest 2000;118:665–671.
11. Sherman CB, Kern D, Richardson ER, Hubert M, Fogel BS. Cognitive function and spirometry performance in the elderly. Am Rev Respir Dis 1993;148:123–126.
12. Enright PL, Kronmal RA, Higgins M, Schenker M, Haponik EF. Spirometry reference values for women and men 65–85 years of age: cardiovascular health study. Am Rev Respir Dis 1993;147:125–133.
13. Enright PL, Arnold A, Manolio TA, Kuller LH. Spirometry reference values for healthy elderly blacks. Chest 1996;110:1416–1424.
14. Coates AL, Desmond KJ, Demizio D, Allen PD. Sources of variation in FEV1. Am J Respir Crit Care Med 1994;149:439–443.
15. Enright PL, Johnson LR, Connett JE, Voelker H, Buist AS. Spirometry in the Lung Health Study: methods and quality control. Am Rev Respir Dis 1991;143:1215–1223.
16. Banks DE, Wang ML, McCabe L, Billie M, Hankinson J. Improvement in lung function measurements using a flow spirometer that emphasizes computer assessment of test quality. J Occup Environ Med 1996;279–283.
17. Malstrom K, Peszek I, Botto A, Lu S, Enright PL, Reiss TF. Quality assurance of asthma clinical trials. Control Clin Trials 2002;23:143–156.
18. Ferguson G, Enright PL, Buist AS, Higgins M. Office spirometry for lung health assessment in adults: a consensus statement from the National Lung Health Education Program. Chest 2000;117:1146–1161.
19. Johnson LR, Enright PL, Voelker HT, Tashkin DP. Volume spirometers need automated internal temperature sensors. Am J Respir Crit Care Med 1994;150:1575–1580.