American Journal of Respiratory and Critical Care Medicine

To measure the reliability of chest radiographic diagnosis of acute respiratory distress syndrome (ARDS) we conducted an observer agreement study in which two of eight intensivists and a radiologist, blinded to one another's interpretation, reviewed 778 radiographs from 99 critically ill patients. One intensivist and a radiologist participated in pilot training. Raters made a global rating of the presence of ARDS on the basis of diffuse bilateral infiltrates. We assessed interobserver agreement in a pairwise fashion. For rater pairings in which one rater had not participated in the consensus process we found moderate levels of raw (0.68 to 0.80), chance-corrected ( κ 0.38 to 0.55), and chance-independent ( Φ 0.53 to 0.75) agreement. The pair of raters who participated in consensus training achieved excellent to almost perfect raw (0.88 to 0.94), chance-corrected ( κ 0.72 to 0.88), and chance-independent ( Φ 0.74 to 0.89) agreement. We conclude that intensivists without formal consensus training can achieve moderate levels of agreement. Consensus training is necessary to achieve the substantial or almost perfect levels of agreement optimal for the conduct of clinical trials. Meade MO, Cook RJ, Guyatt GH, Groll R, Kachura JR, Bedard M, Cook DJ, Slutsky AS, Stewart TE. Interobserver variation in interpreting chest radiographs for the diagnosis of acute respiratory distress syndrome.

Acute respiratory distress syndrome (ARDS) is an advanced form of acute lung injury characterized by diffuse pathophysiological changes of increased capillary permeability, inflammation, and tissue repair. The clinical syndrome includes a triad of hypoxemia, decreased lung compliance, and chest radiographic abnormalities. The high incidence of ARDS in patients with predisposing clinical conditions such as sepsis, gastric aspiration, and multiple trauma [up to 35% (1)], and the associated mortality of 20 to 74% (2, 3), have made ARDS a major concern for clinicians and investigators.

ARDS is a more severe form along the continuum of acute lung injury, the threshold for which is somewhat arbitrary. Because of the arbitrariness of this threshold, defining ARDS and identifying the syndrome in individual patients presents a challenge. This problem of definition has led to considerable difficulties in comparing epidemiologic data relating to ARDS incidence (4), difficulties that will be resolved only through multinational collaboration (5). Variability in defining and identifying ARDS is also an important concern in clinical trials that consider ARDS as an inclusion criterion or as a study outcome.

Whatever definition one chooses, the diagnosis of ARDS depends in part on identifying characteristic radiographic abnormalities. To be consistently useful, interpretation of a radiological investigation must be reliable. Highly desirable in the delivery of clinical care, reliability becomes crucial in clinical studies that rely on radiologic findings. Lack of reproducibility will inflate required sample sizes and potentially lead to false-negative trial results. The limited interobserver agreement that investigators have usually observed when examining radiographic interpretation (6-9) suggests that both clinicians and scientists should attend to this issue.

We conducted a multicenter randomized trial of a pressure- and volume-limited ventilation strategy (10) in patients at high risk for ARDS. Our study demonstrated similar outcomes for the alternative strategies that we examined. When planning this study, we considered using ARDS as a possible inclusion criterion, and as a possible outcome. We rejected ARDS as an inclusion criterion because we ultimately decided our intervention might prevent the development of ARDS. We rejected ARDS as an outcome since the two different ventilation strategies used different mean airway pressures, and hence would potentially bias chest radiograph and oxygenation criteria of ARDS. We were nevertheless interested in the frequency with which ARDS occurred, and in the reliability with which we might measure that frequency. We therefore examined the extent to which intensive care physicians and a radiologist could agree on the radiologic diagnosis of ARDS.

Methods

Source of Chest Radiographs

We used films from patients enrolled in our randomized trial at seven participating hospitals in Toronto (Ontario, Canada). Adult patients who met the following criteria were eligible for the trial: intubated less than 24 h; peak airway pressures ⩽ 30 cm H2O; hypoxemia (PaO2/FiO2 < 250 with positive end-expiratory pressure [PEEP] of 5 cm H2O); and one or more known risk factors for ARDS. The trial excluded patients with the following characteristics: anticipated duration of ICU admission < 48 h; very unlikely survival, defined by premorbid or acute life expectancy; heart failure; acute asthmatic exacerbation; high risk of cardiac arrhythmia or ischemia; intracranial abnormalities associated with intracranial hypertension; or pregnancy.

Because of difficulty in obtaining the films, we omitted all films from the Ottawa center; from the Toronto Hospitals, we omitted films that film library personnel could not locate. Study patients had chest radiographs taken at least once daily, and we chose the first film from each consecutive day of participation. We included 841 films from 99 patients; individual patients provided from 1 to 32 films (median, 7).

Raters

Three raters interpreted each radiograph. Seven study intensivist/investigators, one from each participating hospital, provided the first interpretation by reading films done at their hospital (Rater 1). After the randomized trial was completed, two other raters, one an intensivist (M.M., Rater 2) and one a radiologist (J.K., Rater 3) interpreted each film, independently and without knowledge of other interpretations.

Preparation of Chest Radiographs

Site investigators reviewed films at the time they were taken. To prepare the films for review by Raters 2 and 3, we shuffled them in batches of approximately 150 as they arrived at the study office, numbered them in their new sequence, removed each film from the associated envelope, and covered the identification label with an opaque sticker bearing the study film number. The purpose of these preparations was to minimize bias that might occur if raters reviewed serial films from a single patient in sequence.

Review Process

Site investigators recorded their interpretations of study films on data forms included in the randomized trial. Site investigators had no study-specific training (no study-specific definitions or standardized techniques) in judging the presence of ARDS-related infiltrates. Raters 2 and 3 began by independently interpreting 63 films from 11 patients—we refer to these films as the “training set.” Raters 2 and 3 then repeated the review of the training set films, this time with one another, discussing the reasons for disagreement and refining the standards and rules they would apply when the interpretation was difficult. We refer to this process as the “standardized review.” Raters 2 and 3 then completed their interpretation of the full sample of 841 films, including the 63 films from the training set. The training set films were included at random among the others and the raters were thus unlikely to identify them.

Radiograph Interpretation

Each interpreter made two ratings in accordance with two definitions for ARDS that are commonly used in clinical trials. One rating, based on the definition of ARDS provided by an American–European Consensus Conference (AECC) statement (5), involved deciding whether a chest radiograph had diffuse bilateral infiltrates. Although the original AECC statement specifies “bilateral infiltrates” in the list of criteria defining ARDS, we specified “diffuse bilateral infiltrates” for two reasons. First, the AECC statement included a discussion of ARDS as a diffuse process that is therefore associated with diffuse infiltrates. Second, we were unwilling to interpret films with discrete bilateral subsegmental infiltrates as being consistent with ARDS. The refined criteria arising from the standardization process included conventional definitions from the radiology literature (11), defining infiltrate as “any ill-defined opacity in the lung that neither destroys nor displaces the gross morphology of the lung and is presumed to represent a pathophysiological process.” The refined criteria also included defining diffuse as widespread and continuous, by which the reviewers meant involving at least 80% of a lung field and not excluding specific lung segments.

The other rating, based on the Lung Injury Severity Score (4), involved deciding how many quadrants contained an area of consolidation. The consensus standardization led to a definition of consolidation as “a homogeneous opacity in the lung characterized by little or no loss of volume, by effacement of pulmonary blood vessels, and sometimes by the presence of an air bronchogram” and excluded definite effusions and masses. To distinguish between the upper and lower quadrant of a lung field, Raters 2 and 3 agreed to use the horizontal plane of the ipsilateral pulmonary artery at its midpoint at the hilum. When this landmark was obscured, they used the contralateral pulmonary artery and, when both were obscured, they used the midpoint of the height of the lung fields.

Statistical Methods

We were interested in the level of agreement in each of the three possible pairings among Raters 1, 2, and 3. We refer to the pairing of Raters 2 and 3 as the “standardized pair” to distinguish this pairing from other pairings in which one rater (the site intensivist/investigator) had not participated in the consensus process.

We measured agreement among raters by addressing two questions. The first question, Is this chest radiograph consistent with ARDS?, is relevant to clinical practice or to the use of ARDS as an inclusion criterion in clinical trials. To address this issue, we would like as many films as possible. It would therefore be convenient if we could treat the 841 films from this study as if they were 841 films from different patients. However, we have serial films from 99 patients; therefore, we cannot treat our observations as if they came from different patients (in technical language, as if they were independent). If we did assume independence, results from standard κ-type analyses could be subject to major distortion. We will return to this issue shortly.

The second question, Did this patient develop ARDS?, applies to the use of ARDS as a study outcome. Measuring agreement among raters in this setting requires reviewing all films for each patient, and developing a criterion for a series of films being consistent with ARDS. We tested two possible criteria: (1) any film consistent with ARDS, and (2) films on two consecutive days consistent with ARDS.

Because seven intensivists contributed to “Rater 1,” we began by comparing odds ratios of agreement between each of the seven and Raters 2 and 3 with respect to the presence of diffuse bilateral infiltrates on Day 1, two consecutive days, or any day. Testing failed to reject the null hypothesis, i.e., that the seven intensivists achieved the same levels of agreement with Raters 2 and 3. Testing for heterogeneity of odds ratios generated by different observers and pooling across estimates if no heterogeneity is found is standard statistical methodology (12).
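Reference 12 describes this methodology in detail; as an illustrative sketch only, the Python fragment below applies one common variant, Woolf's inverse-variance test for homogeneity of log odds ratios followed by a pooled estimate, to hypothetical 2 × 2 agreement tables (invented counts, not study data). It is not presented as the exact procedure used in our analysis.

```python
# Sketch: testing heterogeneity of agreement odds ratios and pooling if no
# heterogeneity is found (Woolf's inverse-variance method). Illustrative only;
# the 2x2 tables below are hypothetical, not study data.
import math
from scipy.stats import chi2

# Each 2x2 agreement table is (a, b, c, d) as laid out in Table 1.
tables = [(20, 5, 4, 15), (12, 3, 6, 9), (25, 7, 5, 20)]  # hypothetical counts

log_ors, weights = [], []
for a, b, c, d in tables:
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))  # avoid division by zero
    log_ors.append(math.log(a * d / (b * c)))
    weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))  # 1 / var(log OR)

pooled = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
q = sum(w * (lo - pooled) ** 2 for w, lo in zip(weights, log_ors))
p_het = chi2.sf(q, df=len(tables) - 1)  # heterogeneity test

print(f"pooled OR = {math.exp(pooled):.2f}, heterogeneity p = {p_het:.3f}")
```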

For comparisons of ratings of the presence or absence of diffuse bilateral infiltrates, we calculated raw agreement, chance-corrected agreement (using κ), and chance-independent agreement (using Φ). Table 1 presents the formulas for our measures of agreement based on a 2 × 2 table. The rationale for using these three methods is as follows. Raw agreement, the proportion of films in which both raters conclude that diffuse infiltrates were, or were not, present, can be misleading. In particular, if two raters both make a high or low proportion of positive ratings, raw agreement will be high even if the raters are just guessing. That is, their agreement will be high simply by chance. High agreement by chance tends to occur when two observers believe the prevalence of the clinical entity of interest is high or low in the population under study.

Table 1. CALCULATIONS OF AGREEMENT*

                                Rater B: Infiltrates Present    Rater B: Infiltrates Absent
Rater A: Infiltrates Present    a                               b
Rater A: Infiltrates Absent     c                               d

Definition of terms:
 Raw agreement: (a + d)/(a + b + c + d)
 κ: (observed agreement − expected agreement)/(1.0 − expected agreement),
  where observed agreement = (a + d)/(a + b + c + d)
  and expected agreement = [(a + b)(a + c) + (c + d)(b + d)]/(a + b + c + d)²
 Odds ratio (OR): ad/bc
 Φ: [(OR)^1/2 − 1]/[(OR)^1/2 + 1] = [(ad)^1/2 − (bc)^1/2]/[(ad)^1/2 + (bc)^1/2]
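As an illustration of how the definitions in Table 1 translate into computation, the following minimal Python sketch evaluates raw agreement, κ, the odds ratio, and Φ for an arbitrary 2 × 2 table; the counts shown are invented, not study data.

```python
# Agreement measures from a 2x2 table, following the definitions in Table 1.
import math

def agreement_measures(a: int, b: int, c: int, d: int) -> dict:
    n = a + b + c + d
    raw = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (raw - expected) / (1.0 - expected)
    odds_ratio = (a * d) / (b * c)  # undefined if b or c is zero
    phi = (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))
    return {"raw": raw, "kappa": kappa, "OR": odds_ratio, "phi": phi}

# Hypothetical counts: raters agree on 40 positive and 35 negative films,
# and disagree on 25.
print(agreement_measures(a=40, b=15, c=10, d=35))
```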

Because of this problem with raw agreement, we calculated chance-corrected agreement, using the κ statistic (13). While avoiding spuriously high levels of agreement due to chance, κ has its own limitations that have led to sharp criticism (14). One of the major difficulties with κ is that when the proportion of positive ratings is extreme, the possible agreement above chance agreement is small, and it is difficult to achieve even moderate values of κ. Thus, if one uses the same raters in a variety of settings, as the proportion of positive ratings becomes extreme, κ will decrease even if the way the raters interpret films does not change.

To address this limitation, we also calculated chance-independent agreement using Φ, a relatively new approach to assessing observer agreement (15). One begins by estimating the odds ratio from a 2 × 2 table displaying the agreement between two observers, such as the one presented in Table 1. The odds ratio is given by OR = ad/bc. In this case it is simply the odds of a positive classification by rater B when rater A gives a positive classification divided by the odds of a positive classification by rater B when rater A gives a negative classification. As such, it provides a natural measure of agreement. This agreement can be made more easily interpretable by converting it into a form that takes values from −1.0 (representing extreme disagreement) to 1.0 (representing extreme agreement). The Φ statistic makes this conversion by the following formula:

Φ = [(OR)^1/2 − 1]/[(OR)^1/2 + 1] = [(ad)^1/2 − (bc)^1/2]/[(ad)^1/2 + (bc)^1/2]   (Equation 1)

When both margins are 0.5 (that is, both raters conclude that 50% of the patients are positive and 50% negative for the trait of interest) Φ is equal to κ.
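As a worked illustration with hypothetical counts: if a = d = 40 and b = c = 10 (N = 100), each rater classifies 50% of films as positive; observed agreement is 0.80 and expected agreement is 0.50, so κ = (0.80 − 0.50)/(1 − 0.50) = 0.60, while Φ = (√1600 − √100)/(√1600 + √100) = (40 − 10)/(40 + 10) = 0.60, the same value.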

Φ has three important advantages over existing approaches. First, it is independent of the level of chance agreement. Thus, investigators could expect to find similar levels of Φ whether the distribution of results is 50% positive and 50% negative, or 90% positive and 10% negative. This is not true for measures of the κ statistic, a chance-corrected index of agreement.

Second, Φ allows modeling approaches that the κ statistic does not. For example, in the present data set, because of the possible lack of independence in degree of agreement across multiple films from a single patient, κ would not allow us to take full advantage of the 841 films that our raters evaluated. Φ allowed us to adjust for the degree of intrapatient correlation in assessments of serial radiographs, and thus make more efficient use of the data and generate narrower confidence intervals around the level of agreement. Third, Φ also allowed us to test whether differences in agreement between pairings were significant, an option not available with κ.

For ratings of the presence or absence of diffuse bilateral infiltrates, we compared not only the agreement between the three pairings of raters, but also the agreement between the standardized pairing (Raters 2 and 3) on the training set before and after the standardized review. Because of the possibility that viewing the training set twice may have influenced the standardized raters' interpretation of those films, we omitted them from the primary comparisons. Thus, the primary comparisons of the three possible pairings of raters included only 778 films.

As we have mentioned, because of the lack of independence among multiple films from the same patients, we could not use κ to calculate agreement on the presence or absence of diffuse bilateral infiltrates using all films. Using the Φ statistic, however, we were able to assess agreement across all 778 films: we applied maximum likelihood estimation based on the noncentral hypergeometric distribution to generate estimates that account for the degree of correlation among multiple films coming from the same patient (the Appendix describes the approach to maximum likelihood estimation).
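The maximum likelihood approach itself is described in the Appendix and is not reproduced here. Purely as a conceptual illustration of why patient-level clustering matters, the Python sketch below uses a simpler and different technique, a patient-level (cluster) bootstrap, to obtain a confidence interval for Φ that respects the correlation among a patient's films; the data structure and counts are hypothetical, and this is not the method used in the study.

```python
# Sketch: a patient-level (cluster) bootstrap CI for phi, as a rough stand-in
# for the paper's maximum likelihood approach. Resampling whole patients keeps
# a patient's films together, respecting their intrapatient correlation.
import math
import random

def phi_from_counts(a, b, c, d):
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))  # continuity correction (assumption)
    return (math.sqrt(a * d) - math.sqrt(b * c)) / (math.sqrt(a * d) + math.sqrt(b * c))

def cluster_bootstrap_phi(patients, n_boot=2000, seed=1):
    """patients: list of per-patient lists of (rater_a, rater_b) binary ratings."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(patients) for _ in patients]  # resample whole patients
        a = b = c = d = 0
        for films in sample:
            for ra, rb in films:
                if ra and rb: a += 1
                elif ra and not rb: b += 1
                elif rb and not ra: c += 1
                else: d += 1
        estimates.append(phi_from_counts(a, b, c, d))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]

# Hypothetical toy data: 3 patients contributing 2 to 4 films each.
toy = [[(1, 1), (1, 0), (1, 1)], [(0, 0), (0, 0)], [(1, 1), (0, 1), (0, 0), (1, 1)]]
print(cluster_bootstrap_phi(toy, n_boot=500))
```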

We also conducted significance tests on the agreement between the three pairings of raters on the three ratings of the presence of radiographic ARDS (bilateral infiltrates present on the first film, on any film, and on two consecutive films), and on the agreement between the consensus raters before and after training. We interpreted both κ and Φ results as follows: values of less than 0, poor agreement; 0 to 0.2, slight agreement; 0.2 to 0.4, fair agreement; 0.4 to 0.6, moderate agreement; 0.6 to 0.8, substantial agreement; and 0.8 to 1.0, almost perfect agreement (16).
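The interpretive scale of Landis and Koch (16) amounts to a small lookup; the sketch below is a minimal illustration, with boundary values assigned to the lower category by assumption.

```python
# Landis and Koch (16) interpretive labels for kappa or phi values.
def landis_koch(value: float) -> str:
    if value < 0:    return "poor"
    if value <= 0.2: return "slight"        # boundary handling is an assumption
    if value <= 0.4: return "fair"
    if value <= 0.6: return "moderate"
    if value <= 0.8: return "substantial"
    return "almost perfect"

print(landis_koch(0.72))  # "substantial"
```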

Methods for calculating chance-independent agreement with multiple categories (in this case, multiple quadrants) remain undeveloped. Therefore, to assess agreement between the three pairings of raters on the rating of consolidation in 0 to 4 quadrants, we relied on weighted κ with quadratic weights allowing for partial agreement (17). We have explained why, because of lack of independence, we could not use all films for assessing κ, and thus used the new methodology for chance-independent agreement. Because we did not have an equivalent methodology to deal with multiple quadrants, we used only the first film from each patient to address agreement on the rating of the number of quadrants involved.
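To make the weighting explicit, the sketch below computes weighted κ with quadratic weights for five ordered categories (0 to 4 quadrants), assigning a disagreement of i versus j categories the weight 1 − (i − j)²/(k − 1)², where k is the number of categories; the example ratings are hypothetical, not study data.

```python
# Weighted kappa with quadratic weights for ordered categories (0-4 quadrants).
def weighted_kappa(ratings_a, ratings_b, n_categories=5):
    k = n_categories
    # Cross-tabulate the two raters' scores.
    table = [[0] * k for _ in range(k)]
    for i, j in zip(ratings_a, ratings_b):
        table[i][j] += 1
    n = len(ratings_a)
    row = [sum(table[i]) for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Quadratic weights: full credit on the diagonal, partial credit off it.
    w = [[1 - ((i - j) ** 2) / ((k - 1) ** 2) for j in range(k)] for i in range(k)]
    observed = sum(w[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] / n ** 2 for i in range(k) for j in range(k))
    return (observed - expected) / (1 - expected)

# Hypothetical quadrant counts from two raters for ten first films.
a = [0, 1, 2, 4, 4, 3, 2, 1, 0, 4]
b = [0, 2, 2, 4, 3, 3, 1, 1, 0, 4]
print(round(weighted_kappa(a, b), 2))
```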

Results

The patients contributed from 1 to 33 films each to the agreement process, with a mean of 8.9. The seven intensivists who contributed to the "Rater 1" comparisons evaluated films from between 3 and 27 patients. The proportions of patients judged by Raters 1, 2, and 3, respectively, to have diffuse bilateral infiltrates were 0.54, 0.27, and 0.30 on Day 1; 0.70, 0.60, and 0.61 on any day; and 0.64, 0.40, and 0.41 on two consecutive days. For the seven intensivists who contributed to the Rater 1 ratings, the proportions of patients with diffuse bilateral infiltrates present on Day 1, any day, or two consecutive days ranged from 0 to 0.78, 0.20 to 0.91, and 0.20 to 0.83, respectively. The rater with the 0 and 0.20 proportions reviewed films from only 5 patients.

Table 2A to 2C presents the agreement across the three pairings of raters for the three approaches to judging bilateral infiltrates present or absent, using raw agreement, κ, and Φ. Agreement between Raters 2 and 3 was substantial to almost perfect for all three criteria, using all three approaches. Raw agreement between the other two pairings varied from 0.68 to 0.80. The agreement between these two pairings was moderate for all three criteria using κ, and moderate to substantial using Φ.

We were interested in whether the consistent trend showing higher agreement in the standardized pairing could be a chance phenomenon. While methods for testing the statistical significance of two κ values in this situation have not been developed, the methodology of chance-independent agreement allows this comparison. Despite the consistency of the trend toward greater agreement in the standardized pairing, the difference between the levels of agreement approached conventional levels of significance for only one of the three ratings related to bilateral infiltrates (p values of 0.83, 0.05, and 0.12 for first film positive, any film positive, and two consecutive films positive by Raters 2 and 3 versus 1 and 3; 0.24, 0.91, and 0.95 by Raters 2 and 3 versus 1 and 2).

This lack of significance could be a problem of power—we may not have had enough films to exclude chance as an explanation. This problem could be ameliorated by using all of the films. Using all films, however, requires adjustment for any lack of independence in ratings of multiple films from the same individual. Including all films in the evaluation of diffuse infiltrates and adjusting for lack of independence, the Φ for Raters 2 and 3 was 0.69 (95% CI, 0.60–0.77), for Raters 1 and 2 the Φ was 0.60 (95% CI, 0.44–0.72), and for 1 and 3 the Φ was 0.56 (95% CI, 0.41–0.69). The difference in these levels of Φ was highly significant (p values of < 0.001 comparing Raters 2 and 3 with either 1 and 2, or 1 and 3).

Table 3 addresses the hypothesis that the reason for the superior agreement of Raters 2 and 3 over the other pairs was the consensus process Raters 2 and 3 undertook in reviewing the first 63 films together. Table 3 presents the level of agreement on the presence of bilateral infiltrates before and after the consensus process. While the numbers are small, there is a strong trend toward a higher level of agreement after the consensus process. Here, the small data set leads to empty cells (cells with 0 observations), which makes it difficult to make meaningful calculations of Φ.

Table 3. CHANCE-CORRECTED AGREEMENT BETWEEN RATERS 2 AND 3 ON THE TRAINING SET OF 63 CHEST RADIOGRAPHS BEFORE AND AFTER STANDARDIZED REVIEW*

                        Before Standardization    After Standardization
DBI
 First day              0.35 (0.0–0.74)           1.00
 Any day                0.21 (0.0–0.59)           1.00
 Two consecutive days   0.48 (0.05–0.91)          0.63 (0.17–1.0)

Definition of abbreviation: DBI = diffuse bilateral infiltrates.

* As explained in text, we did not calculate Φ statistics because empty cells made meaningful calculations impossible.

Numbers in parentheses represent the 95% confidence intervals.

  When κ is 1.0, one cannot calculate a confidence interval.

The weighted κ for the number of quadrants involved in the first film of each patient was as follows for the three pairings: Raters 2 and 3, 0.74 (95% CI, 0.63–0.85); Raters 1 and 2, 0.47 (95% CI, 0.31–0.63); Raters 1 and 3, 0.54 (95% CI, 0.42–0.67).

Discussion

We found moderate to good agreement on the presence of diffuse bilateral infiltrates suggestive of ARDS, irrespective of which of a number of possible criteria we used. This level of agreement is high in comparison with most clinical ratings, and with many of the radiographic interpretations, that clinicians use regularly in clinical practice. For instance, the intensivists in our study demonstrated considerably better agreement than did those who participated in a prior study of interpretation of chest radiographs of patients with ARDS. Beards and coworkers found a κ of only 0.05 for intensivists' rating of the number of quadrants in which consolidation was present (18).

In the clinical trial setting, however, agreement that is less than excellent compromises precision of measurement, and may result in misleading findings, large sample size requirements, or both. For instance, consider a trial enrolling patients with established ARDS, in which the presence of bilateral infiltrates would constitute one criterion for inclusion. The site intensivist and Rater 2 (the study intensivist) agreed on 68% of the ratings of the presence of bilateral infiltrates in the first film from each patient (Table 2A). This limited level of agreement would lead to appreciable differences in the patients enrolled in the study. Similarly, if a study considered ARDS as an outcome and the presence of diffuse infiltrates at any time while the patients stayed in the ICU contributed to the diagnosis, intensivists' ratings would agree only 78% of the time (Table 2A). This limited level of agreement could contribute substantial random error to the study results.

Fortunately, there is a partial solution to this problem. Development of standardized criteria and reporting forms; pilot testing; and training of raters through review of disagreements, discussion of the reasons, and agreement about how to deal with difficult judgments are accepted methods of maximizing agreement in a wide variety of clinical ratings. These methods have resulted in acceptable levels of agreement in interpretation of pediatric chest radiographs in a multicenter study (19). We have provided empirical evidence of the magnitude of improved agreement that clinical trialists studying radiological findings in critically ill patients can achieve by modest pilot testing and consensus development. This process decreased the disagreement on the presence of infiltrates on the first film of each patient to 10% and on any film to 8% (Table 2A).

Strengths of this study include the careful blinding of the radiographs, and of the raters, to one another's interpretation; the participation of both intensivists and a radiologist; the relatively large number of films read and the resulting relatively narrow confidence intervals; and our rigorous approach to data analysis. The study would have been stronger still if we had had the resources to include more radiologists and intensivists and to conduct a more systematic evaluation of a training period that would allow raters to develop consensus standards. The intensivist who read each film was a critical care fellow at the time of the study. Stage of training might have influenced the degree of improvement with training, and including additional readers at varying stages of training would have allowed us to explore this issue.

Inferences from our study may be limited by the lack of detail and explicitness in the current definitions of ARDS (5, 11). Available guidelines for reading and interpreting chest radiographs in patients receiving mechanical ventilation do not solve this problem, as they too offer only general approaches rather than explicit criteria (20). As a result, we developed our own detailed criteria; our criteria, however, do not have the benefit of a wide consensus. Ongoing work is likely to ameliorate or solve this problem in the future.

In reporting our results, we have relied on an innovative approach to measuring agreement with binary ratings. Like traditional measures of agreement, the Φ statistic takes values from −1.0 to 1.0. As we have described in Methods, Φ has three important advantages over existing approaches. First, it is independent of the level of chance agreement. Second, Φ allows full use of information from nonindependent observations (in this case, multiple films from each patient). Third, Φ allows testing of whether variations in agreement between different pairings of the same raters are significant. These options are not available with κ. We believe these advantages of Φ may ultimately lead to its replacing κ as the standard measure of agreement for binary clinical ratings. Until we gain further experience with the new method, however, we suggest investigators report both the standard κ and the Φ statistic.

In summary, we have demonstrated that intensivists can achieve moderate levels of agreement in the radiologic diagnosis of ARDS without specific training. Further consensus training can increase the level of agreement to substantial or almost perfect. Clinicians involved in clinical trials should seriously consider pilot training and assessment of the level of agreement in making clinical and radiographic ratings to enhance the power and accuracy of their studies.

Table 2. AGREEMENT BETWEEN RATERS ON ASSESSING THE PRESENCE OF DIFFUSE BILATERAL INFILTRATES*

                                   Rater 1 versus 2    Rater 1 versus 3    Rater 2 versus 3
A. Raw agreement
DBI
 First day                         0.68 (0.58–0.78)    0.72 (0.62–0.82)    0.89 (0.82–0.96)
 Any day                           0.78 (0.69–0.87)    0.80 (0.71–0.89)    0.88 (0.81–0.95)
 Two consecutive days              0.73 (0.63–0.83)    0.72 (0.62–0.82)    0.94 (0.89–0.99)
B. Chance-corrected agreement (κ)
DBI
 First day                         0.38 (0.22–0.54)    0.47 (0.31–0.63)    0.72 (0.56–0.88)
 Any day                           0.53 (0.35–0.71)    0.55 (0.37–0.73)    0.74 (0.60–0.88)
 Two consecutive days              0.49 (0.33–0.65)    0.46 (0.29–0.63)    0.88 (0.78–0.98)
C. Chance-independent agreement (Φ)
DBI
 First day                         0.59 (0.29–0.79)    0.73 (0.39–0.89)    0.75 (0.56–0.86)
 Any day                           0.53 (0.31–0.69)    0.54 (0.32–0.71)    0.74 (0.57–0.86)
 Two consecutive days              0.75 (0.42–0.90)    0.66 (0.42–0.90)    0.89 (0.73–0.95)

Definition of abbreviation: DBI = diffuse bilateral infiltrates.

* As described in text, Rater 1 refers to the site-specific intensivist investigators. Rater 2 refers to an intensivist who reviewed all films. Rater 3 refers to a radiologist who reviewed all films.

Numbers in parentheses represent the 95% confidence intervals.

Supported in part by the Physicians' Services Incorporated Foundation of Ontario, the Ontario Thoracic Society, and Bayer Corporation.

References

1. Garber B. G., Hebert P. C., Yelle J. D., Hodder R. V., McGowan J. Adult respiratory distress syndrome: a systematic overview of incidence and risk factors. Crit. Care Med. 1996;24:687–695.
2. Miller R. S., Nelson L. D., Di Russo S. M., Rutherford E. J., Safcsak K., Morris J. A. High-level positive end-expiratory pressure management in trauma-associated adult respiratory distress syndrome. J. Trauma 1992;33:284–291.
3. Bell R. C., Coalson J. J., Smith J. D., Johanson W. G. Multiple organ system failure and infection in adult respiratory distress syndrome. Ann. Intern. Med. 1983;99:293–298.
4. Murray J., Matthay M., Luce J., Flick M. An expanded definition of the adult respiratory distress syndrome. Am. Rev. Respir. Dis. 1988;138:720–723.
5. Bernard G. R., Artigas A., Brigham K. L., Carlet J., Falke K., Hudson L., Lamy M., LeGall J. R., Morris A., Spragg R. Report of the American–European consensus conference on acute respiratory distress syndrome: definitions, mechanisms, relevant outcomes, and clinical trial coordination. Am. J. Respir. Crit. Care Med. 1994;149:818–824.
6. Tudor G. R., Finlay D., Taub N. An assessment of inter-observer agreement and accuracy when reporting plain radiographs. Clin. Radiol. 1997;52:235–238.
7. Guyatt G. H., Lefcoe M., Walter S. D., Griffith L. E., King D., Zylak C., Hickey N., Carrier G. Interobserver variation in computerized tomographic diagnosis of intrathoracic lymphadenopathy in patients with potentially resectable lung cancer. Chest 1995;107:116–119.
8. Maguire W. M., Herman P. G., Kahn A., Simon-Gabor M., Cruz V., Eacobacci T. M. Interobserver agreement using computed radiography in the adult intensive care unit. Acad. Radiol. 1994;1:10–14.
9. Bloomfield F. H., Teele R. L., Voss M., Knight D. B., Harding J. E. Inter- and intra-observer variability in the assessment of atelectasis and consolidation in neonatal chest radiographs. Pediatr. Radiol. 1999;29:459–462.
10. Stewart T. E., Meade M. O., Cook D. J., Granton J. T., Hodder R. V., Lapinsky S. E., Mazer C. D., McLean R. F., Rogovein E. S., Schouten B. D., Todd T. R. J., Slutsky A. S. Evaluation of a ventilation strategy to prevent barotrauma in patients at high risk for acute respiratory distress syndrome. N. Engl. J. Med. 1998;338:355–361.
11. Fraser R. G., Pare J. A. P., Pare P. D., Fraser R. S., Genereux G. P. Diagnosis of Diseases of the Chest, 3rd ed. Philadelphia: W.B. Saunders; 1988. xiii–xx.
12. Breslow N. E., Day N. E. Statistical Methods in Cancer Research, Vol. 1: The Analysis of Case-Control Studies. International Agency for Research on Cancer; 1980.
13. Fleiss J. L. Measuring nominal scale agreement among many raters. Psychol. Bull. 1971;76:378–382.
14. Maclure M., Willett W. C. Misinterpretation and misuse of the kappa statistic. Am. J. Epidemiol. 1987;126:161–169.
15. Cook R. J., Farewell V. T. Conditional inference for subject-specific and marginal agreement: two families of agreement measures. Can. J. Stat. 1995;23:333–344.
16. Landis J. R., Koch G. G. The measurement of observer agreement for categorical data. Biometrics 1977;33:159–174.
17. Cohen J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 1968;70:213–220.
18. Beards S. C., Jackson A., Hunt L., Wood A., Frerk C. M., Brear G., Edwards J. D., Nightingale P. Interobserver variation in the chest radiograph component of the lung injury score. Anaesthesia 1995;50:928–932.
19. Cleveland R. H., Schlucter M., Wood B. P., Berdon W. E., Boechat M. I., Easley K. A., Meziane M., Mellins R. B., Norton K. I., Singleton E., Trautwein L. Chest radiograph data acquisition and quality assurance in multicentre studies. Pediatr. Radiol. 1997;27:880–887.
20. Winer-Muram H. T., Rubin S. A., Miniati M., Ellis J. V. Guidelines for reading and interpreting chest radiographs in patients receiving mechanical ventilation. Chest 1992;102(Suppl.):565S–570S.
Correspondence and requests for reprints should be addressed to Thomas E. Stewart, M.D., Department of Medicine, Mount Sinai Hospital, Suite 427-600, University Avenue, Toronto, ON, Canada M5G 1X5.
