American Journal of Respiratory and Critical Care Medicine

Rationale: Some clusters of patients who have Mycobacterium tuberculosis isolates with identical DNA fingerprint patterns grow faster than others. It is unclear what predictors determine cluster growth.

Objectives: To assess whether the development of a tuberculosis (TB) outbreak can be predicted by the characteristics of its first two patients.

Methods: Demographic and clinical data of all culture-confirmed patients with TB in the Netherlands from 1993 through 2004 were combined with DNA fingerprint data. To detect only newly arising clusters, clusters were restricted to cluster episodes of 2 years. Characteristics of the first two patients were compared between small (2–4 cases) and large (5 or more cases) cluster episodes.

Measurements and Main Results: Of 5,454 clustered cases, 1,756 (32%) were part of a cluster episode of 2 years. Of 622 cluster episodes, 54 (9%) were large and 568 (91%) were small episodes. Independent predictors for large cluster episodes were as follows: less than 3 months' time between the diagnosis of the first two patients, one or both patients were young (<35 yr), both patients lived in an urban area, and both patients came from sub-Saharan Africa.

Conclusions: In the Netherlands, patients in new cluster episodes should be screened for these risk factors. When the risk pattern applies, targeted interventions (e.g., intensified contact investigation) should be considered to prevent further cluster expansion.

Scientific Knowledge on the Subject

Individual risk factors for clustering of DNA fingerprints of M. tuberculosis isolates are well known. It is currently unknown whether the development of a tuberculosis outbreak can be predicted from the characteristics of patients assigned to the same cluster.

What This Study Adds to the Field

Large cluster episodes can be predicted by patient characteristics. Screening of such characteristics can thus be used as an early warning system for the detection and prevention of large tuberculosis outbreaks.

Tuberculosis (TB) mainly results from Mycobacterium tuberculosis transmitted through coughing by patients with TB. By DNA typing, different strains of M. tuberculosis can be distinguished (1). Patients sharing an identical M. tuberculosis strain are considered to be part of a “cluster,” reflecting recent transmission of M. tuberculosis and rapid progression to disease from recent exogenous infection. Unique DNA fingerprint patterns are assumed to be due to reactivation of remote infections or recent transmission from patients outside the study period or study area (2, 3). Outbreaks of TB occur regularly (4), as evidenced by clustering. Outbreaks could result from failure of contact investigations to detect all contacts and to treat those with a recent infection.

Population-based studies in low-incidence countries have identified individual risk factors for involvement in a cluster (2, 5–8). Patients in clusters are more often male, young, of certain nationalities, long-term residents in low-endemic countries, urban residents, sputum smear positive, HIV infected, drug or alcohol abusers, or homeless.

Other studies have tried to identify risk factors for being the first patient in a cluster and for generating secondary cases (9, 10). However, these studies assumed that the first patient diagnosed was the source case, which is not necessarily true when patient delay in presentation is long. It is more probable that the source case is among the first two patients in a cluster.

Although risk factors for an individual to be part of or give rise to a cluster have been assessed (2, 5–10), predictors of further cluster growth have not. Risk factors that predict further cluster growth are relevant for TB control because they may predict outbreaks. Early identification of clusters that may become large could help focus TB control efforts, especially in low-incidence countries that are approaching the elimination phase of TB. The aim of our study was therefore to determine which characteristics of the first two cases in a cluster can predict the development of large clusters (of 5 or more cases). Some of the results of this study have been reported previously in the form of an abstract (11, 12).

Data Collection

We combined data from the Netherlands Tuberculosis Register (NTR), which includes demographic and clinical information of all patients with diagnosed TB, with data from the National Institute of Public Health and the Environment (RIVM) that include information on species identification, molecular typing, and drug susceptibility from all M. tuberculosis isolates in the Netherlands. Because the NTR is an anonymous register that includes routinely gathered surveillance data, no ethical approval was required for the study. Patients with culture-confirmed TB from January 1, 1993, through December 31, 2004, in both registers were matched on the basis of sex, date of birth, year of diagnosis, and postal code. Patients were included when data in both registers matched completely, or if one minor difference existed in one of the matching variables. For a mismatch in the year of diagnosis, only one calendar year difference between diagnosis (NTR) and isolation (RIVM) was tolerated. Duplicate matches were excluded.
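The linkage rule above can be sketched in Python. This is a minimal illustration, not the procedure actually used to combine the NTR and RIVM data: the field names are hypothetical, and we assume that any disagreement in exactly one matching variable counts as a "minor difference", except that the year of diagnosis may differ by at most one calendar year. Records that match more than one record in the other register are treated as duplicate matches and dropped.

```python
from collections import Counter

# The four matching variables named in the text; field names are hypothetical.
FIELDS = ("sex", "birth_date", "diagnosis_year", "postal_code")


def match_kind(ntr, rivm):
    """Classify an NTR/RIVM record pair as 'complete', 'minor' or None (no match).

    Assumption: a minor difference is a disagreement in exactly one field,
    and a differing year of diagnosis is tolerated only if it is off by one.
    """
    diffs = [f for f in FIELDS if ntr[f] != rivm[f]]
    if not diffs:
        return "complete"
    if len(diffs) > 1:
        return None
    if diffs[0] == "diagnosis_year" and abs(ntr["diagnosis_year"] - rivm["diagnosis_year"]) > 1:
        return None
    return "minor"


def link_registers(ntr_records, rivm_records):
    """Return (ntr_index, rivm_index, kind) for unambiguous matches only.

    A record that matches more than one record in the other register
    is a duplicate match and is excluded.
    """
    hits = {}
    for i, a in enumerate(ntr_records):
        for j, b in enumerate(rivm_records):
            kind = match_kind(a, b)
            if kind is not None:
                hits.setdefault(i, []).append((j, kind))
    # How many NTR records each RIVM record was matched to.
    rivm_use = Counter(j for pairs in hits.values() for j, _ in pairs)
    links = []
    for i, pairs in sorted(hits.items()):
        if len(pairs) == 1 and rivm_use[pairs[0][0]] == 1:
            j, kind = pairs[0]
            links.append((i, j, kind))
    return links
```

Comparing every pair is quadratic, which is acceptable for an illustration; a real linkage of some 12,000 records would usually block on one variable (for example, date of birth) first.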

DNA Fingerprinting

M. tuberculosis isolates from all patients with culture-confirmed TB were subjected to IS6110 restriction fragment length polymorphism (RFLP) typing (13). Isolates with four or fewer IS6110 bands were subtyped using the polymorphic GC-rich sequence (PGRS) as a probe (14). Isolates with identical IS6110 RFLP patterns were assigned to a cluster; isolates with four or fewer bands also had to have identical PGRS RFLP patterns to be assigned to a cluster. Isolates were subjected to spoligotyping to determine the genotype family (15). The Beijing genotype was defined according to the previously published definition (16). The Haarlem genotype was defined according to IS6110 RFLP and spoligotype patterns (17) and sequencing of the ogt gene (18). Computer-assisted analysis of IS6110-PGRS RFLP and spoligotyping patterns was done using Bionumerics software, version 4.0 for Windows (Applied Maths, Sint-Maartens-Latem, Belgium), and the results were checked visually.

Selection of Cluster Episodes

Because we were interested in newly arising clusters, that is, clusters that arose after a period during which no cases with that particular DNA fingerprint had been diagnosed, we assigned clustered cases to episodes. A cluster episode started with a case diagnosed after a period of 2 years during which no other case with the same DNA fingerprint had been diagnosed, and included all cases with that DNA fingerprint diagnosed within 2 years after this first case (Figure 1). We used a 2-year timeframe because clusters that grow large within this short period are most relevant for public health, and most infected persons who develop active TB will do so within 2 years after infection (19). Cluster episodes starting before 1995 were excluded because DNA fingerprint results from before 1993 were not available; we therefore could not determine whether index cases diagnosed before 1995 were preceded by 2 years without any clustered cases. Likewise, cluster episodes could not start after 2003 because these could not be followed for 2 years. Cluster episodes were considered large when they had five or more cases, and small when they had two to four cases within a 2-year period.
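The episode definition can be made concrete with a short sketch. It assumes cases are given as hypothetical (fingerprint, diagnosis date) pairs and approximates 2 years as 730 days. A start is accepted only if the preceding 2-year quiet period and the 2-year follow-up both fall inside the data (1993 through 2004), which is our reading of the restrictions described above rather than the authors' code.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "2 years" is approximated as 730 days.
TWO_YEARS = timedelta(days=730)


@dataclass
class Episode:
    fingerprint: str
    dates: list  # diagnosis dates of the cases in the episode

    @property
    def size(self):
        return len(self.dates)

    def is_large(self, threshold=5):
        # Large: five or more cases; small: two to four cases.
        return self.size >= threshold


def cluster_episodes(cases, data_start, data_end, window=TWO_YEARS):
    """Split (fingerprint, diagnosis_date) pairs into cluster episodes.

    An episode starts with a case preceded by `window` without any case of
    the same fingerprint and contains all cases of that fingerprint diagnosed
    within `window` of the first case. The quiet period must lie after
    `data_start` and the follow-up window must end before `data_end`.
    """
    by_fingerprint = {}
    for fp, d in cases:
        by_fingerprint.setdefault(fp, []).append(d)

    episodes = []
    for fp, dates in by_fingerprint.items():
        dates.sort()
        i = 0
        while i < len(dates):
            start = dates[i]
            previous = dates[i - 1] if i > 0 else data_start
            quiet_before = start - previous >= window
            followed_up = start + window <= data_end
            if quiet_before and followed_up:
                j = i
                while j < len(dates) and dates[j] - start <= window:
                    j += 1
                if j - i >= 2:  # a cluster needs at least two cases
                    episodes.append(Episode(fp, dates[i:j]))
                i = j
            else:
                i += 1
    return episodes


if __name__ == "__main__":
    cases = [("A", date(1996, 3, 1)), ("A", date(1996, 4, 2)), ("B", date(1994, 5, 1))]
    for ep in cluster_episodes(cases, date(1993, 1, 1), date(2004, 12, 31)):
        print(ep.fingerprint, ep.size, ep.is_large())  # A 2 False
```

The `threshold` parameter of `is_large` is a convenience for the sensitivity analyses, in which large episodes were also defined as four or more or six or more cases; passing `timedelta(days=1095)` as `window` approximates the 3-year variant.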

Selection of the First Two Cases

The first two cases of a cluster episode were selected according to their date of diagnosis. Characteristics of the first two cases were counted for each cluster episode. Missing values were counted as 0 as we assumed that, in these cases, the risk factor was not present. Consequently, for each cluster episode, the characteristics of interest in the first two cases were coded as either present in both (2), present in one (1), or absent (0).
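The 0/1/2 coding can be illustrated as follows, assuming each case is a dictionary with a `diagnosis_date` key and each characteristic is evaluated by a small predicate that returns True, False, or None for missing data. The predicates `young` and `urban` are hypothetical examples, not variables from the registers.

```python
from datetime import date


def first_two(cases):
    """Return the first two cases of an episode, ordered by date of diagnosis."""
    return sorted(cases, key=lambda c: c["diagnosis_date"])[:2]


def code_factor(cases, has_factor):
    """Code a characteristic as 0, 1 or 2: how many of the first two cases have it.

    `has_factor` returns True, False, or None for a missing value; as in the
    study, a missing value is counted as 'factor absent'.
    """
    return sum(1 for c in first_two(cases) if has_factor(c) is True)


# Hypothetical predicates for two of the characteristics studied.
def young(case):
    age = case.get("age")
    return None if age is None else age < 35


def urban(case):
    return case.get("urban")  # None when place of residence is unknown


if __name__ == "__main__":
    episode = [
        {"diagnosis_date": date(1996, 5, 1), "age": 40, "urban": None},
        {"diagnosis_date": date(1996, 1, 10), "age": 28, "urban": True},
        {"diagnosis_date": date(1996, 2, 1), "age": None, "urban": True},
    ]
    print(code_factor(episode, young), code_factor(episode, urban))  # 1 2
```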

Statistical Analysis

Demographic and clinical characteristics were analyzed as possible predictors. An urban area was defined as one of the four largest cities of the Netherlands (>250,000 inhabitants). Multidrug resistance (MDR) was defined as resistance to at least isoniazid and rifampicin. Nationalities were grouped by continent; countries of the former Soviet Union were assigned to Asia, and Western countries comprised Australia, North America, and European countries other than the Netherlands. In addition, three non-Dutch nationalities were studied separately: those with the highest absolute number of patients with TB in the Netherlands (Turkish and Moroccan) and the one with the highest incidence (Somali) (20). For the date of diagnosis, we relied on the judgment of the TB physician. The time between the diagnoses of the first and second cases was calculated.
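The time between the first two diagnoses was later grouped into the bands shown in Table 2 (0 to <3, 3 to <6, 6 to <12, and 12–24 months). A sketch of that grouping follows; it assumes months are derived from the day difference using an average month length of 30.4375 days, since the exact conversion used in the study is not stated.

```python
from datetime import date

# Assumption: average month length (365.25 / 12 = 30.4375 days).
DAYS_PER_MONTH = 365.25 / 12

# Upper bounds (exclusive, in months) and labels of the bands used in Table 2.
BANDS = ((3, "0 to <3"), (6, "3 to <6"), (12, "6 to <12"))


def months_between(first, second):
    """Approximate number of months between two diagnosis dates."""
    return abs((second - first).days) / DAYS_PER_MONTH


def interval_band(first, second):
    """Band for the time between the diagnoses of the first two cases."""
    months = months_between(first, second)
    for upper, label in BANDS:
        if months < upper:
            return label
    return "12–24"  # within a 2-year episode the gap is at most 2 years


print(interval_band(date(1995, 1, 1), date(1995, 2, 15)))  # 0 to <3
```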

The relative count of each factor was compared between large and small clusters using logistic regression. Predictors whose count differed with P ⩽ 0.25 were included in the multivariate logistic regression model, which was built by forward stepwise selection based on the likelihood ratio, with a significance level of 0.10. The predictive value of the model was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The ROC curve plots sensitivity against the false-positive rate (1 − specificity) at all possible cutoff points of the predicted probability. The predicted probability is determined by the combination of characteristics of the first two patients in a cluster episode and the corresponding risks (odds ratios). The optimal point of the ROC curve, where the sum of sensitivity and specificity is at its maximum, corresponds to a particular predicted probability. Continuous variables that were not normally distributed were compared using the Mann-Whitney U test.
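Once a model has produced a predicted probability for each cluster episode, the AUC and the optimal cut point described above can be computed without any statistics library. The sketch below uses the Mann-Whitney formulation of the AUC (the probability that a randomly chosen large episode scores higher than a randomly chosen small one, counting ties as one half) and selects the cut point that maximizes sensitivity plus specificity (Youden's J). It is an illustration only: the study itself used SPSS, and fitting the logistic regression is not shown.

```python
def roc_auc(pos, neg):
    """AUC from predicted probabilities of large (pos) and small (neg) episodes.

    Equals the fraction of (large, small) pairs in which the large episode
    has the higher score, with ties counted as one half.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def optimal_cutoff(pos, neg):
    """Cut point maximizing sensitivity + specificity (Youden's J).

    An episode with score >= cutoff is predicted to become large.
    Returns (cutoff, sensitivity, specificity).
    """
    best = None
    for cut in sorted(set(pos) | set(neg)):
        sens = sum(p >= cut for p in pos) / len(pos)
        spec = sum(n < cut for n in neg) / len(neg)
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best
```

The pairwise loop is quadratic, which is negligible for 622 episodes; for much larger data a rank-based computation would be preferable.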

Sensitivity Analysis

To determine whether the ability to predict large clusters differed when the duration of a cluster episode was prolonged or when the number of cases in a large cluster episode decreased or increased, we performed two sensitivity analyses. First, the duration of a cluster was extended to 3 years. Second, the definition of a large cluster episode was changed to include at least four or at least six cases within 2 years. Statistical analyses were performed using SPSS version 14.0.1 for Windows (SPSS, Inc., Chicago, IL).

From 1993 through 2004, 18,200 patients with TB were reported to the NTR, 12,457 (68%) of whom had culture-confirmed TB. Of the culture-confirmed cases, 10,567 (85%) could be matched between the two datasets. Of these, 9,024 (85%) had complete agreement in the matching variables, whereas another 1,543 (15%) had a minor difference in one of the matching variables (Figure 2). No difference between matched and nonmatched patients was found regarding sex, age group, and nationality. Patients who were detected passively were matched more often (78%) than those found actively (65%).

Of the matched cases, 5,454 (52%) were clustered, representing 1,168 different DNA fingerprint patterns (Figure 2). In total, 622 cluster episodes of 2 years were identified comprising 1,756 of 5,454 (32%) cases. Five hundred and forty-two DNA fingerprint patterns were found in a single cluster episode, and 40 were found in two cluster episodes.

The number of cases per cluster episode ranged from 2 to 20 (Figure 3). Of the 622 cluster episodes, 568 (91.3%) were small and 54 (8.7%) were large. Table 1 shows the characteristics of cases in cluster episodes compared with cases that were clustered but not involved in a cluster episode, to assess the possibility of selection bias. Because the number of cases assessed was very large, we considered a difference of more than 10% between the two groups relevant. Cases in cluster episodes less often had missing information for several variables, were more often from Asia, and more often had TB caused by an M. tuberculosis strain of the Beijing genotype or by a rifampicin-resistant strain.



Characteristic | No. of Clustered Cases Not in Episode | No. of Clustered Cases in Episode* (%) | Odds Ratio for Being in Episode (95% CI) | P Value
Sex | | | |
 Male | 2,373 | 1,079 (31.3) | 1 |
 Female | 1,325 | 677 (33.8) | 1.12 (1.00–1.26) |
Age, yr | | | | 0.014
 0–14 | 134 | 81 (37.7) | 1.22 (0.91–1.62) |
 15–34 | 1,899 | 944 (33.2) | 1 |
 35–54 | 1,085 | 441 (28.9) | 0.82 (0.71–0.94) |
 55+ | 580 | 290 (33.3) | 1.01 (0.86–1.18) |
Site of TB | | | |
 PTB | 2,147 | 989 (31.5) | 1 |
 EPTB | 1,155 | 564 (32.8) | 1.06 (0.94–1.20) |
 PTB+EPTB | 387 | 200 (34.1) | 1.12 (0.93–1.35) |
 Unknown | 9 | 3 (25.0) | 0.72 (0.20–2.58) |
ZN result (sputum or BAL) | | | | <0.001
 Positive | 1,378 | 746 (35.1) | 1 |
 Negative | 465 | 266 (36.4) | 1.06 (0.89–1.26) |
 Not done/unknown | 1,855 | 744 (28.6) | 0.74 (0.66–0.84) |
Previous TB | | | | 0.950
 No | 2,991 | 1,421 (32.2) | 1 |
 Yes | 326 | 158 (32.6) | 1.02 (0.89–1.25) |
 Unknown | 381 | 177 (31.7) | 0.98 (0.81–1.18) |
HIV infection | | | | 0.06
 No | 3,510 | 1,687 (32.5) | 1 |
 Yes | 188 | 69 (26.8) | 0.76 (0.58–1.01) |
Case finding | | | | 0.068
 Passive | 2,887 | 1,389 (32.5) | 1 |
 Contact tracing | 293 | 156 (34.7) | 1.11 (0.90–1.36) |
 Screening (risk group/work) | 367 | 161 (30.5) | 0.91 (0.75–1.11) |
 Unknown | 151 | 50 (24.9) | 0.69 (0.50–0.95) |
Place of residence | | | | 0.196
 Urban | 1,449 | 656 (31.2) | 1 |
 Village | 2,249 | 1,100 (32.8) | 1.08 (0.96–1.22) |
High-risk group† | | | | <0.01
 No/unknown | 3,237 | 1,617 (33.3) | 1 |
 Yes | 461 | 139 (23.2) | 0.60 (0.50–0.74) |
Country of origin | | | | <0.001
 The Netherlands | 1,671 | 705 (29.7) | 1 |
 Central and South America | 155 | 55 (26.2) | 0.84 (0.61–1.16) |
 Asia | 356 | 242 (40.5) | 1.61 (1.34–1.94) |
 Western countries‡ | 139 | 69 (33.2) | 1.18 (0.87–1.59) |
 Sub-Saharan Africa | 883 | 483 (35.4) | 1.30 (1.13–1.49) |
 North Africa | 467 | 190 (28.9) | 0.96 (0.80–1.17) |
 Unknown | 27 | 12 (30.8) | 1.05 (0.53–2.09) |
[…] | | | |
 No/unknown | 3,528 | 1,669 (32.1) | 1 |
 Yes | 170 | 87 (33.9) | 1.08 (0.83–1.41) |
[…] | | | |
 No/unknown | 3,264 | 1,575 (32.5) | 1 |
 Yes | 434 | 181 (29.4) | 0.86 (0.72–1.04) |
[…] | | | |
 No/unknown | 3,139 | 1,485 (32.1) | 1 |
 Yes | 559 | 271 (32.7) | 1.03 (0.88–1.20) |
Beijing genotype | | | | <0.001
 No | 3,522 | 1,619 (31.5) | 1 |
 Yes | 176 | 137 (43.8) | 1.69 (1.34–2.13) |
Haarlem genotype | | | | <0.001
 No | 2,835 | 1,452 (33.9) | 1 |
 Yes | 863 | 304 (26.0) | 0.69 (0.60–0.80) |
Isoniazid resistance | | | | 0.002
 No | 3,439 | 1,610 (31.9) | 1 |
 Yes | 221 | 139 (38.6) | 1.34 (1.08–1.68) |
 Unknown | 38 | 7 (15.6) | 0.39 (0.18–0.88) |
Rifampicin resistance | | | | 0.017
 No | 3,625 | 1,724 (32.2) | 1 |
 Yes | 35 | 25 (41.7) | 1.50 (0.90–2.52) |
 Unknown | 38 | 7 (15.6) | 0.39 (0.18–0.87) |
Multidrug resistance§ | | | | 0.045
 No | 3,631 | 1,732 (32.3) | 1 |
 Yes | 29 | 17 (37.0) | 1.23 (0.67–2.24) |
 Unknown | 38 | 7 (15.6) | 0.39 (0.17–0.87) |
Total number | 3,698 | 1,756 (32.2) | |

Definition of abbreviations: BAL = bronchoalveolar lavage; CI = confidence interval; EPTB = extrapulmonary tuberculosis; PTB = pulmonary tuberculosis; TB = tuberculosis; ZN = Ziehl-Neelsen staining.

*Row percentages are given.

†High-risk groups are illegal immigrants, alcohol and/or drug abusers, and/or prisoners.

‡Western countries include Europe (excluding the Netherlands), Australia, and North America.

§Multidrug resistance was defined as resistance to at least isoniazid and rifampicin.

Table 2 shows the characteristics of the first two cases that are associated with large cluster episodes. Univariate analysis showed that the time between the first two patients was significantly shorter in large clusters than in small clusters. In 36 of 54 large clusters (67%), the first two cases were diagnosed within a period of 3 months, compared with 150 of 568 (26%) small clusters. One or both early cases in large clusters were more often young (age < 35 yr). Furthermore, both first cases of large cluster episodes more often lived in an urban setting. HIV infection and MDR were both more often present in the first two cases of large clusters than of small clusters (P = 0.051 and P = 0.039, respectively). The mean time between onset of symptoms and health seeking (patient delay) of the first patient was 13.7 weeks for first cases in large clusters compared with 8.9 weeks for those in small clusters (Mann-Whitney U test, P = 0.432; patient delay was known in 423 [68%] of all cluster episodes).


Characteristics of the First Two Cases in a Cluster Episode | No. of Large Clusters (%)* | No. of Small Clusters | Odds Ratio for Large Cluster (95% CI) | P Value
Male sex | | | |
 None | 8 (7.3) | 102 | 1 |
 1 male | 18 (6.7) | 250 | 0.92 (0.39–2.18) |
 Both males | 28 (11.5) | 216 | 1.65 (0.73–3.75) |
Age below 35 yr | | | | 0.001
 None | 4 (2.5) | 154 | 1 |
 1 case below 35 yr | 27 (11.8) | 202 | 5.15 (1.73–15.0) |
 Both cases below 35 yr | 23 (9.8) | 212 | 4.18 (1.42–12.3) |
ZN result (sputum or BAL) | | | | 0.136
 None | 12 (5.9) | 190 | 1 |
 1 case with ZN-positive TB | 31 (11.0) | 251 | 1.96 (0.98–3.91) |
 Both cases have ZN-positive TB | 11 (8.0) | 127 | 1.37 (0.59–3.20) |
Contact investigation | | | | 0.185
 None or unknown | 43 (7.9) | 502 | 1 |
 1 case detected by contact tracing | 10 (13.7) | 63 | 1.85 (0.89–3.87) |
 Both cases detected by contact tracing | 1 (25.0) | 3 | 3.89 (0.40–38.2) |
Urban residence | | | | 0.021
 None | 21 (6.4) | 305 | 1 |
 1 case lives in urban area | 14 (8.2) | 156 | 1.30 (0.65–2.63) |
 Both cases live in urban area | 19 (15.1) | 107 | 2.58 (1.34–4.98) |
Same nationality (1st and 2nd case) | | | | 0.095
 No or unknown | 27 (11.0) | 218 | 1 |
 Yes | 27 (7.2) | 350 | 0.62 (0.36–1.09) |
Turkish nationality | | | | 0.095
 None or unknown | 51 (8.9) | 522 | 1 |
 1 Turkish case | 3 (12.0) | 22 | 1.40 (0.40–4.82) |
 Both cases are Turkish | 0 (0) | 24 | |
Sub-Saharan African nationality | | | | 0.112
 None or unknown | 30 (7.1) | 394 | 1 |
 1 case is from sub-Saharan Africa | 7 (11.5) | 54 | 1.70 (0.71–4.07) |
 Both cases are from sub-Saharan Africa | 17 (12.4) | 120 | 1.86 (1.00–3.49) |
Asian nationality | | | | 0.058
 None or unknown | 47 (9.7) | 439 | 1 |
 1 case is Asian | 6 (7.6) | 73 | 0.77 (0.32–1.86) |
 Both cases are Asian | 1 (1.8) | 56 | 0.17 (0.02–1.23) |
HIV infection | | | | 0.051
 None or unknown | 46 (7.9) | 533 | 1 |
 1 case with HIV infection | 7 (17.9) | 32 | 2.54 (1.06–6.06) |
 Both cases are HIV infected | 1 (25.0) | 3 | 3.86 (0.39–37.9) |
MDR† | | | | 0.039
 None | 51 (8.3) | 561 | 1 |
 1 case with MDR-TB | 1 (14.3) | 6 | 1.83 (0.22–15.5) |
 Both cases have MDR-TB | 2 (66.7) | 1 | 22.0 (1.96–247) |
Time between diagnoses of 1st and 2nd case, mo | | | | <0.001
 0 to <3 | 36 (19.4) | 150 | 8.45 (3.23–22.1) |
 3 to <6 | 5 (4.2) | 114 | 1.54 (0.44–5.45) |
 6 to <12 | 8 (5.9) | 128 | 2.20 (0.70–6.88) |
 12–24 | 5 (2.8) | 176 | 1 |
Total number of cluster episodes | 54 (8.7) | 568 (91.3) | |

Definition of abbreviations: BAL = bronchoalveolar lavage; CI = confidence interval; MDR = multidrug resistance; ZN = Ziehl-Neelsen staining.

Only variables with a P < 0.25 were included in this table.

*Row percentages are given.

†MDR was defined as resistance to at least isoniazid and rifampicin.
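The univariate odds ratios in Table 2 can be reproduced from the cell counts with the standard Woolf (log-based) confidence interval. For example, first two cases diagnosed less than 3 months apart versus 12–24 months apart (36 large and 150 small versus 5 large and 176 small episodes) give 8.45 (3.23–22.1), and the row for one ZN-positive case gives 1.96 (0.98–3.91). That the table was computed this way is our assumption; the text does not state the method.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) 95% CI for the 2x2 table

        category of interest: a large, b small episodes
        reference category:   c large, d small episodes
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# First two cases <3 months apart versus 12-24 months apart (Table 2).
or_, lo, hi = odds_ratio_ci(36, 150, 5, 176)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.1f})")
```

The formula is undefined when a cell is zero, as in the row for two Turkish first cases, for which no odds ratio is reported.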

Multivariate logistic regression revealed four significant independent predictors for large clusters: a period of less than 3 months between the dates of diagnosis of the first two cases, young age of one or both, both living in an urban area, and both coming from sub-Saharan Africa (Table 3). We did not include MDR in the multivariate model because the number of MDR-TB cases was small. The discriminative ability of this multivariate model is shown by the ROC curve (Figure 4). The AUC of the ROC curve in Figure 4 (black line) is 0.79 (95% confidence interval [CI], 0.72–0.85). None of the possible interaction terms between independent predictors was significant or increased the AUC of the ROC curve. At the optimal cut point for predicting large cluster episodes, sensitivity was 65% and specificity was 82%. The corresponding positive and negative predictive values were 25 and 96%, respectively. The predicted probability at this optimal cut point is 14%; clusters with a predicted probability above 14% are therefore classified as likely to become large. Figure 5 gives the probability of a large cluster episode for all possible combinations of characteristics of the first two cases. When the first two cases in a cluster episode are diagnosed within 3 months of each other and at least one of them is younger than 35 years, the risk that a large cluster develops is one to more than five times higher, depending on their origin and place of residence.
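The predictive values follow from the sensitivity, the specificity, and the 54 large versus 568 small episodes. A quick check using the rounded sensitivity (65%) and specificity (82%) reported above:

```python
def predictive_values(sens, spec, n_large, n_small):
    """Expected PPV and NPV when a rule with the given sensitivity and
    specificity is applied to n_large large and n_small small episodes."""
    tp = sens * n_large   # large episodes correctly flagged
    fn = n_large - tp     # large episodes missed
    tn = spec * n_small   # small episodes correctly not flagged
    fp = n_small - tn     # small episodes flagged as large
    return tp / (tp + fp), tn / (tn + fn)


ppv, npv = predictive_values(0.65, 0.82, 54, 568)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")
```

This gives a PPV of about 25.6% and an NPV of about 96.1%, consistent with the reported 25 and 96% once the rounding of sensitivity and specificity is taken into account. The low PPV mainly reflects that small episodes outnumber large ones roughly ten to one.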


Cluster Episodes of 2 Years: Large Clusters (n = 54) Compared with Small Clusters (n = 568)
Cluster Episodes of 3 Years: Large Clusters (n = 84) Compared with Small Clusters (n = 538)

Characteristics of the First Two Cases in a Cluster Episode* | Odds Ratio for Large Clusters, 2 Years (95% CI) | P Value, 2 Years | Odds Ratio for Large Clusters, 3 Years (95% CI) | P Value, 3 Years
Age below 35 yr | | 0.001 | | 0.040
 1 or both cases are below 35 yr | 4.50 (1.54–13.17) | | 1.93 (1.00–3.73) |
Urban residence | | 0.012 | | 0.015
 1 case lives in urban area | 1.30 (0.62–2.72) | | 1.03 (0.57–1.88) |
 Both cases live in urban area | 3.00 (1.46–6.18) | | 2.29 (1.29–4.09) |
Sub-Saharan African nationality | | 0.064 | | 0.247
 None or unknown | 1 | | 1 |
 1 case is from sub-Saharan Africa | 1.13 (0.45–2.89) | | 1.26 (0.59–2.69) |
 Both cases are from sub-Saharan Africa | 2.36 (1.16–4.82) | | 1.66 (0.92–3.01) |
Time between diagnoses of 1st and 2nd cases, mo | | <0.001 | | <0.001
 0 to <3 | 6.62 (3.54–12.39) | | 3.48 (2.14–5.66) |


Definition of abbreviation: CI = confidence interval.

*n = 622 cluster episodes.

To determine whether the ability to predict large clusters increased when the characteristics of the first three instead of the first two cases were used, we repeated the analysis comparing large clusters with small clusters of at least three cases (data not shown). No additional independent predictors were found, and age was no longer significant (P = 0.19). The AUC of the ROC curve did not change to a relevant extent (from 0.79 to 0.81; 95% CI, 0.74–0.88).

The size of a large cluster episode was arbitrarily set at five or more cases. Our model was still valid when large cluster episodes were defined as having four or more, or six or more cases (data not shown). However, the AUC of the ROC curve changed to 0.70 (95% CI, 0.64–0.76) and 0.87 (95% CI, 0.80–0.93), respectively (Figure 4).

We evaluated the usefulness of our model for predicting large cluster episodes occurring within a 3-year instead of a 2-year period. As a consequence, 30 of our small cluster episodes became large, and the AUC of the ROC curve decreased to 0.70 (95% CI, 0.63–0.76) when the same predictors were included. Origin from sub-Saharan Africa still appeared to be a risk factor but was no longer significant (Table 3).

This study showed that the growth of new TB clusters to five or more cases within 2 years can be predicted by the characteristics of the first two cases. Independent predictors for large cluster episodes were age under 35 years of one or both of the first two patients, urban residence of both, sub-Saharan African nationality of both, and less than 3 months' time between the diagnoses of these first two patients. Sensitivity analysis showed that the discriminative ability of our model remained good when the definition of a large cluster episode was changed to include at least four or six cases, or when the time span of a large cluster covered 3 years instead of 2.

Time between cases, age, nationality, and residence are all known shortly after the diagnosis of a new TB case and should be part of the national TB registration. When molecular data and the national registration are combined, new cluster episodes can be screened using these risk factors to identify clusters at higher risk of increasing in size, thereby providing an early warning system for municipal health services. In the United States, 38 to 57% of clustered cases were identified in addition to those found by conventional contact investigation when genotyping information on M. tuberculosis isolates was used (21). A considerably faster method than IS6110-RFLP typing is now available, based on variable numbers of tandem repeats (22). This technique makes it possible to report clustering of TB cases within a few weeks.

A short time span between the first two patients in a cluster was the strongest predictor for large cluster episodes. One could attribute this strong association partly to our definition of a large cluster episode, which required more cases within 2 years than small cluster episodes. However, this does not explain our observations that only a period of less than 3 months between the first and second cases was predictive, that the same risk pattern was found when a cluster episode covered 3 instead of 2 years, and that no decreasing trend was observed toward longer periods.

Patient delay of first cases of large clusters was substantially longer (mean delay was 4.8 wk longer) than that of small clusters, although this difference was not significant. The prolonged patient delay of first cases of large clusters could explain the very short time span found between the first and second cases of large cluster episodes. Index cases that experienced a longer period with complaints and delayed health seeking may have infected more secondary cases and as a consequence gave rise to larger clusters than those index cases with a shorter patient delay.

Population-based studies previously showed that young age is a risk factor for clustering (8, 9, 23–25). Usually, young index cases have more intimate contacts and more contacts in general (26), and as a result, generate more secondary cases (6, 9) than older index cases. This agrees with our finding that young age is a predictor of cluster growth. Also, in urban areas, the number of possible contacts and thus the chance that a patient with infectious TB will infect another person is greater than in rural areas (2, 25, 27).

We showed that patients from sub-Saharan Africa more often gave rise to a large cluster episode. Different nationalities have been associated with clustering, depending on the study setting (2, 7–9, 28–31). Most studies showed that clustering tends to occur among persons with the same nationality (28, 29, 31). African patients, especially when coming from Morocco to the Netherlands (9) or from Somalia to Denmark (28), showed high risks of being the first case in a cluster. However, an epidemiologic link is rarely detected among African immigrants who share a DNA fingerprint (32, 33), which suggests that African cases may have contracted their infection in their home country, where this DNA fingerprint is common (34).

Underlying HIV infection, a well-known risk factor for progression to active TB disease and associated with a shorter time between successive cases in clusters with at least one HIV-positive person (35), is uncommon among patients with TB in the Netherlands (estimated prevalence is 4.1%) (36). We were unable to show that HIV infection was an independent predictor for the development of large cluster episodes.

The aim of our study was to find a method to predict the majority of large clusters without classifying too many small clusters incorrectly as large. The optimal cutoff point of our model allowed us to correctly predict 65% of the large cluster episodes, with only 18% of all small cluster episodes incorrectly predicted as becoming large. Because small cluster episodes occurred more frequently than large ones, the positive predictive value was 25% in this population. If all new cluster episodes with a probability above 14% were considered potential outbreaks, this would lead to intensified case finding in 172 of 622 cluster episodes during our 8-year observation period, approximately 22 per year. Compared with the number of patients with pulmonary TB in the Netherlands (672 in 2005), this is a rather small number. In addition, if intensified case finding succeeds and further transmission is prevented, the number of large clusters will gradually decrease over time. Although large cluster episodes occur rarely, the number of cases involved in large clusters can be substantial (4). Therefore, contact investigations around cases in potentially fast-growing clusters may need more attention than the routine investigation that is done in the Netherlands around all culture-confirmed TB cases.

To our knowledge, few studies have reported risk factors for cluster growth (10, 37). Driver and colleagues (37) showed that infectiousness of the initial cases was associated with a higher rate of cluster growth compared with clusters in which neither case was infectious. We were unable to confirm this finding because sputum smear results were missing in 29% of our cases and were only available from 1996 onward. Even when we considered all missing values as positive sputum smear results, sputum smear positivity of the first two patients was not an independent predictor.

Through the selection of a 2-year time period to define cluster episodes, we may not have included all epidemiologically linked cases (38, 39). A recent study in the Netherlands showed that over half of the secondary cases caused by new strains (strains that were not isolated within the preceding 2 yr) occur within 2 years after introduction (40). Another study showed that 86% of cases that clustered within 2 years had an epidemiologic link that was evident or likely (33).

We assumed that cases who develop active TB would do so within 2 years of infection; cases occurring later were not considered secondary cases but new source cases. By this definition, we were able to include more than one cluster episode for a particular fingerprint. Our results therefore represent predictors for outbreaks of both emerging and reemerging strains, rather than of new fingerprints only. One limitation is that our model may not be valid for existing clusters that continue to have cases at least every 2 years.

We found that clustered cases with TB caused by an M. tuberculosis strain of the Beijing genotype or by a rifampicin-resistant strain, and clustered cases from Asia, were relatively more often part of a 2-year cluster episode than other clustered cases. The association between rifampicin resistance and clustering may be explained by the fact that patients infected with a resistant strain remain infectious longer, because the resistance is usually not recognized immediately at diagnosis. The fact that strains of the Beijing genotype were more often part of short cluster episodes is highly interesting, because this suggests that such strains transmit more successfully or that patients infected with such strains progress more rapidly to TB disease.

Another limitation of our study is that we could only include culture-positive and matched TB cases and therefore excluded all possible transmission to and from patients whose disease was not confirmed by culture, reducing potential cluster size. Furthermore, misclassification could have occurred due to instability of the fingerprint pattern (41), which would cause us to miss clustered cases. Because the number of large clusters is small, our study had limited power to detect or exclude risk factors with small relative risks.

In conclusion, we showed that the majority of TB outbreaks can be predicted by the characteristics of the first two cases in a cluster episode. It is unclear whether the same predictive factors apply in other settings; even in other low-endemic countries, the population and transmission patterns can differ from those in the Netherlands. However, others can apply the methodology we used to identify setting-specific predictive factors. TB cases who are part of new cluster episodes should be screened for the risk factors described in this study, and targeted interventions (e.g., intensified contact investigation) should be considered to prevent the predicted development of large clusters.

The authors are grateful to all Municipal Health Services in the Netherlands for regularly contributing data to the NTR for over 14 years. The authors also thank Nico Kalisvaart for his assistance in combining the two datasets and Saskia den Boon and Rein Houben for critical review of an earlier version of the manuscript.

1. Mazurek GH, Cave MD, Eisenach KD, Wallace RJ, Bates JH, Crawford JT. Chromosomal DNA fingerprint patterns produced with IS6110 as strain-specific markers for epidemiologic study of tuberculosis. J Clin Microbiol 1991;29:2030–2033.
2. Van Soolingen D, Borgdorff MW, De Haas PEW, Sebek MMGG, Veen J, Dessens M, Kremer K, van Embden JDA. Molecular epidemiology of tuberculosis in the Netherlands: a nationwide study from 1993 through 1997. J Infect Dis 1999;180:726–736.
3. Vynnycky E, Nagelkerke N, Borgdorff MW, van Soolingen D, van Embden JDA, Fine PEM. The effect of age and study duration on the relationship between “clustering” of DNA fingerprint patterns and the proportion of tuberculosis disease attributable to recent transmission. Epidemiol Infect 2001;126:43–62.
4. Kuyvenhoven JV, Cobelens FG. Large-scale outbreak investigation for tuberculosis in Zeist [in Dutch]. Ned Tijdschr Geneeskd 2005;149:1925–1928.
5. Small PM, Hopewell PC, Singh SP, Paz A, Parsonnet J, Ruston DC, Schecter GF, Daley CL, Schoolnik GK. The epidemiology of tuberculosis in San Francisco: a population-based study using conventional and molecular methods. N Engl J Med 1994;330:1703–1709.
6. Borgdorff MW, Nagelkerke NJ, de Haas PE, van Soolingen D. Transmission of Mycobacterium tuberculosis depending on the age and sex of source cases. Am J Epidemiol 2001;154:934–943.
7. Iñigo J, García de Viedma D, Arce A, Palenque E, Alonso Rodríguez N, Rodríguez E, Ruiz Serrano MJ, Andrés S, Bouza E, Chaves F. Analysis of changes in recent tuberculosis transmission patterns after a sharp increase in immigration. J Clin Microbiol 2007;45:63–69.
8. Heldal E, Dahle UR, Sandven P, Caugant DA, Brattaas N, Waaler HT, Enarson DA, Tverdal A, Kongerud J. Risk factors for recent transmission of Mycobacterium tuberculosis. Eur Respir J 2003;22:637–642.
9. Verver S, van Soolingen D, Borgdorff MW. Effect of screening of immigrants on tuberculosis transmission. Int J Tuberc Lung Dis 2002;6:121–129.
10. Rodrigo T, Caylà JA, García de Olalla P, Galdós-Tangüis H, Jansà JM, Miranda P, Brugal T. Characteristics of tuberculosis patients who generate secondary cases. Int J Tuberc Lung Dis 1997;1:352–357.
11. Kik SV, Verver S, Kremer K, de Haas PEW, van der Sande M, Cobelens FGJ, van Soolingen D, Borgdorff MW. Risk factors for tuberculosis outbreaks in the Netherlands [abstract]. Eur J Epidemiol 2006;21:S64.
12. Kik SV, Verver S, Kremer K, de Haas P, Cobelens F, van Soolingen D, Borgdorff M. Risk factors for tuberculosis outbreaks in the Netherlands [abstract]. Int J Tuberc Lung Dis 2007;10:S102.
13. Van Embden JDA, Cave MD, Crawford JT, Dale JW, Eisenach KD, Gicquel B, Hermans P, Martin C, McAdam R, Shinnick TM, et al. Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. J Clin Microbiol 1993;31:406–409.
14. van Soolingen D, de Haas PEW, Hermans PWM, Groenen PMA, van Embden JDA. Comparison of various repetitive DNA elements as genetic markers for strain differentiation and epidemiology of Mycobacterium tuberculosis. J Clin Microbiol 1993;31:1987–1995.
15. Kamerbeek J, Schouls L, Kolk A, van Agterveld M, van Soolingen D, Kuijper S, Bunschoten A, Molhuizen H, Shaw R, Goyal M, et al. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol 1997;35:907–914.
16. Kremer K, Glynn JR, Lillebaek T, Niemann S, Kurepina NE, Kreiswirth BN, Bifani PJ, van Soolingen D. Definition of Beijing/W lineage of Mycobacterium tuberculosis on the basis of genetic markers. J Clin Microbiol 2004;42:4040–4049.
17. Kremer K, van Soolingen D, Frothingham R, Haas WH, Hermans PWM, Martín C, Palittapongarnpim P, Plikaytis BB, Riley LW, Yakrus MA, et al. Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis complex strains: interlaboratory study of discriminatory power and reproducibility. J Clin Microbiol 1999;37:2607–2618.
18. Rad ME, Bifani P, Martin C, Kremer K, Samper S, Rauzier J, Kreiswirth B, Blazquez J, Jouan M, van Soolingen D, et al. Mutations in putative mutator genes of Mycobacterium tuberculosis strains of the W-Beijing family. Emerg Infect Dis 2003;9:838–845.
19. Ferebee SH. Controlled chemoprophylaxis trials in tuberculosis: a general review. Bibl Tuberc 1970;26:28–106.
20. Vos AM, Meima A, Verver S, Looman CW, Bos V, Borgdorff MW, Habbema JD. High incidence of pulmonary tuberculosis persists a decade after immigration, The Netherlands. Emerg Infect Dis 2004;10:736–739.
21. McNabb SJN, Kammerer JS, Hickey AC, Braden CR, Shang N, Rosenblum LS, Navin TR. Added epidemiologic value to tuberculosis prevention and control of the investigation of clustered genotypes of Mycobacterium tuberculosis isolates. Am J Epidemiol 2004;160:589–597.
22. Supply P, Allix C, Lesjean S, Cardoso-Oelemann M, Rüsch-Gerdes S, Willery E, Savine E, de Haas P, van Deutekom H, Roring S, et al. Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. J Clin Microbiol 2006;44:4498–4510.
23. Blackwood KS, Al-Azem A, Elliott LJ, Hershfield ES, Kabani AM. Conventional and molecular epidemiology of tuberculosis in Manitoba. BMC Infect Dis 2003;3:18.
24. Cacho Calvo J, Astray Mochales J, Perez Meixeira A, Ramos Martos A, Hernando García M, Sánchez Conchiero M, Domín Pérez JR, Gómez AB, Samper S, Martín C. Ten-year population-based molecular epidemiological study of tuberculosis transmission in the metropolitan area of Madrid, Spain. Int J Tuberc Lung Dis 2005;9:1236–1241.
25. Ruiz M, Navarro JF, Rodríguez JC, Larrossa JA, Royo G. Effect of clinical and socio-economic factors on the rate of clustering of Mycobacterium tuberculosis clinical isolates in Elche (Spain). Epidemiol Infect 2003;131:1077–1083.
26. van Geuns HA, Meijer J, Styblo K. Results of contact examination in Rotterdam, 1967–1969. Bull Int Union Tuberc 1975;50:107–121.
27. Rieder HL. Epidemiologic basis of tuberculosis control. Paris, France: International Union Against Tuberculosis and Lung Disease; 1999.
28. Lillebaek T, Andersen AB, Bauer J, Dirksen A, Glismann S, de Haas P, Kok-Jensen A. Risk of Mycobacterium tuberculosis transmission in a low-incidence country due to immigration from high-incidence areas. J Clin Microbiol 2001;39:855–861.
29. Lari N, Rindi L, Bonanni D, Rastogi N, Sola C, Tortoli E, Garzelli C. A three-year longitudinal study of the genotypes of Mycobacterium tuberculosis in Tuscany, Italy. J Clin Microbiol 2007;45:1851–1857.
30. Diel R, Rüsch-Gerdes S, Niemann S. Molecular epidemiology of tuberculosis among immigrants in Hamburg, Germany. J Clin Microbiol 2004;42:2952–2960.
31. Borgdorff MW, Nagelkerke N, van Soolingen D, de Haas PEW, Veen J, van Embden JDA. Analysis of tuberculosis transmission between nationalities in the Netherlands in the period 1993–1995 using DNA fingerprinting. Am J Epidemiol 1998;147:187–195.
32. Chin DP, DeRiemer K, Small PM, de Leon AP, Steinhart R, Schecter GF, Daley CL, Moss AR, Paz EA, Jasmer RM, et al. Differences in contributing factors to tuberculosis incidence in US-born and foreign-born persons. Am J Respir Crit Care Med 1998;158:1797–1803.
33. van Deutekom H, Hoijng SP, de Haas PEW, Langendam MW, Horsman A, van Soolingen D, Coutinho RA. Clustered tuberculosis cases: do they represent recent transmission and can they be detected earlier? Am J Respir Crit Care Med 2004;169:806–810.
34. Glynn JR, Whiteley J, Bifani PJ, Kremer K, van Soolingen D. Worldwide occurrence of Beijing/W strains of Mycobacterium tuberculosis: a systematic review. Emerg Infect Dis 2002;8:843–849.
35. DeRiemer K, Kawamura LM, Hopewell PC, Daley CL. Quantitative impact of human immunodeficiency virus infection on tuberculosis dynamics. Am J Respir Crit Care Med 2007;176:936–944.
36. Haar CH, Cobelens FGJ, Kalisvaart NA, van der Have JJ, van Gerven PJHJ, van Deutekom H. HIV prevalence among tuberculosis patients in the Netherlands, 1993–2001: trends and risk factors. Int J Tuberc Lung Dis 2006;10:768–774.
37. Driver CR, Macaraig M, McElroy PD, Clark C, Munsiff SS, Kreiswirth B, Driscoll J, Zhao B. Which patients' factors predict the rate of growth of Mycobacterium tuberculosis clusters in an urban community? Am J Epidemiol 2006;164:21–31.
38. Glynn JR, Vynnycky E, Fine PE. Influence of sampling on estimates of clustering and recent transmission of Mycobacterium tuberculosis derived from DNA fingerprinting techniques. Am J Epidemiol 1999;149:366–371.
39. Murray M, Alland D. Methodological problems in the molecular epidemiology of tuberculosis. Am J Epidemiol 2002;155:565–571.
40. Borgdorff MW, van der Werf MJ, de Haas PE, Kremer K, van Soolingen D. Tuberculosis elimination in the Netherlands. Emerg Infect Dis 2005;11:597–602.
41. de Boer AS, Borgdorff MW, de Haas PE, Nagelkerke NJD, van Embden JDA, van Soolingen D. Analysis of rate of change of IS6110 RFLP patterns of Mycobacterium tuberculosis based on serial patient isolates. J Infect Dis 1999;180:1238–1244.
Correspondence and requests for reprints should be addressed to Sandra V. Kik, M.Sc., KNCV Tuberculosis Foundation, P.O. Box 146, 2501 CC The Hague, The Netherlands. E-mail:

