American Journal of Respiratory Cell and Molecular Biology

In this issue of the Red Journal, Maliarik and colleagues present a case-control association study that examines the relationship between several polymorphic markers within the natural resistance–associated macrophage protein (NRAMP1) gene and sarcoidosis. Sarcoidosis is a chronic granulomatous disorder of unknown cause, which is likely a “complex” disease (1). The principal distinction between simple monogenic diseases and complex genetic diseases is that the latter do not exhibit classical Mendelian patterns of inheritance and are likely influenced by multiple genetic and environmental factors (2). Many other respiratory illnesses, such as asthma and chronic obstructive pulmonary disease, are also complex diseases (3, 4).

Understanding the genetic basis of complex human diseases has been increasingly emphasized as a means of achieving insight into disease pathogenesis, with the ultimate goal of improving preventive strategies, diagnostic tools, and therapies. Genetic approaches to complex disorders thus offer great potential to improve our understanding of their pathophysiology, but they also pose significant challenges. Genetic association studies have been applied in a variety of complex respiratory diseases, and the application of this study design could greatly increase with continued progress in the Human Genome Project and the Single Nucleotide Polymorphism (SNP) Consortium (5). In this Perspective, we will discuss the application of genetic association studies to complex respiratory disorders, review several potential problems with the case-control approach, and provide a strategy for evaluating case-control genetic association studies.

Two types of statistical methods have been widely employed to attempt to identify genetic determinants of complex diseases—linkage analysis and association studies. Linkage analysis methods attempt to identify a region of the genome that is transmitted within families along with the disease phenotype of interest. Linkage analysis has been extremely useful in the identification of genes responsible for diseases with simple Mendelian inheritance, such as cystic fibrosis (6). The application of linkage analysis to complex disorders without obvious Mendelian inheritance has been much less successful because complex diseases are most likely influenced by genetic heterogeneity (multiple genetic causes leading to the same disease), environmental phenocopies (purely environmental forms of the disease), incomplete penetrance (subjects inheriting a disease gene but not developing the disease), genotype-by-environment interactions (nonadditivity of genetic and environmental influences on disease development), and multilocus effects (more than one gene influences disease development) (7). Despite these difficulties, multiple regions of the genome have been recurrently identified by linkage analysis as likely to contain asthma susceptibility genes. For sarcoidosis, linked regions have not yet been identified.

Two general approaches have been used to investigate the molecular genetics of complex diseases: candidate gene approaches and whole genome screens followed by positional cloning attempts. Candidate gene approaches can utilize either linkage analysis of highly polymorphic, short tandem repeat marker loci in the candidate region, or association of (usually bi-allelic) coding or noncoding polymorphisms within candidate genes.

Historically, association analysis of genetic polymorphisms has been mostly performed in a case-control setting with unrelated affected subjects compared with unrelated unaffected subjects. Significant differences in allele frequencies between cases and controls are taken as evidence for involvement of an allele in disease susceptibility. Alternatively, genotype frequencies rather than allele frequencies can be compared in cases and controls.
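The allele frequency comparison described above can be illustrated with a minimal sketch. It assumes a biallelic marker, treats each chromosome as an independent observation, and uses a Pearson chi-square test with 1 degree of freedom; the function names and the genotype counts are hypothetical and are not taken from any study.

```python
import math


def allele_counts(genotypes):
    """Convert (n_AA, n_Aa, n_aa) genotype counts into (n_A, n_a) allele counts."""
    n_AA, n_Aa, n_aa = genotypes
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa


def allele_association_test(case_genotypes, control_genotypes):
    """Pearson chi-square test (1 df) on the 2x2 table of allele counts.

    Assumes every cell has a nonzero expected count.
    """
    table = [allele_counts(case_genotypes), allele_counts(control_genotypes)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, Aa, aa) for 100 cases and 100 controls.
chi2, p_value = allele_association_test((30, 50, 20), (20, 50, 30))
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

Comparing genotype frequencies instead would use the 2x3 table of genotype counts and 2 degrees of freedom; that variant is not shown here.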

In practical terms, an observed statistical association between an allele and a phenotypic trait will be the result of one of three situations: (1) the finding could be due to chance or artifact, e.g., confounding or selection bias; (2) the allele is in linkage disequilibrium with an allele at another locus that directly affects the expression of the phenotype; or (3) the allele itself is functional and directly affects the expression of the phenotype.

The biologic principle underlying the association analysis of polymorphisms not directly involved in disease pathogenesis is that of linkage disequilibrium (the second situation listed above). Linkage disequilibrium arises from an increased frequency of particular haplotypes across a population on account of the cosegregation of alleles at closely linked loci. Haplotypes having a greater frequency than would be expected because of random association may arise by population admixture, natural selection, genetic drift, or new mutation combined with population “bottlenecks” (8).
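As an illustration of how linkage disequilibrium between two biallelic loci is commonly quantified, the sketch below computes the coefficient D (the excess of an observed haplotype frequency over the product of its allele frequencies) and the squared correlation r². These two measures are standard population genetic quantities rather than anything specific to the studies discussed here, and the frequencies used are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for allele A at one locus and allele B at another.

    p_ab: frequency of the haplotype carrying both A and B
    p_a:  frequency of allele A (must lie strictly between 0 and 1)
    p_b:  frequency of allele B (must lie strictly between 0 and 1)
    """
    # D is zero when alleles at the two loci are randomly associated.
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies: the A-B haplotype is more common than expected.
d, r_squared = linkage_disequilibrium(p_ab=0.40, p_a=0.50, p_b=0.50)
print(f"D = {d:.3f}, r^2 = {r_squared:.3f}")
```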

One limitation of linkage analysis is the difficulty of fine mapping the location of a gene influencing a complex disorder. There are not usually enough meioses within 1 to 2 megabases of the disease gene to detect recombination events; moreover, with the effects of phenocopies and genetic heterogeneity in complex diseases, critical recombination events may not be identified with certainty. Association analyses are thus essential for localizing susceptibility loci, and they are intrinsically more powerful than linkage analyses in detecting weak genetic effects (2). However, the characteristics of complex diseases, which are problematic in linkage analysis, including genetic heterogeneity, phenocopies, incomplete penetrance, genotype-by-environment interaction, and multilocus effects, also limit the power of association studies.

Risch and Merikangas have shown that association studies can be a very powerful approach for finding genetic determinants of a complex disorder (9). They suggested that if hundreds of thousands of single nucleotide polymorphisms (SNPs) were identified across the genome, then it would be possible to perform genome-wide association studies to identify the regions of linkage disequilibrium around disease susceptibility genes. In addition, they noted that much smaller sample sizes would be required to detect association than to detect linkage. The SNP Consortium is rapidly identifying single nucleotide polymorphisms, and within the next several years, genome-wide association studies may become a reality.

The choice of phenotype is critical to the success of gene discovery programs. Most case-control association studies employ a dichotomous variable (affected/unaffected). If the cases include a heterogeneous collection of etiologies for a complex disease, the power to detect a significant association will be reduced. Similarly, if the control group includes subjects who are affected but undiagnosed, power will be reduced. Disease-associated quantitative traits (also termed intermediate phenotypes) have often been used as proxy disease phenotypes in molecular genetic investigations of complex diseases such as asthma (10). In general, continuous phenotype measurements are inherently more informative, objective, and statistically powerful than binary categorizations of disease status, and they avoid the problems of arbitrary dichotomization. Although intermediate phenotypes have been widely used in genetic studies of asthma (e.g., immunoglobulin E and bronchial hyperresponsiveness), such intermediate phenotypes have not been extensively investigated in genetic studies of sarcoidosis.

What are the causes of a spurious association in a case-control study? As in any statistical test, there could be a false-positive result due to chance; this is less likely if a very stringent threshold for significance (a low P value) is met. But how stringent should this threshold be? In linkage analysis, careful consideration has been given to the appropriate threshold of significance in a genome-wide screen because adjacent markers do not represent independent tests (11). Similar thresholds have not been determined for association studies, and the appropriate correction for multiple statistical comparisons (multiple markers and multiple alleles at each marker) remains unclear. Risch and Merikangas suggested that a Bonferroni correction could be applied if one million SNPs were typed across the genome, with a P value of 5 × 10⁻⁸ required to demonstrate a significant association (9). However, the appropriate method of correction for multiple comparisons in candidate gene association studies remains unclear. Bailey-Wilson and colleagues discussed several possible Bonferroni corrections for genetic association studies, including the total number of alleles at all loci, the number of independent alleles at all loci, and the number of loci (12).
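The arithmetic behind the genome-wide threshold quoted above, and the three candidate test counts mentioned for candidate gene studies, can be sketched as follows. The nominal significance level of 0.05 is an assumption (it is the level that yields 5 × 10⁻⁸ for one million tests), the interpretation of "independent alleles" as one fewer than the number of alleles at each locus is also an assumption, and the marker set is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall level at alpha."""
    return alpha / n_tests


def candidate_test_counts(alleles_per_locus):
    """Three possible numbers of tests to correct for in a candidate gene study."""
    return {
        "total_alleles": sum(alleles_per_locus),
        # Assumption: k alleles at a locus contribute k - 1 independent alleles.
        "independent_alleles": sum(k - 1 for k in alleles_per_locus),
        "loci": len(alleles_per_locus),
    }


# One million SNPs at an assumed nominal level of 0.05.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical study: two biallelic SNPs and one repeat marker with five alleles.
for name, n_tests in candidate_test_counts([2, 2, 5]).items():
    print(name, n_tests, bonferroni_threshold(0.05, n_tests))
```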

The case-control association study design is also susceptible to spurious associations that arise purely from differences in population stratification between cases and controls (13). Population stratification, also known as population substructure, can result from recent population admixture or differences in ethnicity between cases and controls (14). Although it is unclear whether population stratification is a frequent problem in case-control association studies, it may be a particular problem in genetically diverse populations such as those of the United States. Therefore, family-based association strategies with analytical methods (such as the transmission/disequilibrium test [TDT]) that avoid spurious associations related to population stratification have become increasingly common in genetic studies of complex diseases (13, 15). A limitation in adult-onset diseases, such as sarcoidosis and chronic obstructive pulmonary disease, is that it is very difficult to obtain data from parents of adult cases, making traditional TDT studies difficult to perform. One possible solution to this problem is to use family-based association methods that use affected and unaffected siblings (e.g., sib-TDT, sibship disequilibrium test) rather than parent–child data (16, 17). A recently developed approach to assess population stratification in a case-control study is to genotype a set of polymorphic markers unlinked to any candidate loci for the disease of interest in cases and controls (18). Absence of association with these unlinked markers provides some assurance that an association with a candidate locus in those cases and controls is not a spurious result.
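For readers unfamiliar with the TDT, the sketch below shows the test statistic in its basic biallelic form: among heterozygous parents of affected offspring, the number of transmissions of the allele of interest is compared with the number of transmissions of the other allele using a McNemar-type chi-square with 1 degree of freedom. The function name and the transmission counts are hypothetical.

```python
import math


def tdt_statistic(transmitted, not_transmitted):
    """Basic biallelic TDT.

    transmitted:     times heterozygous parents passed the allele of interest
                     to an affected child
    not_transmitted: times they passed the other allele instead
    """
    b, c = transmitted, not_transmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts from 100 informative (heterozygous) parents.
chi2, p_value = tdt_statistic(transmitted=60, not_transmitted=40)
print(f"TDT chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

Because only transmissions within families are compared, the statistic does not depend on allele frequencies in an external control group, which is why it is not distorted by stratification between cases and controls.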

What criteria can be employed to evaluate a case-control genetic association study? The following general guidelines, summarized in Table 1, may be useful. We will present these guidelines and apply them to the paper by Maliarik and colleagues. First, are the candidate gene(s) under study biologically reasonable? Several factors can determine the appropriateness of a candidate gene. If human genetic linkage studies have identified a chromosomal region linked to a disease, or if an animal model for a disease is influenced by a particular gene or syntenic chromosomal region, positional candidate genes in such genomic regions warrant strong consideration. In addition, the biologic plausibility of a candidate gene for involvement in disease pathogenesis is important. However, obvious limitations of this candidate approach are the large number of potential candidate genes for complex diseases and the reality that only known genes can be investigated. Although candidate genes can be selected for study on this basis, they should not be ruled out on the basis of our current understanding of disease pathophysiology—important new insights may be missed if potential candidate genes must fit into current pathophysiologic models. Although linked regions have not been identified for sarcoidosis, Maliarik and colleagues discuss the biologic rationale for NRAMP1 as a candidate gene for sarcoidosis.

Table 1. Evaluation of candidate gene case-control association studies

| Issue | Key Questions | Possible Solutions |
| --- | --- | --- |
| Selection of candidate gene polymorphism | Is candidate gene biologically reasonable? | Demonstration of biologically functional effect |
| | Is the candidate gene a positional candidate? | Within linked region in man or syntenic from animal model |
| Population stratification | Are cases and controls matched? | Matching on ethnicity |
| | | Family-based association designs |
| | | Negative results with multiple unlinked markers |
| Hardy-Weinberg (H-W) equilibrium | Is control group in H-W equilibrium? | Calculation of H-W equilibrium with goodness-of-fit test (2 alleles) or simulation (multiple alleles) |
| Multiple comparisons | How many alleles were tested? | Bonferroni correction |
| | How many genetic loci were tested? | Estimation of empirical P values |

A second criterion in evaluation of case-control association studies is the careful selection of cases and control subjects. Do the case subjects meet appropriate criteria for disease affection? Are control subjects free from symptoms of disease, associated intermediate phenotypes, and potential confounders? Have control subjects been exposed to relevant environmental influences involved in disease pathogenesis while remaining clearly unaffected (19)? Were the cases and controls matched on demographic and environmental factors? Was consideration of population stratification included, either by attempting to match ethnicity or by typing unlinked markers, as suggested by Pritchard and colleagues (18)? Maliarik and colleagues do not explicitly discuss population stratification, but they have matched on racial group. Moreover, they use strict diagnostic criteria for defining affection with sarcoidosis, including tissue biopsy confirmation.

A third criterion in the evaluation of case-control studies is assessment of Hardy-Weinberg equilibrium in the markers studied within the control group. Hardy-Weinberg equilibrium indicates that the genotype frequencies can be determined directly from the allele frequencies; failure to demonstrate Hardy-Weinberg equilibrium could result from genotyping errors, inbreeding, genetic drift, mutation, or population substructure (20). Hardy-Weinberg equilibrium can be readily assessed with a goodness-of-fit chi square test for biallelic markers; for markers with multiple alleles (such as short-tandem repeat markers), more accurate determination of Hardy-Weinberg equilibrium can be obtained with Markov Chain Monte Carlo methods (21). Significant deviations from the expected proportions of homozygote and heterozygote classes in a population of case subjects may be caused by association with the disease allele. Lack of consistency with Hardy-Weinberg equilibrium among control subjects should prompt investigation for potential complications, including genotyping errors and population stratification. Although Hardy-Weinberg calculations are not presented by Maliarik and colleagues, goodness-of-fit tests using the data from their Table 2 (see page 674 of this issue) reveal that the control group is in Hardy-Weinberg equilibrium for the loci presented.
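A goodness-of-fit test of this kind for a biallelic marker can be sketched as follows. The sketch estimates the allele frequency from the observed genotype counts, derives the expected genotype counts under Hardy-Weinberg proportions, and refers the chi-square statistic to 1 degree of freedom. The genotype counts are hypothetical and are not the data of Maliarik and colleagues.

```python
import math


def hardy_weinberg_test(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg proportions.

    Assumes both alleles are present, so that all expected counts are nonzero.
    """
    n = n_AA + n_Aa + n_aa
    freq_A = (2 * n_AA + n_Aa) / (2 * n)
    freq_a = 1.0 - freq_A
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * freq_A ** 2, 2 * n * freq_A * freq_a, n * freq_a ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control group in exact Hardy-Weinberg proportions.
print(hardy_weinberg_test(25, 50, 25))
# Hypothetical control group with a marked deficit of heterozygotes.
print(hardy_weinberg_test(40, 20, 40))
```

With small samples or rare alleles the chi-square approximation becomes unreliable, and an exact or simulation-based test would be preferable.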

A final criterion for evaluation of a case-control study is correction for multiple comparisons. This remains a problematic topic requiring additional statistical genetic research. However, an effort to correct for spurious associations, which can result from testing a large number of alleles, is warranted. The multiple comparison issue is especially problematic with markers that have multiple alleles like short-tandem repeat polymorphisms; the conservative Bonferroni approach is to use a corrected significance value calculated by multiplication of the observed P value by the number of alleles tested. Bonferroni corrections for the total number of alleles at all loci are probably too conservative (12) because the alleles at one locus are not independent of each other and closely linked loci are probably not independent either. A less conservative but more computationally intensive approach is to estimate empirical significance values using simulation approaches (22). The issue of multiple comparisons is not explicitly addressed by Maliarik and colleagues.
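One simple way to estimate an empirical significance value for a marker with multiple alleles is a label permutation test, sketched below. This is a plain Monte Carlo permutation procedure, not the sequential method of reference 22, and the choice of test statistic (the largest case-control difference in frequency over all alleles) is an assumption made for illustration. Permuting alleles rather than individuals also assumes that the two alleles of a subject can be treated as independent. All names and data are hypothetical.

```python
import random
from collections import Counter


def max_allele_difference(case_alleles, control_alleles):
    """Largest absolute case-control difference in frequency over all alleles."""
    case_counts = Counter(case_alleles)
    control_counts = Counter(control_alleles)
    alleles = set(case_counts) | set(control_counts)
    return max(
        abs(case_counts[a] / len(case_alleles)
            - control_counts[a] / len(control_alleles))
        for a in alleles
    )


def empirical_p_value(case_alleles, control_alleles, n_permutations=999, seed=0):
    """Permutation P value for the maximum allele frequency difference."""
    rng = random.Random(seed)
    observed = max_allele_difference(case_alleles, control_alleles)
    pooled = list(case_alleles) + list(control_alleles)
    n_cases = len(case_alleles)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle case/control labels while keeping the group sizes fixed.
        rng.shuffle(pooled)
        permuted = max_allele_difference(pooled[:n_cases], pooled[n_cases:])
        if permuted >= observed - 1e-12:
            hits += 1
    # Count the observed data as one of the permutations.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical alleles (repeat lengths) at a multiallelic marker.
cases = [10] * 40 + [11] * 30 + [12] * 20 + [13] * 10
controls = [10] * 25 + [11] * 30 + [12] * 25 + [13] * 20
print(empirical_p_value(cases, controls))
```

Because the maximum is taken over all alleles in every permutation, the resulting P value already accounts for having tested each allele and needs no further Bonferroni adjustment for the number of alleles at that marker.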

Have Maliarik and colleagues identified an important genetic determinant of sarcoidosis? Maybe. In complex disease genetics, replication is especially critical. Further studies to replicate their findings are required. As the number of case-control association studies in complex disease genetics increases, a central repository for such studies would be quite beneficial to avoid any publication bias in favor of positive results. If a case-control association has been consistently replicated, the most optimistic outcome is that a narrow genomic region will have been identified that contains a disease susceptibility gene. Analysis of adjacent markers in haplotypes can be especially helpful at this stage, and family-based association studies are quite useful for haplotype analysis. Ultimately, geneticists will need to turn the problem of complex disease gene identification back over to the molecular and cell biologists because the identification of the functional disease mutation within a set of adjacent genetic variants in tight linkage disequilibrium depends upon the demonstration of a biologic basis for one or more of the putative genetic variants in the development of disease. The role of a putative disease gene variant can only be confirmed after a biologically functional effect has been demonstrated (23).

The authors would like to thank Dr. Scott Weiss and Dr. Jeffrey Drazen for helpful comments. E.K.S. is supported by an NIH grant (R01 HL61575) and a Research Grant from the American Lung Association. L.J.P. is a National Health and Medical Research Council of Australia (NH&MRC) Research Fellow in Genetic Epidemiology, a Winston Churchill Memorial Trust Churchill Fellow, and an Australian-American Educational Foundation Fulbright Fellow.

1. Rybicki, B. A., M. J. Maliarik, M. Major, J. Popovich, and M. C. Iannuzzi. 1998. Epidemiology, demographics, and genetics of sarcoidosis. Semin. Respir. Infect. 13:166–173.
2. Elston, R. 1995. The genetic dissection of multifactorial traits. Clin. Exp. Allergy 2:103–106.
3. Sandford, A., T. Weir, and P. Pare. 1996. The genetics of asthma. Am. J. Respir. Crit. Care Med. 153:1749–1765.
4. Silverman, E. K., and F. E. Speizer. 1996. Risk factors for the development of chronic obstructive pulmonary disease. Med. Clin. North Am. 80:501–522.
5. Masood, E. 1999. As consortium plans free SNP map of human genome [news]. Nature 398:545–546.
6. Rommens, J. M., M. C. Iannuzzi, B.-S. Kerem, M. L. Drumm, G. Melmer, M. Dean, R. Rozmahel, J. L. Cole, D. Kennedy, N. Hidaka, M. Zsiga, M. Buchwald, J. R. Riordan, L.-C. Tsui, and F. S. Collins. 1989. Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245:1059–1065.
7. Lander, E. S., and N. J. Schork. 1994. Genetic dissection of complex traits. Science 265:2037–2048.
8. Weeks, D. E., and G. M. Lathrop. 1995. Polygenic disease: methods for mapping complex disease traits. Trends Genet. 11:513–519.
9. Risch, N., and K. Merikangas. 1996. The future of genetic studies of complex human diseases. Science 273:1516–1517.
10. Cookson, W., and L. Palmer. 1998. Investigating the asthma phenotype. Clin. Exp. Allergy 28(Suppl. 1):88–89.
11. Lander, E., and L. Kruglyak. 1995. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat. Genet. 11:241–247.
12. Bailey-Wilson, J. E., B. Sorant, A. J. Sorant, C. M. Paul, and R. C. Elston. 1995. Model-free association analysis of a rare disease. Genet. Epidemiol. 12:571–575.
13. Spielman, R. S., R. E. McGinnis, and W. J. Ewens. 1993. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet. 52:506–516.
14. Ewens, W., and R. Spielman. 1995. The transmission/disequilibrium test: history, subdivision, and admixture. Am. J. Hum. Genet. 57:455–464.
15. Schaid, D. J. 1998. Statistical genetics 98: transmission disequilibrium, family controls, and great expectations. Am. J. Hum. Genet. 63:935–941.
16. Spielman, R. S., and W. J. Ewens. 1998. A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am. J. Hum. Genet. 62:450–458.
17. Horvath, S., and N. M. Laird. 1998. A discordant-sibship test for disequilibrium and linkage: no need for parental data [see Comments]. Am. J. Hum. Genet. 63:1886–1897.
18. Pritchard, J. K., and N. A. Rosenberg. 1999. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 65:220–228.
19. Morton, N. E., and A. Collins. 1998. Tests and estimates of allelic association in complex inheritance. Proc. Natl. Acad. Sci. USA 95:11389–11393.
20. Gillespie, J. H. 1998. Population Genetics: A Concise Guide. The Johns Hopkins University Press, Baltimore, MD.
21. Guo, S. W., and E. A. Thompson. 1992. Performing the exact test of Hardy-Weinberg proportion for multiple alleles. Biometrics 48:361–372.
22. Besag, J., and P. Clifford. 1991. Sequential Monte Carlo p-values. Biometrika 78:301–304.
23. Fields, S. 1997. The future is function. Nat. Genet. 15:325–327.
Address correspondence to: Edwin K. Silverman, M.D., Ph.D., Channing Laboratory, Brigham and Women's Hospital, 181 Longwood Avenue, Boston, MA 02115. E-mail:
Abbreviations: single nucleotide polymorphisms, SNPs; transmission/disequilibrium test, TDT.

