American Journal of Respiratory and Critical Care Medicine

Recently a series of genome-wide association study manuscripts in asthma and chronic obstructive pulmonary disease have been published. These papers suggest that, in part, asthma and chronic obstructive pulmonary disease have a common genetic origin, and that this common origin is due to polymorphism in genes that are involved with the development of the lung. This Pulmonary Perspective discusses what we are learning from genome-wide association studies, where the field of genetics and genomics is headed, and how this knowledge will ultimately be put to use in clinical medicine.

“It is not the beginning of the end, but it might be the end of the beginning.”—Winston Churchill

Ten years after the completion of the draft sequence of the human genome, we are reaching the end of the beginning of the genomic revolution. Genetic epidemiology has progressed from candidate gene studies, to fine mapping of linked regions, to whole genome association studies. Integrative genomics is on the immediate horizon and complete genome sequencing will soon follow. This Pulmonary Perspective describes what we have learned from genetic association studies to date, how it is informing what we think about the origins of asthma and chronic obstructive pulmonary disease (COPD), and how genomics may begin to have a clinical impact on the care of our patients with these disorders. We focus on recent papers in Nature Genetics (1–3) and the New England Journal of Medicine (4–6), work published in our own American Journal of Respiratory and Critical Care Medicine (7), and some other recent genome-wide association studies (GWAS) in asthma (8) and COPD (9, 10) in other journals.

Let us start with what we know best, namely the environment. Since the late 1960s and early 1970s we have known that although cigarette smoking is a risk factor for both COPD and asthma, only 10 to 15% of cigarette smokers go on to get COPD, suggesting that genetic susceptibility plays an important role in disease risk. Before the genomic era, susceptibility was defined clinically. The Dutch hypothesis, first publicly articulated by Orie and colleagues in 1961, suggested that the susceptible smoker was likely to be someone who had asthma or allergies in early life and then took up smoking (11). Interest in this hypothesis has waxed and waned over the past 40 years, increasing with the application of large-scale airway responsiveness testing in populations, which demonstrated that increased airway responsiveness in early adult life could predict the subsequent development of COPD (12) and that smokers with increased airway responsiveness had accelerated decline in lung function (13). Interest decreased when we could take this no further clinically or therapeutically. Until recently, we have had no way of identifying the underlying genes involved, nor were we able to determine the potential overlap between genes determining normal lung function and those for asthma and COPD. We are beginning to close in on these issues now.

Candidate gene studies still have an important role to play, as evidenced by the recent paper on MMP12 (4). More than 10 years ago Hautamaki and colleagues showed that the MMP12 knock-out mouse was resistant to emphysema when it was exposed to cigarette smoke, giving new life to the protease–antiprotease hypothesis as important in COPD pathogenesis (14). Hunninghake and colleagues (4) tested a functional variant in this gene in multiple child and adult populations, showing that the variant was associated with both lung function growth and decline in more than 8,000 subjects from multiple cohorts. This human genetic work directly links the Hautamaki and coworkers (14) mouse experiment to human populations and diseases, thus providing human data linking proteolytic damage to asthma as well as COPD. Although this work does not suggest novel pathobiology, it extends what we know about proteolytic damage to asthma and directly links it to the susceptibility theory first proposed by Orie and colleagues (11), providing clinicians with deeper insight into disease natural history and suggesting, at a genetic level, a direct link between asthma and COPD. This link, as noted by the editorialist in the New England Journal of Medicine, took us years to make for the first positionally cloned gene for asthma, ADAM33 (6). It also shows that if we have the right gene we can go from the mouse to human populations and back again, we hope over a shorter time frame than the more than 10 years it took in the case of MMP12. Finally, it provides an entry into the emerging world of systems biology/systems genetics, which can now be used to develop the gene networks around MMP12 that, we hope, will aid us in finding novel therapeutic targets for clinicians in desperate need of new approaches to these airway disorders.
Another important candidate gene is SERPINE2, as this replicated candidate gene for COPD was the first linking lung development to airway disease in later life (15, 16).

For the uninitiated, GWAS relate a large number of single nucleotide polymorphisms (SNPs) (500,000 to 2 million), picked to give maximum information about the genome, to a phenotype of interest, such as FEV1, asthma, or COPD (usually defined operationally as post-bronchodilator FEV1 < 80% of predicted), in a genetic association study. Two large metaanalytic GWAS of predominantly normal lung function recently published in Nature Genetics suggest several novel genes related to FEV1 and FEV1/FVC ratio (1, 2). The number one locus identified in these two papers was near HHIP, a gene previously identified in two GWAS papers as being associated with COPD (9, 10). Six other novel loci replicated in both of these large studies: 6p21 (AGER and PPT2), 2q35 (near TNS1), 6q24.1 (GPR126), 15q23 (THSD4), 5q32–33 (HTR4), and 4q24 (four genes including GSTCD). These have not yet been tested in asthma or COPD populations, but this clearly needs to be done. Although you can read my commentary in Nature Genetics for more details on the methodology of the studies involved (3), I would like to focus here on HHIP.

HHIP is interesting because it is critical for general organ development in utero, including lung development (17). Let us go back to phenotype for a minute. Lung function, as determined from spirometry, can first be reliably measured around the age of 5 years. From this time forward, FEV1 has a very high within-person tracking correlation, meaning that once you are set on a certain growth curve at birth you tend to stay on that curve unless moved off it by a significant environmental event (smoking or a severe infection, either viral or bacterial). This can be seen clearly in Figure 1, showing data from the Six-Cities Study, wherein several male children are plotted as they grow (18). The clear implication of this is that polymorphism in genes that determine development of the lung may be critical for determining susceptibility for asthma and COPD. One would predict an overlapping Venn diagram where some genes are important for normal lung development and function, some for asthma, and some for COPD. Some genes will be critical for all three phenotypes and some for only one or two.

The recent paper in the American Journal of Respiratory and Critical Care Medicine by Kho and colleagues used human fetal lung samples from the first two stages of lung development to define a gene set critical for the early development of the lung (7). One interesting finding is that the molecular stages of lung development are not entirely synchronous with the pathologic stages. Another finding from this paper is that a total of 3,223 genes were identified as being involved in a subtranscriptomic network for the early development of the human lung. Surfactants B and C were included in this group, suggesting a role for surfactants in branching morphogenesis as well as alveolarization, and demonstrating the pleiotropy or multifunctionality of genes. Building a gene network to describe how these genes assemble a human lung, and then describing how polymorphisms in these lung development genes relate to normal lung function, asthma, and COPD, is now an achievable scientific agenda.

GWAS have identified novel genes, such as ORMDL3, HHIP, or DENND1B, as well as genes that had previously been believed to be likely asthma or COPD genes, such as PDE4D. They have been valuable because they have clearly suggested that there is a link between lung development and both asthma and COPD. If not a novel theory, it is one that broadly unifies airway disease as a continuum from development to old age. The new genes identified have also given us a glimpse of novel biology, albeit not in a unified way. Perhaps this is because identifying genes one at a time is not enough. We need to move faster and we need more sophisticated data analytic approaches. New approaches to GWAS data analysis described below should help us to get to clinical relevance while we are waiting for whole genome sequencing.

Despite these clear successes, the clinical impact of these findings has been nil. How long will it take to have a clinical impact and how is this likely to occur? Many believe that the most immediate impact on clinical practice will be in the area of prediction. We initially believed this would be easy: identify genes from GWAS, put them into multivariate models, either linear, for continuous quantitative trait loci (QTLs) like FEV1, or logistic, for categorical phenotypes like asthma or COPD, and thus explain clinical events. But this approach does not seem to work. Evidence that it does not work is given in the paper by Repapi and colleagues in Nature Genetics (1). These investigators found that they could explain only 0.54% of the variation in FEV1 with the SNPs in the five genes that they could replicate in their GWAS. This phenomenon has been repeated for other human QTLs, such as height, that are known to be as highly heritable as lung function. Twenty SNPs replicated from GWAS for height could explain only 3.4% of the variability in that phenotype (19). This problem, labeled “the case of the missing variability,” has been the subject of much controversy recently, with some geneticists arguing that GWAS is a failure because of this lack of prediction and should be abandoned in favor of complete sequencing of the genome and identification of rare variants (20, 21), whereas others have argued that GWAS has been valuable in identifying new genes and pathways (22, 23). Although identifying rare variants and complete sequencing of the genome are laudable goals worth pursuing, they will not solve the case of the missing variability; they will only exacerbate it, by greatly expanding the amount of data without tackling the fundamental problem of how genes interact with each other, which is called epistasis.
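
To see why a handful of replicated SNPs explains so little variation, consider a minimal sketch of the purely additive model described above. It assumes Hardy-Weinberg proportions, independent loci, and a phenotype scaled to unit variance, in which case each SNP contributes 2p(1 − p)β² of the variance, where p is the allele frequency and β the per-allele effect in standard deviation units. The allele frequencies and effect sizes below are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
def variance_explained(maf, beta_sd):
    """Fraction of phenotypic variance explained by one additive SNP.

    Assumes Hardy-Weinberg proportions and a phenotype with unit variance,
    so beta_sd is the per-allele effect in standard deviation units.
    """
    return 2.0 * maf * (1.0 - maf) * beta_sd ** 2


def total_variance_explained(snps):
    """Sum the contributions of independent (unlinked, non-interacting) SNPs."""
    return sum(variance_explained(maf, beta) for maf, beta in snps)


# Hypothetical panel: five common SNPs, each shifting the trait by 0.05 SD
# per allele. These values are illustrative assumptions, not study results.
hypothetical_snps = [(0.30, 0.05)] * 5

total = total_variance_explained(hypothetical_snps)
print(f"Variance explained by the panel: {100 * total:.3f}%")
```

Under these assumptions the panel explains roughly half of one percent of the variance, which is the order of magnitude reported for FEV1. The sketch contains no interaction terms, which is exactly the limitation the text goes on to discuss.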

The central question here is how to model the epistasis in the genome. Genes operate in complex networks, not as individual actors in a molecular physiologic play. To focus on individual genes at the expense of networks is an attempt to avoid what is a much more complicated problem. I am not alone in this belief (24). A recent National Heart, Lung, and Blood Institute (NHLBI) workshop came to a similar conclusion. Until we recognize this, and begin to approach it analytically not at the level of the isolated individual gene but at the level of gene networks, we will fail, both in our immediate clinically relevant goal of prediction and in our long-term goal of understanding the pathobiology leading to novel treatments and prevention.

Investigators are currently pursuing two potential approaches to the problem of the missing variability. One approach is integrative genomics, using gene expression as a phenotype and examining how DNA polymorphism contributes to the gene expression signal. This expression QTL (eQTL) approach maximally leverages the central dogma of molecular biology by using gene expression as the phenotype of interest. Relating genetic polymorphism to expression level is more powerful than GWAS alone and, when integrated with GWAS, allows for the opportunity to build gene networks that might have novel biologic relevance to disease, but more specifically allows us to model epistatic gene–gene interactions in a biologically meaningful way. Obviously it raises the question of what tissue is the most relevant for this exercise, but there are a variety of accessible cells in peripheral blood, notably the CD4+ T lymphocyte and the macrophage, that can be used. The Biorepository initiative of the EVE network of asthma investigators holds great potential for doing this for asthma, wherein RNA samples are being collected from both peripheral blood and lung tissue in a subset of all NHLBI-funded GWAS studies. This resource will be made available to the investigative community through the NHLBI's Biolink Biorepository.
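
In its simplest form, the eQTL idea amounts to regressing an expression level on allele dosage at a SNP. The sketch below fits that single-SNP, single-transcript regression by ordinary least squares. The function name and the six data points are made up for illustration; a real eQTL analysis would also adjust for covariates, batch effects, and multiple testing.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy, made-up data: copies of the minor allele carried by six subjects and
# the (arbitrarily scaled) expression level of one transcript in each.
dosage = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]

slope, intercept = ols_slope_intercept(dosage, expression)
print(f"Expression change per allele copy: {slope:.2f}")
print(f"Expected expression with no copies: {intercept:.2f}")
```

The slope is the per-allele effect on expression. Repeating such a fit across many SNP and transcript pairs is what yields the raw material for the gene networks discussed in the text.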

The second approach is a bit more computationally complex but also worth pursuing. This would involve picking SNPs directly from GWAS based not on P value but on their potential to interact with other SNPs and the phenotype of interest. The problem with this latter approach is twofold: What is the metric that should be used to pick the SNP? And once you have this metric, how do you computationally address getting the interactions of interest? With regard to the first problem, one way not to pick the SNP is on the basis of a low P value. A P value reflects how likely it is that the SNP of interest in the GWAS is related to the phenotype. A very low P value can be interpreted in four ways: the SNP is causally related to the phenotype of interest, or the association is due to chance, bias (in the case of genetic studies usually population stratification), or linkage disequilibrium. Whatever the interpretation, it is only a reflection of how that SNP relates to the phenotype, not how it relates to other SNPs, which is the central issue in epistatic interaction. Because almost all genes in complex traits have very low effect estimates (odds ratios of 0.6–1.5), what tends to drive genetic association is the sample size and the allele frequency, not the effect size. Choosing SNPs to investigate for epistasis based on P value alone has not worked in numerous examples, such as height and the FEV1 GWAS already cited (1, 20). What we are interested in here is not only how the SNP relates to the phenotype of interest but, as importantly, how it relates to other SNPs. It is likely that the application of machine-learning approaches to data mining, if applied to GWAS, will lead to epistatic models that will have reasonable predictive power. This approach has been used in the human gene expression world and with animal and plant QTLs with much success and is just now penetrating the world of GWAS (25, 26).
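
The point that a single-SNP P value says nothing about how SNPs relate to each other can be illustrated with a deliberately extreme toy example. In the sketch below, two hypothetical SNPs are coded as -1 or +1 for carrier status, and the trait is present only when exactly one of the two variants is carried. The data are constructed, not observed, and this pure interaction model is an assumption chosen to make the contrast as sharp as possible.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Balanced toy population: every combination of carrier status at two
# hypothetical SNPs occurs equally often (25 subjects per combination).
snp_a = [-1, -1, 1, 1] * 25
snp_b = [-1, 1, -1, 1] * 25

# Purely epistatic toy trait: present only when exactly one variant is carried.
phenotype = [1.0 if a != b else 0.0 for a, b in zip(snp_a, snp_b)]

# The interaction term is the product of the two genotype codes.
interaction = [a * b for a, b in zip(snp_a, snp_b)]

corr_a = pearson(snp_a, phenotype)
corr_b = pearson(snp_b, phenotype)
corr_ab = pearson(interaction, phenotype)

print(f"SNP A alone:  {corr_a:+.2f}")
print(f"SNP B alone:  {corr_b:+.2f}")
print(f"A x B jointly: {corr_ab:+.2f}")
```

Each SNP taken alone shows no association with the trait, so neither would ever be selected on marginal P value, yet the pair determines the trait completely. Real epistasis is far less clean, but the example shows why a selection metric needs to look at SNPs jointly.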

Sequencing the human genome, while a tremendous feat, will not solve the problem of the missing variability; it will only increase the P > N problem that plagues all genetic-association and gene-expression studies now. This is basically a problem of too many predictors (the P), for the sample size (the N). Without some way of narrowing the search space we are doomed to failure. Machine-learning data-mining techniques and integrative genomics can do this, and should lead us to the first set of real applications of genomics to clinical medicine, which will be in the area of prediction of clinical events.
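
A back-of-envelope count makes the scale of the P > N problem concrete. The sketch below takes the lower end of the SNP panel sizes quoted earlier (500,000) and a hypothetical cohort of 8,000 subjects, then counts the pairwise interactions an exhaustive search would have to test. The significance level of 0.05 and the use of a Bonferroni correction are illustrative assumptions.

```python
from math import comb

n_snps = 500_000     # lower end of the GWAS panel sizes quoted in the text
n_subjects = 8_000   # hypothetical cohort size, for illustration only

# Every unordered pair of SNPs is a candidate two-way interaction.
pairwise_tests = comb(n_snps, 2)

# Bonferroni-corrected threshold if every pair were tested at alpha = 0.05.
bonferroni_alpha = 0.05 / pairwise_tests

print(f"Single-SNP predictors: {n_snps:,}")
print(f"Pairwise interactions: {pairwise_tests:,}")
print(f"Candidate pairs per subject: {pairwise_tests / n_subjects:,.0f}")
print(f"Bonferroni threshold per pair: {bonferroni_alpha:.1e}")
```

Even before considering three-way or higher-order interactions, the number of candidate predictors exceeds the number of subjects by many orders of magnitude, and whole genome sequencing only enlarges the numerator. This is why some principled way of narrowing the search space is needed.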

As Churchill suggested when asked whether the defeat of Rommel in North Africa signaled the beginning of the end of the Second World War, our fight to make the genomic revolution relevant to the care of our patients is not over, but things are about to get a lot more interesting.

1. Repapi E, Sayers I, Wain LV, Burton PR, Johnson T, Obeidat M, Zhao JH, Ramasamy A, Zhai G, Vitart V, et al. Genome wide association study identifies five new loci associated with lung function. Nat Genet 2010;42:36–44.
2. Hancock DB, Eijgelsheim M, Wilk JB, Gharib SA, Loehr LR, Marciante KD, Franceschini N, van Durme Y, Chen T, Barr RG, et al. Meta analyses of genome wide association studies identify multiple novel loci related to pulmonary function: the charge consortium. Nat Genet 2010;42:45–52.
3. Weiss ST. Lung function and airway diseases. Nat Genet 2010;42:14–16.
4. Hunninghake GM, Cho MH, Tesfaigzi Y, Soto-Quiros ME, Avila L, Lasky-Su J, Stidley C, Melen E, Soderhall C, Hallberg J, et al. MMP12, lung function, and COPD in high-risk populations. N Engl J Med 2009;361:2599–2608.
5. Sleiman PM, Flory J, Imielinski M, Bradfield JP, Annaiah K, Willis-Owen SAG, Wang K, Rafaels NM, Michel S, Bonnelykke K, et al. Variants of DENND1B associated with asthma in children. N Engl J Med 2010;362:36–44.
6. Brusselle GG. Matrix metalloproteinase 12, asthma, and COPD. N Engl J Med 2009;361:2664–2665.
7. Kho AT, Bhattacharya S, Tantisira KG, Carey VJ, Gaedigk R, Leeder JS, Kohane IS, Weiss ST, Mariani TJ. Transcriptomic analysis of human lung development. Am J Respir Crit Care Med 2010;181:54–63.
8. Himes BE, Hunninghake GM, Baurley JW, Rafaels NM, Sleiman P, Strachan DP, Wilk JB, Willis-Owen SA, Klanderman B, Lasky-Su J, et al. Genome-wide association analysis identifies PDE4D as an asthma-susceptibility gene. Am J Hum Genet 2009;84:481–493.
9. Wilk JB, Chen TH, Gottlieb DJ, Walter RE, Nagle MW, Brandler BJ, Myers RH, Borecki IB, Silverman EK, Weiss ST, et al. A genome-wide association study of pulmonary function measures in the Framingham Heart Study. PLoS Genet 2009;5:e1000429.
10. Pillai SG, Ge D, Zhu G, Kong X, Shianna KV, Need AC, Feng S, Hersh CP, Bakke P, Gulsvik A, et al. A genome-wide association study in chronic obstructive pulmonary disease (COPD): identification of two major susceptibility loci. PLoS Genet 2009;5:e1000421.
11. Orie NGM, Sluiter HJ, De Vries K, Tammeling GJ, Witkop J. The host factor in bronchitis. In: Orie NGM, Sluiter HJ, editors. Bronchitis: an international symposium. University of Groningen, The Netherlands: Royal VanGorcum Ltd Publishers; 1961, pp. 285–286.
12. Hospers JJ, Postma DS, Rijcken B, Weiss ST, Schouten JP. Histamine airway hyper-responsiveness and mortality from chronic obstructive pulmonary disease: a cohort study. Lancet 2000;356:1313–1317.
13. Tashkin DP, Altose MD, Connett JE, Kanner RE, Lee WW, Wise RA. Methacholine reactivity predicts changes in lung function over time in smokers with early chronic obstructive pulmonary disease. The Lung Health Study Research Group. Am J Respir Crit Care Med 1996;153:1802–1811.
14. Hautamaki RD, Kobayashi DK, Senior RM, Shapiro SD. Requirement for macrophage elastase for cigarette smoke-induced emphysema in mice. Science 1997;277:2002–2004.
15. DeMeo DL, Mariani TJ, Lange C, Srisuma S, Litonjua AA, Celedon JC, Lake SL, Reilly JJ, Chapman HA, Mecham BH, et al. The SERPINE2 gene is associated with chronic obstructive pulmonary disease. Am J Hum Genet 2006;79:253–264.
16. Zhu G, Warren L, Aponte J, Gulsvik A, Bakke P, Anderson WH, Lomas DA, Silverman EK, Pillai SG; International COPD Genetics Network (ICGN) Investigators. The SERPINE2 gene is associated with chronic obstructive pulmonary disease in two large populations. Am J Respir Crit Care Med 2007;176:167–173.
17. Bak M, Hansen C, Friis Henriksen K, Tommerup N. The human hedgehog–interacting protein gene: structure and chromosomal mapping to 4q31.21-q31.3. Cytogenet Cell Genet 2001;92:300–303.
18. Wang X, Dockery DW, Wypij D, Fay ME, Ferris BG Jr. Pulmonary function between 6 and 18 years of age. Pediatr Pulmonol 1993;15:75–88.
19. Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, Mangino M, Freathy RM, Perry JR, Stevens S, Hall AS, et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 2008;40:575–583.
20. Maher B. The case of the missing heritability. Nature 2008;456:18–21.
21. Goldstein DB. Common genetic variation and human traits. N Engl J Med 2009;360:1696–1698.
22. Kraft P, Hunter DJ. Genetic risk prediction – are we there yet? N Engl J Med 2009;360:1701–1703.
23. Hirschhorn JN. Genomewide association studies–illuminating biologic pathways. N Engl J Med 2009;360:1699–1701.
24. Loscalzo J. Association studies in an era of too much information: clinical analysis of new biomarker and genetic data. Circulation 2007;116:1866–1870.
25. Lynch M, Walsh B. Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer; 1998.
26. de Koning DJ, Schulman NF, Elo K, Moisio S, Kinos R, Vilkki J, Maki-Tanila A. Mapping of multiple quantitative trait loci by simple regression in half-sib designs. J Anim Sci 2001;79:616–622.
Correspondence and requests for reprints should be addressed to Scott T. Weiss, M.D., M.S., Channing Laboratory, 181 Longwood Ave, Boston, MA 02115. E-mail:

American Journal of Respiratory and Critical Care Medicine
181
11
