Rationale: Pediatric asthma has variable underlying inflammation and symptom control. Approaches to addressing this heterogeneity, such as clustering methods to find phenotypes and predict outcomes, have been investigated. However, clustering based on the relationship between treatment and clinical outcome has not been performed, and machine learning approaches for long-term outcome prediction in pediatric asthma have not been studied in depth.
Objectives: Our objectives were to use our novel machine learning algorithm, predictor pursuit (PP), to discover pediatric asthma phenotypes on the basis of asthma control in response to controller medications, to predict longitudinal asthma control among children with asthma, and to identify features associated with asthma control within each discovered pediatric phenotype.
Methods: We applied PP to the Childhood Asthma Management Program study data (n = 1,019) to discover phenotypes on the basis of asthma control between assigned controller therapy groups (budesonide vs. nedocromil). We confirmed PP’s ability to discover phenotypes using the Asthma Clinical Research Network/Childhood Asthma Research and Education network data. We next predicted children’s asthma control over time and compared PP’s performance with that of traditional prediction methods. Last, we identified clinical features most correlated with asthma control in the discovered phenotypes.
Results: Four phenotypes were discovered in both datasets: allergic not obese (A+/O−), obese not allergic (A−/O+), allergic and obese (A+/O+), and not allergic not obese (A−/O−). Of the children with well-controlled asthma in the Childhood Asthma Management Program dataset, we found more nonobese children treated with budesonide than with nedocromil (P = 0.015) and more obese children treated with nedocromil than with budesonide (P = 0.008). Within the obese group, more A+/O+ children’s asthma was well controlled with nedocromil than with budesonide (P = 0.022) or with placebo (P = 0.011). The PP algorithm performed significantly better (P < 0.001) than traditional machine learning algorithms for both short- and long-term asthma control prediction. Asthma control and bronchodilator response were the features most predictive of short-term asthma control, regardless of type of controller medication or phenotype. Bronchodilator response and serum eosinophils were the features most predictive of long-term asthma control, regardless of type of controller medication or phenotype.
Conclusions: Advanced statistical machine learning approaches can be powerful tools for discovery of phenotypes based on treatment response and can aid in predicting disease control in complex medical conditions such as asthma.
Asthma is a complex disease with heterogeneous inflammation between and within individuals. Over recent decades, the construct of T-helper cell type 2 (Th2)-predominant inflammation has broadened to include Th1, Th17, and regulatory T-cell inflammatory profiles and more (1, 2). Because inhaled corticosteroids (ICS) have a wide range of antiinflammatory properties, they are recommended as the first-line asthma medication for all patients with persistent asthma (3). However, not all individuals respond in the same way to controller medication within and between medication classes (4, 5), and 10% or more of those with asthma have asthma considered difficult to control (6). To find patterns in the data, approaches such as clustering and predictive modeling have been used.
Using clustering methods (e.g., k-means or latent class analysis), researchers have found specific phenotypes in adults and children that include allergic markers, body mass index (BMI), age of asthma onset, clinical manifestations, and severity (7–19). However, these methods have limitations. They often require clinicians to choose the features (variables) included in the model, which can introduce feature selection bias. In addition, there is a lack of statistical confirmation of the differences between clusters. Most importantly for clinical decisions, they do not indicate which treatment would benefit an individual patient the most.
Traditional prediction algorithms (e.g., logistic regression) to predict asthma control have shown promise. It has been found that short-term asthma control is most indicative of future control (20–22). However, long-term (≥1 yr) asthma control prediction is more challenging, in part owing to instability of features that change over time, such as adherence and seasonality (23, 24). Standard models often cannot capture these complex relationships, because they usually apply a “one-size-fits-all” model to the entire feature space (i.e., all dimensional combinations of variables in the dataset). To our knowledge, long-term asthma control prediction has not been assessed in depth in children.
Our novel machine learning tool, predictor pursuit (PP) (25), addresses these limitations of other machine learning and prediction methods. The PP tool was designed to discover phenotypes and predict clinical outcomes in an entirely data-driven fashion with the ability to find heterogeneous relationships among clinical features and outcomes. Therefore, the asthma domain is ideal for our tool to discover complex data patterns that have clinical relevance.
In this study, we used the PP machine learning algorithm 1) to discover statistically distinct asthma phenotypes on the basis of asthma control according to the type of controller therapy, 2) to predict asthma control state (both long and short term) on the basis of clinical features, and 3) to identify the most predictive clinical features of asthma control state for the discovered phenotypes. Some of these results were previously reported in the form of an abstract (26).
The first capability of PP is to identify phenotypes (subgroups) of children on the basis of statistical differences in asthma control status. The method iteratively discovers phenotypes in a dataset until there are no statistical differences that can lead to further division (Figure 1). This permits independent, data-driven discoveries.

Figure 1. Block diagram of predictor pursuit algorithm for phenotype discovery. χ = entire patient group, χx = patient subgroup, δ = minimum number of patients in the patient subgroup. The greedy algorithm (left) is applied repeatedly to each newly generated patient group (right) until no more subspaces exist that satisfy the rule of statistically different clinical outcomes. The output of the function is the set of identified phenotypes.
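The iterative splitting described above can be illustrated with a minimal Python sketch. It is not the PP implementation (which is described in reference 25 and the online supplement); it assumes binary features, a binary outcome, two treatment arms, and a simple z-test for whether the treatment effect differs between the two candidate subgroups. All names (`effect`, `interaction_p`, `discover_phenotypes`, `delta`, `alpha`) and the synthetic data are hypothetical.

```python
import math

def effect(group):
    """Difference in control rate (arm A minus arm B) and its variance."""
    a = [p["controlled"] for p in group if p["arm"] == "A"]
    b = [p["controlled"] for p in group if p["arm"] == "B"]
    if not a or not b:
        return None
    pa, pb = sum(a) / len(a), sum(b) / len(b)
    return pa - pb, pa * (1 - pa) / len(a) + pb * (1 - pb) / len(b)

def interaction_p(g0, g1):
    """Two-sided p-value for a difference in treatment effect between g0 and g1."""
    e0, e1 = effect(g0), effect(g1)
    if e0 is None or e1 is None or e0[1] + e1[1] == 0:
        return 1.0
    z = (e0[0] - e1[0]) / math.sqrt(e0[1] + e1[1])
    return math.erfc(abs(z) / math.sqrt(2))

def discover_phenotypes(group, features, delta=10, alpha=0.05, path=()):
    """Greedily split on the feature with the most significant effect difference.

    Stops when no split is significant or a subgroup would have < delta patients.
    """
    best = None
    for f in features:
        g0 = [p for p in group if p[f] == 0]
        g1 = [p for p in group if p[f] == 1]
        if min(len(g0), len(g1)) < delta:
            continue
        pval = interaction_p(g0, g1)
        if pval < alpha and (best is None or pval < best[0]):
            best = (pval, f, g0, g1)
    if best is None:
        return [(path, group)]
    _, f, g0, g1 = best
    rest = [x for x in features if x != f]
    return (discover_phenotypes(g0, rest, delta, alpha, path + ((f, 0),))
            + discover_phenotypes(g1, rest, delta, alpha, path + ((f, 1),)))

# Synthetic cohort (assumption): arm A works better in nonobese patients,
# arm B in obese patients; "noise" is unrelated to treatment response.
rates = {(0, "A"): 16, (0, "B"): 6, (1, "A"): 4, (1, "B"): 15}
patients = [{"obese": o, "arm": arm, "noise": i % 2, "controlled": int(i < k)}
            for (o, arm), k in rates.items() for i in range(20)]

phenotypes = discover_phenotypes(patients, ["obese", "noise"], delta=10)
paths = [path for path, _ in phenotypes]
too_strict = discover_phenotypes(patients, ["obese", "noise"], delta=50)
```

In this toy cohort the sketch splits once on `obese` and ignores `noise`; raising `delta` above the subgroup size returns the undivided cohort, which mirrors the role of δ as a stopping constraint.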
The second capability is to predict clinical outcome on the basis of all available features. PP sequentially divides the feature space and assigns different predictive models for each discovered feature subspace to capture relationships between features and the outcomes to maximize prediction accuracy (Figure 2). It continues to divide the feature space until there are no prediction performance improvements that can lead to a further division. The advantage of this method is that heterogeneous (different) predictive models can be used for the discovered feature subspaces in a way that is not possible with traditional machine learning methods such as logistic regression (Figure 3). The PP method does not require variable standardization, because it uses statistical differences (the two-sample t test) (27) as criteria rather than a distance metric. Additional details on these methods are provided in the online supplement.

Figure 2. Block diagram of predictor pursuit algorithm (PP) for outcome prediction. hx = assigned predictive model to the patient group (χx). Compared with the phenotype discovery function of PP (Figure 1), the output of the function is predictive modeling. Furthermore, the outcome prediction function of PP has different criteria for division of the patient groups. The final tree (left) is achieved after PP repeatedly applies the steps (right) to the newly generated patient group until there are no more patient subgroups that yield additional improvement of prediction accuracy.
Figure 3. Predictor pursuit algorithm (PP) constructs different predictive models for various subspaces, in contrast to standard machine learning methods that apply one predictive (i.e., a “one-size-fits-all”) model for the entire feature space. PP (left) simultaneously divides the patient groups and assigns a corresponding predictive model for each patient group to further improve (maximize) the prediction accuracy, whereas standard logistic regression (right) tries to find the single predictive model for the entire group that maximizes the prediction accuracy.
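The contrast between one model for the whole feature space and separate models per subgroup can be shown with a toy example. The sketch below is an illustration under simplifying assumptions, not the PP method: the two subgroups are given rather than discovered, and each "model" is a single-threshold rule. The function `fit_stump` and the synthetic data are hypothetical.

```python
def fit_stump(xs, ys):
    """Best single-threshold rule: predict 1 when sign * (x - t) > 0.

    Returns (training accuracy, threshold t, sign).
    """
    best = (-1.0, None, None)
    for t in sorted(set(xs)):
        for sign in (1, -1):
            hits = sum((sign * (x - t) > 0) == bool(y) for x, y in zip(xs, ys))
            if hits / len(xs) > best[0]:
                best = (hits / len(xs), t, sign)
    return best

# Synthetic feature (assumption): the same measurement predicts the outcome
# in opposite directions in the two subgroups.
xs = list(range(10))
y_group0 = [int(x > 4) for x in xs]  # high values -> outcome 1
y_group1 = [int(x < 5) for x in xs]  # low values  -> outcome 1

# One rule for everyone versus one rule per subgroup.
global_acc = fit_stump(xs + xs, y_group0 + y_group1)[0]
acc0 = fit_stump(xs, y_group0)[0]
acc1 = fit_stump(xs, y_group1)[0]
```

Because the relationship is reversed between the subgroups, no single threshold does better than chance on the pooled data, whereas each subgroup-specific rule fits its own subgroup perfectly.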
We applied the PP algorithm to two datasets: the Childhood Asthma Management Program (CAMP) trial and the Asthma Clinical Research Network (ACRN)/Childhood Asthma Research and Education (CARE) network. The CAMP trial (28) is a large, randomized, placebo-controlled pediatric asthma study. The de-identified dataset was obtained by request through the NHLBI’s Biologic Specimen and Data Repository Information Coordinating Center (29). The study included 1,041 participants with mild to moderate persistent asthma who were aged 5–12 years and assigned to budesonide, nedocromil, or placebo medication. Per the study protocol, the children were assessed at baseline and every 4 months thereafter. There were 962 features (variables) collected at these intervals, including sociodemographics, lung function measurements, asthma morbidity, use of healthcare resources, side effects, change in controller medication, missed school days, physical growth and development, and psychological development.
The ACRN/CARE network dataset (30) is a collection of information on adults and children with mild to severe asthma from multiple studies and represents a mix of observational and randomized controlled (with and without crossover design) studies. The ACRN/CARE data were obtained by request from the NHLBI SNP Health Association Asthma Resource Project through the database of Genotypes and Phenotypes (31). There were a total of 1,353 adults and children in the dataset. The dataset comprised a variety of ICS (beclomethasone, budesonide, flunisolide, fluticasone, and triamcinolone), so we grouped them into one category for comparison with montelukast. We selected children aged 5–12 years with use of any inhaled corticosteroid or montelukast (n = 684). We harmonized 56 features across the datasets. Features included sociodemographics, lung function measurements, asthma symptoms, use of healthcare resources, and physical growth.
To ensure a consistent data trend over time, we excluded individuals with fewer than four clinical follow-up visits documented over the entire study period in the CAMP dataset and fewer than four consecutive visits in the ACRN/CARE dataset. We excluded children with other known pulmonary conditions, such as cystic fibrosis. The University of California, Los Angeles Institutional Review Board approved this study.
We excluded features with greater than 5% missing data. Missing data were imputed using the k-nearest neighbors imputation methodology (32). In addition to the provided features, we generated features documented in the literature as relevant to asthma phenotypes, such as allergic status, obese status, bronchodilator response, and adherence (8). In both datasets, a BMI greater than or equal to the 95th percentile for age defined obesity. This was calculated by converting the raw BMI scores to percentiles using a BMI conversion table based on age and sex. In the CAMP dataset, we defined an allergic state by the presence of any one of the following features: a positive skin test to any allergen, history of allergy shots, or physician diagnosis of allergy. In the ACRN/CARE dataset, there were not as many features available; therefore, allergic status was determined on the basis of a positive skin test to any allergen. For the CAMP dataset only, bronchodilator response was defined as the change in the FEV1 percentage after bronchodilator administration. Nonadherence was determined by a “no” response to the question, “Takes medicine as prescribed?”
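The preprocessing steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study code: missing values are represented as `None`, the k-nearest neighbors step uses an unscaled Euclidean distance over the features two patients share (the exact variant in reference 32 may differ), and the function names and toy data are hypothetical.

```python
import math

def drop_sparse_features(rows, max_missing=0.05):
    """Keep columns whose fraction of missing (None) values is at most max_missing."""
    n = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(r[j] is None for r in rows) / n <= max_missing]
    return [[r[j] for j in keep] for r in rows], keep

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows."""
    def dist(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf
    filled = [list(r) for r in rows]
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is None:
                donors = sorted((dist(r, o), o[j]) for m, o in enumerate(rows)
                                if m != i and o[j] is not None)
                nearest = [val for _, val in donors[:k]]
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

def camp_phenotype(skin_test_pos, allergy_shots, physician_dx, bmi_percentile):
    """Phenotype label from the CAMP allergy rule and the BMI >= 95th percentile rule."""
    allergic = skin_test_pos or allergy_shots or physician_dx
    obese = bmi_percentile >= 95
    return f"A{'+' if allergic else '-'}/O{'+' if obese else '-'}"

# Toy data (hypothetical): 20 patients, 3 numeric features.
rows = [[float(i), 2.0 * i, 1.0] for i in range(20)]
rows[7][1] = None                # column 1: 5% missing, kept
rows[3][2] = rows[11][2] = None  # column 2: 10% missing, dropped
kept_rows, kept_cols = drop_sparse_features(rows)
imputed = knn_impute(kept_rows, k=2)
```

In the toy data, the missing value for patient 7 is filled from patients 6 and 8, the two nearest neighbors on the remaining feature.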
The outcome measurement was asthma control state (well controlled vs. not well controlled) as defined by the 2007 National Asthma Education and Prevention Program asthma guideline criteria for impairment and risk (3). After feature exclusion, “not well controlled” was defined in the CAMP dataset by the presence of one or more of the following: FEV1 less than 80% predicted, symptoms more than two times per week, use of short-acting β-agonist more than 2 days per week, any limitations in normal activity, or any emergency room visit or hospitalization. In the ACRN/CARE dataset, “not well controlled” was defined as FEV1 less than 80% predicted, symptoms more than times per week, rescue medication use less than one time per week, any emergency room visit or hospitalization, or oral steroid use more than once per year.
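The CAMP definition of “not well controlled” stated above translates directly into a rule. The following Python sketch encodes only the five CAMP criteria as listed; the function and argument names are hypothetical, and the ACRN/CARE rule is omitted.

```python
def camp_not_well_controlled(fev1_pct_predicted, symptom_times_per_week,
                             saba_days_per_week, activity_limited,
                             er_visit_or_hospitalization):
    """True if any one of the CAMP 'not well controlled' criteria is met."""
    return bool(
        fev1_pct_predicted < 80          # FEV1 below 80% predicted
        or symptom_times_per_week > 2    # symptoms more than two times per week
        or saba_days_per_week > 2        # short-acting beta-agonist > 2 days/week
        or activity_limited              # any limitation in normal activity
        or er_visit_or_hospitalization   # any emergency room visit or hospitalization
    )

# Hypothetical visits
visit_ok = camp_not_well_controlled(93.8, 1, 0, False, False)
visit_low_fev1 = camp_not_well_controlled(75.0, 0, 0, False, False)
visit_boundary = camp_not_well_controlled(80.0, 2, 2, False, False)
```

Values exactly at the thresholds (FEV1 of 80%, symptoms twice per week, rescue use on 2 days per week) do not trigger the rule, because the criteria are strict inequalities.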
We applied PP to each dataset to divide the children into phenotypes that maximized the difference in asthma control between controller therapy groups (CAMP trial, budesonide vs. nedocromil; ACRN/CARE data, ICS vs. montelukast). For both datasets, we separated the data into independent training and testing sets: we discovered the phenotypes in the training set and then confirmed them in the independent testing set. We assigned 50% of the cases at random to the training set and the remaining 50% to the testing set. We examined associations between the type of asthma controller therapy and asthma control within each phenotype using proportional tests (33) (with a significance level of 0.05) and verified these results with permutation tests (34). We used interaction tests (35) to test whether the associations between the type of asthma controller therapy and asthma control varied between phenotypes. Although the ACRN/CARE dataset represented mostly participants from randomized trials, it also included observational data. Therefore, we used inverse propensity of treatment weighting to account for treatment selection bias from this dataset before applying PP (36).
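As a worked example of the within-phenotype comparison, the sketch below applies a two-sided, pooled two-proportion z-test and a Monte Carlo permutation test to the A+/O+ counts reported in Table 3 (3 of 18 children well controlled with budesonide vs. 10 of 19 with nedocromil; the denominators are inferred from the reported percentages). The exact test variants used in the study (references 33 and 34) are not specified here, so the choice of a pooled z-test, the permutation scheme, and the function names are assumptions.

```python
import math
import random

def two_proportion_p(s1, n1, s2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (no continuity correction)."""
    pooled = (s1 + s2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (s1 / n1 - s2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def permutation_p(s1, n1, s2, n2, n_perm=10000, seed=0):
    """Monte Carlo permutation p-value for the absolute difference in proportions."""
    rng = random.Random(seed)
    outcomes = [1] * (s1 + s2) + [0] * (n1 + n2 - s1 - s2)
    observed = abs(s1 / n1 - s2 / n2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(outcomes)            # reassign outcomes to arms at random
        k = sum(outcomes[:n1])           # successes landing in arm 1
        if abs(k / n1 - (s1 + s2 - k) / n2) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# A+/O+ phenotype, CAMP: budesonide 3/18 vs. nedocromil 10/19
p_prop = two_proportion_p(3, 18, 10, 19)
p_perm = permutation_p(3, 18, 10, 19)
```

Under these assumptions the proportion test gives a P value of approximately 0.022, in line with the value reported in Table 3. The permutation value from this sketch need not equal the published one, because the statistic and sidedness of the published permutation test are not stated.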
For each phenotype identified in the CAMP study data, we used a Markov chain model to estimate the likelihood of patients within phenotypes remaining in their current asthma control state at 4-month intervals based on the previous state. The results are presented in Figure E4 in the online supplement.
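A first-order Markov chain of this kind can be estimated by counting transitions between consecutive visits. The sketch below is a minimal illustration, assuming each patient's history is a sequence of states coded "W" (well controlled) or "N" (not well controlled) at successive 4-month visits; the function name and the toy sequences are hypothetical.

```python
from collections import Counter

def transition_probs(sequences):
    """Estimate P(next state | previous state) from per-patient state sequences."""
    counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    # Total number of transitions leaving each state
    totals = Counter()
    for (prev, _), c in counts.items():
        totals[prev] += c
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}

# Toy histories (hypothetical), one string per patient
histories = ["WWNW", "NNWW", "WNNN"]
probs = transition_probs(histories)
stay_well = probs[("W", "W")]      # probability of remaining well controlled
stay_not_well = probs[("N", "N")]  # probability of remaining not well controlled
```

The diagonal entries (`stay_well`, `stay_not_well`) correspond to the likelihood of remaining in the current control state from one visit to the next.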
We addressed the final two aims of the study using the CAMP dataset because of its larger sample size and greater number of features. First, we applied PP to predict asthma control on the basis of all available features over both the short term (4 mo) and the long term (after four follow-up visits, or approximately 1 yr). If asthma was well controlled at three or more follow-up visits, we classified the participant as “well controlled”; otherwise, the participant was classified as “not well controlled.” To validate the results, we trained the predictive model on a training set and tested it on an independent testing set using fivefold cross-validation. We compared the results with traditional machine learning methods: neural networks, logistic regression, adaptive boosting, random forests, support vector machine, and naive Bayes using two-sample t tests.
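The long-term labeling rule and the evaluation setup can be sketched as follows. This is an illustration only: it assumes four follow-up visits per participant, computes the area under the curve as the probability that a well-controlled participant receives a higher score than a not-well-controlled one, and uses a simple shuffled fivefold split. The function names are hypothetical, and no predictive model is fitted here.

```python
import random

def long_term_label(visits, min_controlled=3):
    """True ('well controlled') if asthma was well controlled at >= 3 follow-up visits."""
    return sum(visits) >= min_controlled

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def kfold_indices(n, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering all n samples once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

# Hypothetical examples
label_a = long_term_label([True, True, False, True])    # 3 of 4 visits
label_b = long_term_label([True, False, False, True])   # 2 of 4 visits
auc_perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
auc_partial = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
folds = kfold_indices(10, k=5)
```

In a full analysis, a model would be trained on each `train_indices` set and scored with `auc` on the matching `test_indices` set, and the fold-level values would then be compared between algorithms.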
Next, we identified the strongest (based on Pearson correlation coefficient) indicative features for short- and long-term asthma control for the four discovered phenotypes. We determined the important features regardless of assigned medication and then analyzed the features within each phenotype by assigned treatment (budesonide or nedocromil). We further studied the predictive value of each feature over the long term using the Python sklearn package (https://pypi.python.org/pypi/sklearn/0.0), and the results are presented in the online supplement (see Figure E5).
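Ranking features by the strength of their Pearson correlation with the outcome can be sketched as below. The code is illustrative and uses only the standard library; the feature names and values are hypothetical toy data, not study results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(features, outcome, top=3):
    """Sort features by |r| with the outcome, strongest first."""
    scored = [(name, pearson_r(vals, outcome)) for name, vals in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)[:top]

# Toy data (hypothetical): outcome 1 = well controlled, 0 = not well controlled
outcome = [1, 1, 1, 0, 0, 0]
features = {
    "bronchodilator_response": [2, 4, 3, 10, 12, 11],
    "eosinophil_count": [100, 300, 200, 250, 150, 400],
    "age": [8, 9, 8, 9, 8, 9],
}
ranked = rank_features(features, outcome)
```

A negative coefficient, as for the toy bronchodilator response values here, means that higher values of the feature go with a lower likelihood of well-controlled asthma.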
There were 1,019 children from the CAMP study and 669 children from the ACRN/CARE dataset in the final analysis. The baseline features of the groups are described in Tables 1 and 2. There were 602 (of 962) clinical features (variables) used in our model from the CAMP study and 54 (of 57) from the ACRN/CARE dataset. Over the final segment of the studies, we classified 36.7% of the children as well controlled in the CAMP study and 21.5% as well controlled in the ACRN/CARE dataset.
| Feature | Entire Cohort (n = 1,019) | Budesonide (n = 302) | Nedocromil (n = 307) | Placebo (n = 410) |
|---|---|---|---|---|
| Age, yr | 8.9 ± 2.1 | 9.0 ± 2.1 | 8.9 ± 2.1 | 8.9 ± 2.2 |
| Sex, male | 609 (59.8%) | 175 (58.0%) | 201 (65.5%) | 233 (56.8%) |
| Age asthma onset, yr | 3.8 ± 2.3 | 3.7 ± 2.2 | 3.8 ± 2.4 | 3.9 ± 2.4 |
| Adherent to medication, yes | 617 (60.6%) | 178 (58.9%) | 193 (62.9%) | 246 (60.0%) |
| FEV1, % predicted | 93.8 ± 14.3 | 93.7 ± 14.6 | 93.5 ± 14.4 | 94.0 ± 14.0 |
| Bronchodilator response ≥12% | 319 (31.3%) | 100 (33.1%) | 95 (30.9%) | 124 (30.2%) |
| Blood eosinophils, IU/L | 202 ± 175 | 192 ± 146 | 208 ± 203 | 206 ± 171 |
| Body mass index, kg/m2 | 18.46 ± 5.1 | 18.83 ± 6.7 | 18.24 ± 4.8 | 18.35 ± 3.9 |
| Ethnicity | ||||
| Black | 137 (13.4%) | 44 (14.6%) | 37 (12.1%) | 56 (13.7%) |
| Hispanic | 97 (9.5%) | 32 (10.6%) | 28 (9.1%) | 37 (9.0%) |
| Other | 89 (8.7%) | 30 (9.9%) | 26 (8.5%) | 33 (8.0%) |
| White | 696 (68.3%) | 196 (64.9%) | 216 (70.4%) | 284 (69.3%) |
| Severity | ||||
| Mild | 474 (46.6%) | 140 (46.4%) | 139 (45.3%) | 195 (47.5%) |
| Moderate | 545 (53.4%) | 162 (53.6%) | 168 (54.7%) | 215 (52.5%) |
| Phenotypes | ||||
| Allergic not obese | 574 (56.3%) | 173 (57.3%) | 181 (59.0%) | 220 (53.7%) |
| Obese not allergic | 43 (4.2%) | 15 (5.0%) | 13 (4.2%) | 15 (3.7%) |
| Allergic and obese | 128 (12.6%) | 38 (12.6%) | 34 (11.1%) | 56 (13.7%) |
| Not allergic not obese | 274 (26.9%) | 76 (25.2%) | 79 (25.7%) | 119 (29.0%) |
| Feature | Entire Cohort (n = 669) | ICS (n = 284) | Montelukast (n = 175) | Placebo (n = 210) |
|---|---|---|---|---|
| Age, yr | 8.2 ± 3.2 | 7.8 ± 2.9 | 9.2 ± 3.4 | 7.9 ± 3.1 |
| Sex, male | 418 (62.5%) | 180 (63.4%) | 110 (62.9%) | 128 (61.0%) |
| Age asthma onset, yr | 3.9 ± 2.1 | 3.8 ± 1.9 | 4.3 ± 2.5 | 3.7 ± 2.1 |
| FEV1, % predicted | 93.2 ± 8.4 | 92.8 ± 8.0 | 94.1 ± 10.0 | 93.0 ± 7.3 |
| Bronchodilator response ≥12% | 79 (11.8%) | 35 (12.3%) | 31 (17.7%) | 13 (6.2%) |
| Blood eosinophils, IU/L | 415 ± 261 | 436 ± 274 | 363 ± 276 | 432 ± 222 |
| Body mass index, kg/m2 | 18.0 ± 3.0 | 17.7 ± 2.7 | 18.2 ± 3.2 | 18.2 ± 3.2 |
| Ethnicity | ||||
| Black | 83 (12.4%) | 31 (10.9%) | 23 (13.1%) | 29 (13.8%) |
| Hispanic | 137 (20.5%) | 60 (21.1%) | 39 (22.3%) | 38 (18.1%) |
| White | 377 (56.4%) | 162 (57.0%) | 95 (54.3%) | 120 (57.1%) |
| Other | 72 (10.8%) | 31 (10.9%) | 18 (10.3%) | 23 (11.0%) |
| Severity | ||||
| Mild | 125 (18.7%) | 49 (17.3%) | 46 (26.3%) | 30 (14.3%) |
| Moderate | 492 (73.5%) | 218 (76.8%) | 106 (60.6%) | 168 (80.0%) |
| Severe | 52 (7.8%) | 17 (6.0%) | 23 (13.1%) | 12 (5.7%) |
| Phenotypes | ||||
| Allergic not obese | 384 (57.4%) | 165 (58.1%) | 99 (56.6%) | 120 (57.1%) |
| Obese not allergic | 32 (4.8%) | 12 (4.2%) | 6 (3.4%) | 14 (6.7%) |
| Allergic and obese | 62 (9.3%) | 25 (8.8%) | 19 (10.9%) | 18 (8.6%) |
| Not allergic not obese | 191 (28.6%) | 82 (28.9%) | 51 (29.1%) | 58 (27.6%) |
The algorithm revealed that obesity- and allergy-related features were the most statistically significant features that distinguished poor control from good control in both the training and testing sets. A sensitivity analysis was performed around one of PP’s parameters: the minimum number of patients in each subgroup (δ). We varied δ from 10 to 200. The discovered phenotypes did not differ across this range, but no phenotypes were discovered when δ exceeded 200. Four groups were identified: allergic not obese (A+/O−), obese not allergic (A−/O+), allergic and obese (A+/O+), and not allergic not obese (A−/O−). Overall, for both datasets, we did not find a significant difference in the distribution of assigned controller medications between phenotypes (Tables 1 and 2).
In both the training and test sets of the CAMP study, two phenotypes had significantly different control states by assigned controller medication (Table 3). The A+/O+ phenotype contained more children whose asthma was well controlled with nedocromil versus budesonide (52.6% vs. 16.7%; P = 0.022) or versus placebo (52.6% vs. 19.4%; P = 0.011). The A+/O− phenotype had more children whose asthma was well controlled who were treated with budesonide than with nedocromil (47.2% vs. 31.0%; P = 0.030). There was no significant difference in asthma control between budesonide and placebo (47.2% vs. 35.2%; P = 0.110). Longitudinal asthma control among A+/O− and A+/O+ phenotypes stratified by controller therapy is shown in Figure 4. For the A−/O+ and A−/O− phenotypes, there was no significant difference in the number of children with well-controlled asthma assigned to the budesonide, nedocromil, or placebo arm. Obese children with well-controlled asthma (with or without allergies) were more frequently treated with nedocromil than with budesonide (52.2% vs. 16.0%; P = 0.008) or placebo (52.2% vs. 20.9%; P = 0.009), and nonobese children were more often treated with budesonide than with nedocromil (48.6% vs. 33.8%; P = 0.015) or placebo (48.6% vs. 34.2%; P = 0.016).
| Phenotype | Outcome (Well-controlled Asthma) | | | | | | |
|---|---|---|---|---|---|---|---|
| | Bud | Ned | P Value 1 (Proportional) | P Value 1 (Permutation) | Placebo | P Value 1 (Proportional) | P Value 1 (Permutation) |
| Allergic not obese (n = 277) | 34 (47.2%) | 31 (31.0%) | 0.0304 | 0.0274 | 37 (35.2%) | 0.1100 | 0.1009 |
| Obese not allergic (n = 18) | 1 (14.3%) | 2 (50.0%) | 0.2008 | 0.1851 | 2 (28.6%) | 0.4773 | 0.3936 |
| Allergic and obese (n = 73) | 3 (16.7%) | 10 (52.6%) | 0.0220 | 0.0196 | 7 (19.4%) | 0.0113 | 0.0237 |
| Not allergic not obese (n = 142) | 21 (51.2%) | 19 (39.6%) | 0.2713 | 0.3011 | 17 (32.1%) | 0.0607 | 0.0536 |
| Allergic (n = 350) | 37 (41.1%) | 41 (34.5%) | 0.3245 | 0.3435 | 44 (31.2%) | 0.1239 | 0.1429 |
| Nonallergic (n = 160) | 22 (45.8%) | 21 (40.3%) | 0.0824 | 0.0912 | 19 (31.7%) | 0.1317 | 0.1436 |
| Obese (n = 91) | 4 (16.0%) | 12 (52.2%) | 0.0079 | 0.0105 | 9 (20.9%) | 0.0094 | 0.0152 |
| Nonobese (n = 419) | 55 (48.6%) | 50 (33.8%) | 0.0151 | 0.0123 | 54 (34.2%) | 0.0164 | 0.0334 |

Figure 4. Percentage of children with well-controlled asthma over time among the allergic not obese (A+/O−), obese not allergic (A−/O+), allergic and obese (A+/O+), and not allergic not obese (A−/O−) children treated with budesonide (bud), nedocromil (ned), or placebo in the Childhood Asthma Management Program study data.
In the ACRN/CARE dataset, asthma control varied between those receiving ICS compared with montelukast among those with the A+/O− phenotype (Table 4). This group had more children with well-controlled asthma treated with ICS (40.0% vs. 19.2%; P = 0.004). There was no significant difference in the number of children with well-controlled asthma between ICS and montelukast for children categorized as A+/O+, A−/O+, or A−/O−. Children with allergy and well-controlled asthma (with or without obesity) were more frequently treated with an inhaled corticosteroid than with montelukast (39.9% vs. 24.1%; P = 0.019), and nonobese children were more often treated with an inhaled corticosteroid than with montelukast (33.6% vs. 18.9%; P = 0.011).
| Phenotype | Outcome (Well-controlled Asthma) | | | |
|---|---|---|---|---|
| | Inhaled Corticosteroids (n = 284) | Montelukast (n = 175) | P Value (Proportional) | P Value (Permutation) |
| Allergic not obese (n = 134) | 31 (40.0%) | 11 (19.2%) | 0.0044 | 0.0059 |
| Obese not allergic (n = 12) | 0 (0.0%) | 1 (25.0%) | 0.0680 | 0.0513 |
| Allergic and obese (n = 21) | 2 (14.9%) | 1 (11.9%) | 0.4261 | 0.3912 |
| Not allergic not obese (n = 63) | 4 (12.1%) | 2 (7.4%) | 0.2725 | 0.2123 |
| Allergic (n = 155) | 36 (39.9%) | 16 (24.1%) | 0.0192 | 0.0342 |
| Nonallergic (n = 75) | 5 (10.5%) | 3 (9.1%) | 0.4236 | 0.4756 |
| Obese (n = 33) | 3 (14.3%) | 1 (7.9%) | 0.2980 | 0.3294 |
| Nonobese (n = 197) | 38 (33.6%) | 16 (18.9%) | 0.0105 | 0.0174 |
For the entire CAMP study, PP’s short-term prediction accuracy (area under the curve) for level of control over time using all features in the dataset that met the inclusion criteria was 0.86. The long-term prediction accuracy was 0.66. In both cases, PP performed statistically better (with a significance level of 0.05) than typical machine learning methods on this dataset (Table 5). Our results for short-term asthma control prediction were consistent with previous literature indicating that the current control state is the most indicative of control state at the next assessment based on the Pearson correlation coefficient (r = 0.58). The two most correlated predictive features for long-term control were bronchodilator response (r = −0.28) and serum percentage eosinophils (r = −0.16). The 10 strongest predictive features are reported in the online supplement (Table E1).
| Algorithm | Prediction Accuracy (Area Under the Curve) | | | |
|---|---|---|---|---|
| | Short Term | P Value | Long Term | P Value |
| Predictor pursuit | 0.8533 ± 0.0117 | — | 0.6517 ± 0.0129 | — |
| Neural network | 0.8297 ± 0.0101 | 0.0317 | 0.6282 ± 0.0091 | 0.0146 |
| Logistic regression | 0.8199 ± 0.0123 | 0.0123 | 0.6168 ± 0.0105 | 0.0047 |
| Adaptive boosting | 0.8080 ± 0.0174 | 0.0077 | 0.5884 ± 0.0251 | 0.0003 |
| Random forests | 0.8211 ± 0.0184 | 0.0349 | 0.5626 ± 0.0226 | <0.0001 |
| Naive Bayes | 0.7963 ± 0.0139 | <0.0001 | 0.5496 ± 0.0128 | <0.0001 |
| Support vector machine | 0.8045 ± 0.0165 | 0.0040 | 0.5936 ± 0.0137 | 0.0005 |
When the cohort was divided into the four phenotypes (A+/O−, A−/O+, A+/O+, and A−/O−), short-term asthma control was still best indicated by previous asthma control state (Table E2). Over the long term, bronchodilator response and serum eosinophils predicted better asthma control (Table 6, Table E3). When these groups were examined on the basis of assigned medication (budesonide or nedocromil), the strongest predictive features remained the current control state for short-term prediction and bronchodilator response for long-term prediction (Tables 7 and E4).
| Rank | A+/O+ | | A+/O− | | A−/O+ | | A−/O− | |
|---|---|---|---|---|---|---|---|---|
| | Feature | r | Feature | r | Feature | r | Feature | r |
| 1 | Bronchodilator response | −0.30 | Bronchodilator response | −0.30 | Bronchodilator response | −0.30 | Bronchodilator response | −0.32 |
| 2 | Total Eos | −0.17 | Croup ever | 0.16 | Croup ever | 0.18 | Total Eos | −0.18 |
| 3 | Eos % | −0.17 | Eos % | −0.16 | Total Eos | −0.17 | Eos % | −0.17 |
| Rank | A+/O+ | | A+/O− | | A−/O+ | | A−/O− | |
|---|---|---|---|---|---|---|---|---|
| | Feature | r | Feature | r | Feature | r | Feature | r |
| Bud | ||||||||
| 1 | Bronchodilator response | −0.29 | Bronchodilator response | −0.29 | Bronchodilator response | −0.28 | Bronchodilator response | −0.31 |
| 2 | Total Eos count by %WBC | −0.22 | Wheezes apart from cold | −0.18 | Total Eos count by %WBC | −0.21 | Total Eos count by %WBC | −0.22 |
| 3 | Female | 0.19 | Daily asthma medications | −0.17 | Ever in hospital for asthma | −0.21 | Total Eos count | −0.19 |
| Ned | ||||||||
| 1 | Bronchodilator response | −0.32 | Bronchodilator response | −0.31 | Bronchodilator response | −0.34 | Bronchodilator response | −0.34 |
| 2 | Croup | 0.18 | Daily asthma medication use | −0.16 | Croup | 0.22 | Eos % | −0.19 |
| 3 | Use commercial cockroach spray | 0.17 | Croup | 0.15 | Use commercial cockroach spray | 0.18 | Eos % | −0.18 |
PP showed that obesity and allergy features determined phenotypes with the most significant differences in asthma control based on assigned controller medication. More nonobese (specifically allergic nonobese) children with well-controlled asthma were treated with budesonide than with nedocromil, whereas more obese (specifically allergic and obese) children with well-controlled asthma were treated with nedocromil than with budesonide. Asthma control over the short and long term was predicted with better accuracy by PP than by standard machine learning algorithms such as logistic regression. The most relevant features for short-term control prediction were the current control state and bronchodilator response. The strongest predictive features for long-term control prediction were bronchodilator response and serum eosinophils.
Of the four discovered phenotypes, we found that more A+/O+ children with well-controlled asthma were treated with nedocromil than with budesonide or placebo. More A+/O− children with well-controlled asthma were assigned to budesonide than to nedocromil, but there was no statistically significant advantage over placebo. The PP clustering algorithm was able to find these patterns because of its advantages over traditional machine learning algorithms: 1) it involves no clinician bias in the choice of input variables (i.e., it is an entirely data-driven methodology); 2) it identifies phenotypes on the basis of relationships between treatment and controllability; and 3) it guarantees that the discovered phenotypes have statistically significant differences between the treatment responses.
The results of our study are aligned with the finding by Forno and colleagues that overweight/obese individuals with asthma in the CAMP study responded less favorably to budesonide, which our study shows with statistical significance, but Forno and colleagues did not assess the response to nedocromil (37). It is notable that using our method, we discovered this obese asthma group using a machine learning approach rather than manually choosing the cohort to study. In another secondary analysis of CAMP data, Howrylak and colleagues used spectral clustering to find clusters within this dataset and evaluate the cumulative probability of oral steroid course and time to switch controller therapy (38). With their approach, they used a smaller number of clinical features to find clusters, whereas with our method, we used hundreds of variables to determine clusters on the basis of response to medication in a data-driven fashion. In addition, by using the BMI percentile to define obese children, which they did not do in their study, we discovered that obesity is an important feature correlated with treatment response.
Our findings support previous knowledge that Th2-based allergic asthma responds most favorably to ICS (39). Although there can be some allergic overlap, obese asthma tends to be less responsive to ICS (40–44). This is thought to be due in part to other active inflammatory pathways involving Th17, mast cells, and neutrophils (45–50). Reports in the literature indicate that children with obese and allergic asthma have more severe and/or poorly controlled symptoms (42, 51), so it is valuable to determine which, if any, medications may be more effective in treating this group of children.
Because of the additional or different underlying inflammation, non–ICS-based controller medications could potentially be targeted at certain children with obese asthma. Previous reports in the literature about these controller medications have suggested that the leukotriene receptor antagonist montelukast may have an effect on neutrophilic inflammation, and authors of a recent abstract reported that adults with mild A+/O+ asthma had a more favorable response to montelukast than to ICS (52). The 5-lipoxygenase inhibitors have been suggested to affect the leukotriene B4 pathway produced by neutrophils (53–56). Mast cells have been implicated in obese asthma, and nedocromil and cromolyn may also have an effect on neutrophil function in addition to mast cells (57, 58). Finally, theophylline has been shown to affect apoptosis and chemotaxis of neutrophils (59, 60).
In regard to asthma control state prediction, PP outperformed traditional predictive models such as logistic regression and support vector machine in both the short and the long term. It was also able to reveal predictive features for long-term asthma control beyond the current control state. The PP algorithm (also called ConfidentMatch) has been applied successfully in heart transplantation to predict transplant outcomes (25). A next step is to apply these predictive models to real-time data to identify children who are most at risk for an asthma exacerbation.
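One common way to compare two classifiers evaluated on the same children is McNemar's test on their paired predictions (33). The sketch below is a minimal exact (binomial) version, offered under the assumption that a paired comparison of this kind is of interest; this section does not state which test produced the reported P values, and the function and argument names are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test for two classifiers on paired cases.

    Returns (b, c, p_value), where b counts cases that only model A
    classifies correctly and c counts cases that only model B
    classifies correctly.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    b = c = 0
    for truth, a, m in zip(y_true, pred_a, pred_b):
        a_correct = a == truth
        m_correct = m == truth
        if a_correct and not m_correct:
            b += 1
        elif m_correct and not a_correct:
            c += 1

    n = b + c  # discordant pairs; concordant pairs carry no information
    if n == 0:
        return b, c, 1.0

    # Under the null hypothesis, discordant pairs split 50/50,
    # so min(b, c) follows a Binomial(n, 0.5) distribution.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)
```

Only the discordant pairs drive the result, which is why the test suits comparisons in which both models are scored on identical cases.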
Although our PP findings appear to be clinically aligned, our study does have some shortcomings. First, PP has limitations typical of any machine learning approach in that better predictive power is assumed with larger datasets. Some features were eliminated because of missing data, and the small amount of data imputation may have affected our results. The prediction accuracy for asthma control was higher for short-term intervals (4 mo) than for longer intervals. This may be due to other factors that affect long-term controllability, such as adherence, seasonality, and other features that change over time. In terms of adherence, we could not eliminate children who reported nonadherence, because the sample size would have been too small to be analyzed using our method. However, we noted in the CAMP study data that the level of adherence was similar between all treatment groups and that the PP algorithm did not identify nonadherence as a feature that determined different medication response between the two groups.
Although our goal was to predict asthma phenotype control on the basis of controller medication, we are of course limited in our clinical assumptions with this post hoc analysis of a preexisting dataset. By the end of the study, asthma in the majority of children was not well controlled, and often in the group with well-controlled asthma, the treatment medications did not perform significantly better than placebo.
In conclusion, PP revealed differences in asthma control state related to controller medication on the basis of allergy- and obesity-related features. The PP algorithm was also able to predict pediatric asthma control state over the long term with greater accuracy than standard machine learning approaches. However, to make clinical assumptions to guide controller medication choice for a given phenotype, prospective studies with larger datasets of real-world data are needed.
The long-term goal for this line of research is to eventually determine which asthma medication maximizes the probability of a well-controlled state and incorporate this information into the overall asthma treatment plan for children with asthma. For precision medicine in asthma, treatment choice based on asthma phenotype is one part of a comprehensive approach to asthma management that also takes into consideration sociodemographics, environment, adherence, genetics, and other factors.
The authors thank Dr. Peter Szilagyi for his critical analysis of the manuscript revision, Kyeong Ho (Kenneth) Moon for data processing, and Dr. Douglas Bell for introducing this collaboration. The authors acknowledge the National Institutes of Health GWAS Data Repository, the NHLBI, and the investigator(s) who contributed to the phenotype data from his/her original studies. This article was prepared using CAMP research materials obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center and does not necessarily reflect the opinions or views of the CAMP or the NHLBI.
1. Lloyd CM, Hessel EM. Functions of T cells in asthma: more than just TH2 cells. Nat Rev Immunol 2010;10:838–848.
2. Gelfand EW, Alam R. The other side of asthma: steroid-refractory disease in the absence of TH2-mediated inflammation. J Allergy Clin Immunol 2015;135:1196–1198.
3. National Asthma Education and Prevention Program. Expert Panel Report 3 (EPR-3): Guidelines for the Diagnosis and Management of Asthma—Summary Report 2007. J Allergy Clin Immunol 2007;120(5 Suppl):S94–S138.
4. Szefler SJ, Phillips BR, Martinez FD, Chinchilli VM, Lemanske RF, Strunk RC, et al. Characterization of within-subject responses to fluticasone and montelukast in childhood asthma. J Allergy Clin Immunol 2005;115:233–242.
5. Fitzpatrick AM, Jackson DJ, Mauger DT, Boehmer SJ, Phipatanakul W, Sheehan WJ, et al.; NIH/NHLBI AsthmaNet. Individualized therapy for persistent asthma in young children. J Allergy Clin Immunol 2016;138:1608–1618.e12.
6. Chung KF, Wenzel SE, Brozek JL, Bush A, Castro M, Sterk PJ, et al. International ERS/ATS guidelines on definition, evaluation and treatment of severe asthma. Eur Respir J 2014;43:343–373. [Published erratum appears in Eur Respir J 2014;43:1216.]
7. Green RH, Brightling CE, Bradding P. The reclassification of asthma based on subphenotypes. Curr Opin Allergy Clin Immunol 2007;7:43–50.
8. Lötvall J, Akdis CA, Bacharier LB, Bjermer L, Casale TB, Custovic A, et al. Asthma endotypes: a new approach to classification of disease entities within the asthma syndrome. J Allergy Clin Immunol 2011;127:355–360.
9. Wenzel S. Severe asthma: from characteristics to phenotypes to endotypes. Clin Exp Allergy 2012;42:650–658.
10. Haldar P, Pavord ID, Shaw DE, Berry MA, Thomas M, Brightling CE, et al. Cluster analysis and clinical asthma phenotypes. Am J Respir Crit Care Med 2008;178:218–224.
11. Weatherall M, Travers J, Shirtcliffe PM, Marsh SE, Williams MV, Nowitz MR, et al. Distinct clinical phenotypes of airways disease defined by cluster analysis. Eur Respir J 2009;34:812–818.
12. Moore WC, Meyers DA, Wenzel SE, Teague WG, Li H, Li X, et al.; National Heart, Lung, and Blood Institute’s Severe Asthma Research Program. Identification of asthma phenotypes using cluster analysis in the Severe Asthma Research Program. Am J Respir Crit Care Med 2010;181:315–323.
13. Siroux V, Basagaña X, Boudier A, Pin I, Garcia-Aymerich J, Vesin A, et al. Identifying adult asthma phenotypes using a clustering approach. Eur Respir J 2011;38:310–317.
14. Schatz M, Hsu JW, Zeiger RS, Chen W, Dorenbaum A, Chipps BE, et al. Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma. J Allergy Clin Immunol 2014;133:1549–1556.
15. Deliu M, Sperrin M, Belgrave D, Custovic A. Identification of asthma subtypes using clustering methodologies. Pulm Ther 2016;2:19–41.
16. Loza MJ, Adcock I, Auffray C, Chung KF, Djukanovic R, Sterk PJ, et al.; ADEPT and U-BIOPRED Investigators. Longitudinally stable, clinically defined clusters of patients with asthma independently identified in the ADEPT and U-BIOPRED asthma studies. Ann Am Thorac Soc 2016;13(Suppl 1):S102–S103.
17. Loureiro CC, Sa-Couto P, Todo-Bom A, Bousquet J. Cluster analysis in phenotyping a Portuguese population. Rev Port Pneumol (2006) 2015;21:299–306.
18. Spycher BD, Silverman M, Brooke AM, Minder CE, Kuehni CE. Distinguishing phenotypes of childhood wheeze and cough using latent class analysis. Eur Respir J 2008;31:974–981.
19. Fitzpatrick AM, Teague WG, Meyers DA, Peters SP, Li X, Li H, et al.; National Institutes of Health/National Heart, Lung, and Blood Institute Severe Asthma Research Program. Heterogeneity of severe asthma in childhood: confirmation by cluster analysis of children in the National Institutes of Health/National Heart, Lung, and Blood Institute Severe Asthma Research Program. J Allergy Clin Immunol 2011;127:382–389.e13.
20. Luo G, Stone BL, Fassl B, Maloney CG, Gesteland PH, Yerram SR, et al. Predicting asthma control deterioration in children. BMC Med Inform Decis Mak 2015;15:84.
21. Sharma HP, Matsui EC, Eggleston PA, Hansel NN, Curtin-Brosnan J, Diette GB. Does current asthma control predict future health care use among black preschool-aged inner-city children? Pediatrics 2007;120:e1174–e1181.
22. Finkelstein J, Jeong IC. Machine learning approaches to personalize early prediction of asthma exacerbations. Ann N Y Acad Sci 2017;1387:153–165.
23. Johnson KM, FitzGerald JM, Tavakoli H, Chen W, Sadatsafavi M. Stability of asthma symptom control in a longitudinal study of mild-moderate asthmatics. J Allergy Clin Immunol Pract 2017;5:1663–1670.e5.
24. Schatz M, Zeiger RS, Yang SJ, Chen W, Crawford W, Sajjan S, et al. Change in asthma control over time: predictors and outcomes. J Allergy Clin Immunol Pract 2014;2:59–64.
25. Yoon J, Alaa M, Cadeiras M, van der Schaar M. Personalized donor-recipient matching for organ transplantation. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17). February 4–9, 2017. San Francisco, CA. pp. 1647–1654.
26. Ross MK, Yoon J, Ho Moon K, Van Der Schaar M. A personalized approach to asthma control over time: discovering phenotypes using machine learning [abstract]. Am J Respir Crit Care Med 2017;195:A5093.
27. Zimmerman DW. A note on interpretation of the paired-samples t test. J Educ Behav Stat 1997;22:349–360.
28. Szefler S, Weiss S, Tonascia J, Adkinson NF, Bender B, Cherniack R, et al.; Childhood Asthma Management Program Research Group. Long-term effects of budesonide or nedocromil in children with asthma. N Engl J Med 2000;343:1054–1063.
29. Giffen CA, Carroll LE, Adams JT, Brennan SP, Coady SA, Wagner EL. Providing contemporary access to historical biospecimen collections: development of the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). Biopreserv Biobank 2015;13:271–279.
30. Denlinger LC, Sorkness CA, Chinchilli VM, Lemanske RF Jr. Guideline-defining asthma clinical trials of the National Heart, Lung, and Blood Institute’s Asthma Clinical Research Network and Childhood Asthma Research and Education Network. J Allergy Clin Immunol 2007;119:3–11, quiz 12–13.
31. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 2007;39:1181–1186.
32. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, et al. Missing value estimation methods for DNA microarrays. Bioinformatics 2001;17:520–525.
33. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947;12:153–157.
34. Good P. Permutation tests: a practical guide to resampling methods for testing hypotheses. 3rd ed. New York: Springer Science & Business Media; 2010.
35. Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ 2003;326:219.
36. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res 2011;46:399–424.
37. Forno E, Lescher R, Strunk R, Weiss S, Fuhlbrigge A, Celedón JC; Childhood Asthma Management Program Research Group. Decreased response to inhaled steroids in overweight and obese asthmatic children. J Allergy Clin Immunol 2011;127:741–749.
38. Howrylak JA, Fuhlbrigge AL, Strunk RC, Zeiger RS, Weiss ST, Raby BA; Childhood Asthma Management Program Research Group. Classification of childhood asthma phenotypes and long-term clinical responses to inhaled anti-inflammatory medications. J Allergy Clin Immunol 2014;133:1289–1300.e12.
39. Walker C, Bode E, Boer L, Hansel TT, Blaser K, Virchow JC Jr. Allergic and nonallergic asthmatics have distinct patterns of T-cell activation and cytokine production in peripheral blood and bronchoalveolar lavage. Am Rev Respir Dis 1992;146:109–115.
40. Peters-Golden M, Swern A, Bird SS, Hustad CM, Grant E, Edelman JM. Influence of body mass index on the response to asthma controller agents. Eur Respir J 2006;27:495–503.
41. Boulet LP, Franssen E. Influence of obesity on response to fluticasone with or without salmeterol in moderate asthma. Respir Med 2007;101:2240–2247.
42. Sutherland ER, Goleva E, King TS, Lehman E, Stevens AD, Jackson LP, et al.; Asthma Clinical Research Network. Cluster analysis of obesity and asthma phenotypes. PLoS One 2012;7:e36631.
43. Meagher LC, Cousin JM, Seckl JR, Haslett C. Opposing effects of glucocorticoids on the rate of apoptosis in neutrophilic and eosinophilic granulocytes. J Immunol 1996;156:4422–4428.
44. Sampson AP. The role of eosinophils and neutrophils in inflammation. Clin Exp Allergy 2000;30:22–27.
45. Dixon AE, Holguin F, Sood A, Salome CM, Pratley RE, Beuther DA, et al.; American Thoracic Society Ad Hoc Subcommittee on Obesity and Lung Disease. An official American Thoracic Society Workshop report: obesity and asthma. Proc Am Thorac Soc 2010;7:325–335.
46. Telenga ED, Tideman SW, Kerstjens HA, Hacken NH, Timens W, Postma DS, et al. Obesity in asthma: more neutrophilic inflammation as a possible explanation for a reduced treatment response. Allergy 2012;67:1060–1068.
47. McGrath KW, Icitovic N, Boushey HA, Lazarus SC, Sutherland ER, Chinchilli VM, et al.; Asthma Clinical Research Network of the National Heart, Lung, and Blood Institute. A large subgroup of mild-to-moderate asthma is persistently noneosinophilic. Am J Respir Crit Care Med 2012;185:612–619.
48. Moore WC, Hastie AT, Li X, Li H, Busse WW, Jarjour NN, et al.; National Heart, Lung, and Blood Institute’s Severe Asthma Research Program. Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis. J Allergy Clin Immunol 2014;133:1557–1563.e5.
49. Sismanopoulos N, Delivanis DA, Mavrommati D, Hatziagelaki E, Conti P, Theoharides TC. Do mast cells link obesity and asthma? Allergy 2013;68:8–15.
50. Liu J, Divoux A, Sun J, Zhang J, Clément K, Glickman JN, et al. Genetic deficiency and pharmacological stabilization of mast cells reduce diet-induced obesity and diabetes in mice. Nat Med 2009;15:940–945.
51. Dixon AE, Poynter ME. Mechanisms of asthma in obesity: pleiotropic aspects of obesity produce distinct asthma phenotypes. Am J Respir Cell Mol Biol 2016;54:601–608.
52. Farzan S, Khan S, Elera C, Akerman M. Montelukast is a better controller in obese atopic asthmatics [abstract 683]. J Allergy Clin Immunol 2016;137(2 Suppl):AB210.
53. Anderson R, Theron AJ, Gravett CM, Steel HC, Tintinger GR, Feldman C. Montelukast inhibits neutrophil pro-inflammatory activity by a cyclic AMP-dependent mechanism. Br J Pharmacol 2009;156:105–115.
54. Theron AJ, Gravett CM, Steel HC, Tintinger GR, Feldman C, Anderson R. Leukotrienes C4 and D4 sensitize human neutrophils for hyperreactivity to chemoattractants. Inflamm Res 2009;58:263–268.
55. Al Saadi MM, Meo SA, Mustafa A, Shafi A, Tuwajri AS. Effects of montelukast on free radical production in whole blood and isolated human polymorphonuclear neutrophils (PMNs) in asthmatic children. Saudi Pharm J 2011;19:215–220.
56. Busse WW. Leukotrienes and inflammation. Am J Respir Crit Care Med 1998;157:S210–S213.
57. Rand TH, Lopez AF, Gamble JR, Vadas MA. Nedocromil sodium and cromolyn (sodium cromoglycate) selectively inhibit antibody-dependent granulocyte-mediated cytotoxicity. Int Arch Allergy Appl Immunol 1988;87:151–158.
58. Yazid S, Leoni G, Getting SJ, Cooper D, Solito E, Perretti M, et al. Antiallergic cromones inhibit neutrophil recruitment onto vascular endothelium via annexin-A1 mobilization. Arterioscler Thromb Vasc Biol 2010;30:1718–1724.
59. Condino-Neto A, Vilela MM, Cambiucci EC, Ribeiro JD, Guglielmi AA, Magna LA, et al. Theophylline therapy inhibits neutrophil and mononuclear cell chemotaxis from chronic asthmatic children. Br J Clin Pharmacol 1991;32:557–561.
60. Yasui K, Agematsu K, Shinozaki K, Hokibara S, Nagumo H, Nakazawa T, et al. Theophylline induces neutrophil apoptosis through adenosine A2A receptor antagonism. J Leukoc Biol 2000;67:529–535.
Supported by National Institutes of Health (NIH) grant U54TR001627 (M.K.R.) and National Science Foundation grant ECCS1407712 (J.Y., M.v.d.S.).
Author Contributions: M.K.R.: conceptualized the project; was involved in the methodology, validation, data curation, and writing of the original draft and in the review and editing of the manuscript; and was involved in supervision and project administration. J.Y.: was involved in the algorithm methodology; software application; validation; formal analysis; data curation; data visualization; and writing, review, and editing of the manuscript. A.v.d.S.: was involved in the algorithm methodology; software application; validation; formal analysis; data curation; data visualization; and review and editing of the manuscript. M.v.d.S.: was involved in the supervision of the methodology; software application; validation; formal analysis; resources; data curation; data visualization; and writing, review, and editing of the manuscript.
This article has an online supplement, which is accessible from this issue’s table of contents at www.atsjournals.org.
Author disclosures are available with the text of this article at www.atsjournals.org.
