Classification and regression trees for epidemiologic research: an air pollution example (18 page pdf, Katherine Gass, Mitch Klein, Howard H Chang, W Dana Flanders, Matthew J Strickland, Environmental Health, Mar. 13, 2014)
Today we review a paper that looks at how statistical regression trees may be used to improve estimates of how much “confounding” [mixing up something with something else so that the individual elements become difficult to distinguish] occurs when multiple air pollutants may or may not combine and augment each other in producing their collective health impacts. The authors used over 10 years of daily data for CO, NO2, O3, and PM2.5. Interestingly, they suggest that this same approach may be useful in nutrition.
“The end product of a typical C&RT analysis is a dendrogram illustrating the paths of dichotomous splits. Every tree starts with a “root node” that contains the observations from which the tree will be grown. The observations are then partitioned into two “child nodes” based on the value of an independent predictor variable… Each child node may be further partitioned, again based on the value of an independent predictor variable. This process continues until a set of partitioning criteria are no longer met, resulting in terminal nodes… The collection of terminal nodes forms a complete partition of the observations in the root node.”
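To make the quoted description concrete, here is a minimal sketch of recursive binary partitioning in plain Python. It is not the authors' implementation: the splitting criterion (sum of squared errors), the stopping rules (`min_leaf`, `max_depth`), the function names, and the toy NO2/O3 values are all assumptions chosen for illustration.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (assumed node impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y, idx, min_leaf):
    """Find the dichotomous split that most reduces SSE, or None if none helps."""
    best, best_score = None, sse([y[i] for i in idx])
    for j in range(len(X[0])):                      # each predictor variable
        for cut in sorted({X[i][j] for i in idx}):  # each observed value as a cut point
            left = [i for i in idx if X[i][j] <= cut]
            right = [i for i in idx if X[i][j] > cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue                            # partitioning criterion not met
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if score < best_score - 1e-9:
                best, best_score = (j, cut, left, right), score
    return best

def grow(X, y, idx=None, min_leaf=2, max_depth=3):
    """Grow a tree from the root node; leaves record which days they contain."""
    if idx is None:
        idx = list(range(len(y)))                   # root node: all observations
    split = best_split(X, y, idx, min_leaf) if max_depth > 0 else None
    if split is None:                               # terminal node
        return {"days": idx, "mean": mean(y[i] for i in idx)}
    j, cut, left, right = split
    return {"feature": j, "cut": cut,
            "left": grow(X, y, left, min_leaf, max_depth - 1),
            "right": grow(X, y, right, min_leaf, max_depth - 1)}

def terminal_nodes(node):
    """Collect the terminal nodes of a grown tree, left to right."""
    if "days" in node:
        return [node]
    return terminal_nodes(node["left"]) + terminal_nodes(node["right"])

# Made-up daily data: (NO2, O3) levels and a daily outcome count.
X = [(10, 20), (12, 22), (30, 21), (32, 23), (11, 60), (13, 62), (31, 61), (33, 63)]
y = [5, 5, 5, 5, 5, 5, 9, 9]

tree = grow(X, y)
leaves = terminal_nodes(tree)
```

Each day ends up in exactly one entry of `leaves`, which is the "complete partition of the observations in the root node" that the quote refers to.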
“The C&RT algorithm we have proposed enables effect estimation through the withholding of a common referent group of days during tree construction. This allows for estimation of joint effects across terminal nodes in relation to the pre-specified reference group. Selecting the referent group a priori ensures that it does not depend on the analysis”
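The idea of withholding a referent group can be sketched roughly as follows. This is only an illustration under stated assumptions: the referent rule (`is_referent`, days where both pollutants are low), the cut points, the stand-in `assign_node` function that plays the role of a fitted tree, and the toy data are all hypothetical, and the crude ratio of mean daily counts stands in for whatever regression model the authors actually used.

```python
from statistics import mean

# Hypothetical daily records: (NO2, O3, daily count of outcome events).
days = [
    (8, 15, 4), (9, 18, 4), (10, 17, 4),
    (12, 55, 5), (14, 60, 5),
    (35, 20, 6), (38, 22, 6),
    (36, 58, 8), (40, 62, 8),
]

def is_referent(day, no2_cut=10, o3_cut=20):
    """A priori referent rule (assumed): every pollutant is at a low level."""
    no2, o3, _ = day
    return no2 <= no2_cut and o3 <= o3_cut

# The referent days are set aside BEFORE any tree is grown, so the
# choice of reference group cannot depend on the resulting splits.
referent = [d for d in days if is_referent(d)]
candidates = [d for d in days if not is_referent(d)]

def assign_node(day):
    """Stand-in for a tree fitted to `candidates` only (hypothetical splits)."""
    no2, o3, _ = day
    if no2 <= 20:
        return "NO2 <= 20"
    return "NO2 > 20, O3 <= 40" if o3 <= 40 else "NO2 > 20, O3 > 40"

# Group the non-referent days by terminal node.
nodes = {}
for d in candidates:
    nodes.setdefault(assign_node(d), []).append(d[2])

# Crude ratio of mean daily counts in each terminal node vs. the referent days.
referent_mean = mean(d[2] for d in referent)
ratios = {name: mean(counts) / referent_mean for name, counts in nodes.items()}
```

Because every terminal node is compared with the same pre-specified set of days, the entries of `ratios` can be read side by side as joint-effect contrasts.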
“there may be certain meteorological factors that promote this specific pollutant covariation and influence personal exposure levels, such as relative humidity. These hypotheses lead to several researchable questions…. Does residual confounding or effect measure modification by meteorological factors further explain the relative risks associated with each terminal node?”
“In air pollution epidemiology, while there is currently interest in moving from a single pollutant to a multipollutant framework, the term “multipollutant” is often used broadly and may encompass many different conceptual issues…. When the multipollutant interest involves the joint effects of several pollutants, we feel that C&RT, particularly with the modifications mentioned in this paper, is a very appropriate tool.”