Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-06 Thread Patrick Connolly
On Tue, 02-Mar-2010 at 02:43PM -0500, Liaw, Andy wrote:
|> In most implementations of boosting, and for that matter, single tree,
|> the first variable wins when there are ties. In randomForest the

That still doesn't explain why with gbm, two identical variables will "share the glory" (approxima

Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Max Kuhn
On Tue, Mar 2, 2010 at 2:43 PM, Liaw, Andy wrote:
> In most implementations of boosting, and for that matter, single tree,
> the first variable wins when there are ties.

They must be in a union :-)

>> What happens if there's a third?

If they were P perfectly correlated predictors, the importanc

Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Liaw, Andy
In most implementations of boosting, and for that matter, single tree, the first variable wins when there are ties. In randomForest the variables are sampled, and thus not tested in the same order from one node to the next, thus the variables are more likely to "share the glory".

Best,
Andy

Fro
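The tie-breaking argument above can be illustrated with a small base-R toy. This is only a sketch under stated assumptions: it does not call gbm or randomForest and does not reproduce their internals. The helper names `split_score` and `pick_winner` are hypothetical, and squared correlation stands in for a real split criterion. It duplicates one predictor so the two scores always tie, then compares a fixed evaluation order with a randomly shuffled one.

```r
# Toy illustration only (NOT gbm or randomForest code): pick the best-scoring
# variable among candidates and count how often each one wins.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1                          # exact copy of x1, so the scores always tie
y  <- x1 + rnorm(n, sd = 0.5)
X  <- data.frame(x1 = x1, x2 = x2)

# Hypothetical stand-in for a split criterion: squared correlation with y
split_score <- function(x, y) cor(x, y)^2

pick_winner <- function(X, y, shuffle) {
  idx  <- sample(nrow(X), replace = TRUE)   # bootstrap sample of the rows
  vars <- names(X)
  if (shuffle) vars <- sample(vars)         # visit variables in random order
  scores <- vapply(vars,
                   function(v) split_score(X[[v]][idx], y[idx]),
                   numeric(1))
  vars[which.max(scores)]                   # which.max() keeps the FIRST maximum
}

count_wins <- function(shuffle, reps = 500) {
  winners <- replicate(reps, pick_winner(X, y, shuffle))
  table(factor(winners, levels = names(X)))
}

fixed_order  <- count_wins(shuffle = FALSE)  # x1 is always evaluated first
random_order <- count_wins(shuffle = TRUE)   # evaluation order varies per draw

print(fixed_order)
print(random_order)
```

With a fixed order, x1 wins every tie and x2 never does; with a shuffled order the wins split roughly evenly between the two copies, which is the "share the glory" effect described for randomForest.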

Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Patrick Connolly
On Mon, 01-Mar-2010 at 12:01PM -0500, Max Kuhn wrote:
|> In theory, the choice between two perfectly correlated predictors is
|> random. Therefore, the importance should be "diluted" by half.
|> However, this is implementation dependent.
|>
|> For example, run this:
|>
|> set.seed(1)
|> n <-

Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-01 Thread Max Kuhn
In theory, the choice between two perfectly correlated predictors is random. Therefore, the importance should be "diluted" by half. However, this is implementation dependent.

For example, run this:

set.seed(1)
n <- 100
p <- 10
data <- as.data.frame(matrix(rnorm(n*(p-1)), nrow = n))
dat

[R] Gradient Boosting Trees with correlated predictors in gbm

2010-02-28 Thread Lars Bishop
Dear R users,

I’m trying to understand how correlated predictors impact the Relative Importance measure in Stochastic Boosting Trees (J. Friedman). As Friedman described, “…with single decision trees (referring to Breiman’s CART algorithm), the relative importance measure is augmented by a strate