I'm attempting to fit boosted regression trees to a censored response using IPCW weighting. I've implemented this with two libraries, mboost and gbm, which I believe should yield comparably performing models. That is not the case, however: mboost performs much better, which seems odd. The issue matters because the output of this regression needs to go into a production system, and mboost doesn't even expose the fitted ensembles.
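For context, here is a minimal base-R sketch of how IPCW weights like my t.ipcw are typically constructed: each uncensored observation is weighted by 1/G(T-), where G is the Kaplan-Meier estimate of the censoring survival function, and censored observations get weight 0. The data below is simulated (not my actual tdata), and the sketch ignores ties for simplicity:

```r
set.seed(1)
n <- 200
T.event <- rexp(n, 0.1)                    # latent event times (simulated)
T.cens  <- rexp(n, 0.05)                   # censoring times (simulated)
Y       <- pmin(T.event, T.cens)           # observed follow-up time
delta   <- as.numeric(T.event <= T.cens)   # 1 = event observed, 0 = censored

# Kaplan-Meier estimate of the censoring survival function G(t):
# treat censoring as the "event" of interest (status = 1 - delta)
ord    <- order(Y)
c.ord  <- (1 - delta)[ord]                 # censoring indicator, time-ordered
n.risk <- n - seq_len(n) + 1               # number at risk at each time
G       <- cumprod(1 - c.ord / n.risk)     # G just after each ordered time
G.minus <- c(1, head(G, -1))               # G(t-) just before each ordered time

# IPCW weight: delta_i / G(Y_i-); censored observations get weight 0
t.ipcw <- numeric(n)
t.ipcw[ord] <- ifelse(c.ord == 0, 1 / G.minus, 0)
```

These weights are then passed via the weights= argument to either fitting function, as in the code below.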
# default params for blackboost are a gaussian loss and a maxdepth of 2
m.mboost = blackboost(Y ~ X1 + X2, data=tdata, weights=t.ipcw,
                      control=boost_control(mstop=100))
m.gbm = gbm(Y ~ X1 + X2, data=tdata, weights=t.ipcw,
            distribution="gaussian", interaction.depth=2,
            bag.fraction=1, n.trees=2500)

# compare the IPCW-weighted squared loss of the two fits
sum((predict(m.mboost, newdata=tdata) - tdata$Y)^2 * t.ipcw) <
  sum((predict(m.gbm, newdata=tdata, n.trees=2500) - tdata$Y)^2 * t.ipcw)
# TRUE: mboost with 100 trees reduces the loss of gbm with 100 trees
# by 20%, and that of gbm with 2500 trees by 5%

The documentation says blackboost essentially does the same thing as gbm, so any ideas on what could be driving this large difference in performance?

--
View this message in context: http://r.789695.n4.nabble.com/mboost-vs-gbm-tp4637518.html
Sent from the R help mailing list archive at Nabble.com.