On Dec 26, 2010, at 17:54, Tiina Hakanen wrote:

> Hi!
>
> I have some questions about MARS model's coefficient of determination. I use
> the MARS method in my master's thesis and I have noticed some problems with
> the MARS model's R^2.
>
> You can see the following example that the MARS model's R^2 is too big when i
> have used mars() -function for MARS model building, and when I have made
> MARS-model using a linear regression, it gives much smaller R^2.
>
> So can you please tell me some information about why the MARS model R^2 is so
> big? How can I get the MARS model's correct R^2 in R-project or some another
> way than in the following example or by calculating it myself using
> R^2-formula?

This isn't really to do with MARS as such. You have two equivalent linear models, one with and one without an intercept (in the latter, the first column m$x1 is the constant 1). R computes the R^2 so that it is consistent with the overall F test, which you can see has three numerator DF in the marsmodel, but only two in the corresponding linear model. Put differently, the null model is zero in one case and a constant in the other. This sometimes catches people out, but without such a convention, no-intercept models could get negative R^2.

Pragmatically, if you are sure that the marsmodel will always contain the intercept-only model, does lm(data[,1]~m$x) not provide the desired R^2, with a note in the summary that one coefficient is aliased?

>
> I hope you can reply soon.
>
> Best regards,
>
> Tiina Hakanen
>
>
> library(ElemStatLearn)
> library(mda)
> data<-ozone
> m<-mars(data[,-1], data[,1], nk=4)
> m$factor[m$s,]
> m$cuts[m$s,]
> m$coef
> marsmodel<-lm(data[,1]~m$x-1)
> summary(marsmodel)
>
> Call:
> lm(formula = data[, 1] ~ m$x - 1)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
>
> Coefficients:
>      Estimate Std. Error t value Pr(>|t|)
> m$x1  52.9783     3.8894  13.621  < 2e-16 ***
> m$x2   4.7383     0.9599   4.936 2.92e-06 ***
> m$x3  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.8147, Adjusted R-squared: 0.8095
> F-statistic: 158.2 on 3 and 108 DF, p-value: < 2.2e-16
>
> knot1 <- function (x,k) ifelse(x > k, x-k, 0)
> knot2 <- function(x, k) ifelse(x < k, k-x, 0)
> reg <- lm(ozone ~knot1(temperature,85)+knot2(temperature,85),data=data)
>
> summary(reg)
>
> Call:
> lm(formula = ozone ~ knot1(temperature, 85) + knot2(temperature,
>     85), data = data)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -36.264 -15.993  -2.351   9.993 122.793
>
> Coefficients:
>                        Estimate Std. Error t value Pr(>|t|)
> (Intercept)             52.9783     3.8894  13.621  < 2e-16 ***
> knot1(temperature, 85)   4.7383     0.9599   4.936 2.92e-06 ***
> knot2(temperature, 85)  -1.9428     0.3084  -6.300 6.61e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 23.38 on 108 degrees of freedom
> Multiple R-squared: 0.5153, Adjusted R-squared: 0.5064
> F-statistic: 57.42 on 2 and 108 DF, p-value: < 2.2e-16
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
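
For reference, the convention described in the reply can be written out as follows. This is only a sketch in notation that does not appear in the thread: y_i are the observed responses, \hat{y}_i the fitted values (identical in both fits here, since the residuals agree), \bar{y} the sample mean, and n the number of observations.

```latex
% Model with an intercept: the null model is the constant \bar{y}
R^2_{\text{intercept}}
  = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}

% Model without an intercept (formula ending in "- 1"): the null model is zero
R^2_{\text{no intercept}}
  = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} y_i^2}

% The two denominators differ by n \bar{y}^2
\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n \bar{y}^2
```

Because the residual sum of squares is the same in both fits while the no-intercept denominator is larger by n \bar{y}^2, the no-intercept R^2 can never be the smaller of the two, which matches the 0.8147 versus 0.5153 seen in the quoted output.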