On 11/02/2015 4:30 PM, Goldschneider, Jill wrote: > I was playing with some examples of piecewise regression using lm() and have > come across a behavior I'm uncertain about. > Below is simple 3-segment dataset. I compare predicted output of a model > created by one call to lm() to that of 3 models created by 3 calls to lm(). > In case A and B, the results are the same. However, in case C the results > differ for the middle segment. > Is the output of lm() for case C to be expected or not and if so, why?
Take a look at the fit value, and you'll see what happened: > lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x) Call: lm(formula = y ~ (x <= 10) * x + I(x > 10 & x <= 20) + (x > 20) * x) Coefficients: (Intercept) x <= 10TRUE x I(x > 10 & x <= 20)TRUE 35.0741 -15.0733 -0.1644 -17.7344 x > 20TRUE x <= 10TRUE:x x:x > 20TRUE NA 1.2397 -0.9658 The * in a formula means "main effect plus interaction", not multiplication. W hat you want is lm(y ~ I((x<=10)*x) + I(x>10 & x<=20) + I((x>20)*x)) This doesn't give exactly the same results as the segmented regression, because it uses the same intercept in all three segments; you might want to add I(x <= 10) as well if you don't want that. Duncan Murdoch > Thank you, > Jill > > set.seed(133) > y <- c(21:30, rep(15,10), 10:1) + runif(30, -2, 2) > x <- 1:30 > > # Case A: 3 segments, each fit with a constant value > plot(x, y) > # 3 different lm fits > p.c3 <- c(predict(lm(y[1:10]~1)), predict(lm(y[11:20]~1)), > predict(lm(y[21:30]~1))) > lines(x, p.c3, col="red") > # piecewise via lm > p.lm1 <- predict(lm(y ~ I(x<=10) + I(x>10 & x<=20) + I(x>20))) > lines(x, p.lm1, col="blue") > max(abs(p.c3-p.lm1)) > > # Case B: 3 segments, each fit with a line > plot(x, y) > # 3 different lm fits > p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~x[11:20])), > predict(lm(y[21:30]~x[21:30]))) > lines(x, p.c3, col="red") > # piecewise via lm > p.lm1 <- predict(lm(y ~ (x<=10)*x + (x>10 & x<=20)*x + (x>20)*x)) > lines(x, p.lm1, col="blue") > max(abs(p.c3-p.lm1)) > > # Cast C: 3 segments - the middle fit with a constant value, the outer by a > line > plot(x, y) > # 3 different lm fits > p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~1)), > predict(lm(y[21:30]~x[21:30]))) > lines(x, p.c3, col="red") > # piecewise via lm > p.lm1 <- predict(lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x)) > lines(x, p.lm1, col="blue") > max(abs(p.c3-p.lm1)) > ## the single call to lm does not have a constant value fit to the middle > section > plot(x, p.c3-p.lm1 ) > > > > The information contained in this transmission may contain confidential > information. If the reader of this message is not the intended recipient, > you are hereby notified that any review, dissemination, distribution or > duplication of this communication is strictly prohibited. If you are not the > intended recipient, please contact the sender by reply email and destroy all > copies of the original message. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.