On 11/02/2015 4:30 PM, Goldschneider, Jill wrote:
> I was playing with some examples of piecewise regression using lm() and have 
> come across a behavior I'm uncertain about.
> Below is simple 3-segment dataset.  I compare predicted output of a model 
> created by one call to lm() to that of 3 models created by 3 calls to lm().
> In case A and B, the results are the same.  However, in case C the results 
> differ for the middle segment.
> Is the output of lm() for case C to be expected or not and if so, why?

Take a look at the fit value, and you'll see what happened:

> lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x)

Call:
lm(formula = y ~ (x <= 10) * x + I(x > 10 & x <= 20) + (x > 20) *
    x)

Coefficients:
            (Intercept)              x <= 10TRUE
x  I(x > 10 & x <= 20)TRUE
                35.0741                 -15.0733
-0.1644                 -17.7344
             x > 20TRUE            x <= 10TRUE:x             x:x > 20TRUE
                     NA                   1.2397                  -0.9658

The * in a formula means "main effect plus interaction", not
multiplication.  W      hat you want is


lm(y ~ I((x<=10)*x) + I(x>10 & x<=20) + I((x>20)*x))

This doesn't give exactly the same results as the segmented regression,
because it uses the same intercept in all three segments; you might want
to add I(x <= 10) as well if you don't want that.

Duncan Murdoch

> Thank you,
> Jill
> 
> set.seed(133)
> y <- c(21:30, rep(15,10), 10:1) + runif(30, -2, 2)
> x <- 1:30
> 
> # Case A: 3 segments, each fit with a constant value
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~1)), predict(lm(y[11:20]~1)), 
> predict(lm(y[21:30]~1)))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ I(x<=10) + I(x>10 & x<=20) + I(x>20)))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> 
> # Case B: 3 segments, each fit with a line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~x[11:20])), 
> predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + (x>10 & x<=20)*x + (x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> 
> # Cast C: 3 segments - the middle fit with a constant value, the outer by a 
> line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~1)), 
> predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> ## the single call to lm does not have a constant value fit to the middle 
> section
> plot(x, p.c3-p.lm1 )
> 
> 
> 
> The information contained in this transmission may contain confidential 
> information.  If the reader of this message is not the intended recipient, 
> you are hereby notified that any review, dissemination, distribution or 
> duplication of this communication is strictly prohibited.  If you are not the 
> intended recipient, please contact the sender by reply email and destroy all 
> copies of the original message.
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to