On Jan 12, 2010, at 2:13 AM, Benedikt Drosse wrote:
Dear R-Users,
I have a question concerning the determination of breakpoints and
comparison of slopes from broken-line regression models. Although
this is rather a standard problem in data analysis, all information
I gathered so far, did not answer my questions.
I added a subsetted example of my data. Basically it is a timeseries
of recorded phenotypes in three different groups of plants.
You will see in the plot that the phenotype between groups behaves
differently over time. Plotted data of Group 3 look like a nice
linear relationship between phenotype (y) and time (x). Group 1
seems to have 1 breakpoint (at x=31) and Group 3 has 2 breakpoints
(at x=c(14,31)) in the data.
1. I am very interested in finding the breakpoints (x-values), where
the slope of a regression line changes. In the plot I estimated the
break-points by eye, subsetted the data and performed linear
regression on the subsets for each group. However, this is not the
way I want to go, since I would like to have some reasonable
criteria (better than my eye) to determine breakpoints.
2. Additionally I am interested in testing, whether the slopes of
(partial-)regression lines between groups are different from each
other.
Considering my questions, the 'segmented' package should exactly fit
my needs. The problem I have is, that on the one hand I would also
like to compare slopes of regression lines, which do not have
breakpoints (e.g group 3 of the example) to lines, which have
breakpoints (group 1 and 2). This seems to be not possible with the
'segmented' package.
On the other hand the 'segmented' package seems to have a poor error-
reporting, so it is very hard for me to figure out, what is going
wrong in the analysis (see example).
Can you think of any other way/procedure/package I could use to
analyze my data? Hopefully you can also give me a clue on how to
proceed with the analysis, when the error of the segmented()-
function in the creation of the covariance-matrix occurs.
I would be grateful for any comment on how to determine breakpoints
and compare slopes of regression lines in a different way than
subsetting the data by hand and regress the subsetted parts as I did
it for the example plot. Thanks to anyone, who has an idea, already.
Sorry for the long example code.
Have you looked at the capabilities of the 'strucchange' package?
http://cran.r-project.org/web/packages/strucchange/index.html
--
David.
Best regards,
Benedikt
#Start of Code
library(car)
library(segmented)
#Data file:
#Timescale:
x <-
c(0,0,0,3,3,3,3,3,3,6,6,6,9,9,9,14,14,14,17,17,17,20,20,20,24,24,24,28,28,28,31,31,31,34,34,34,38,38,38,45,45,45,0,0,0,3,3,3,3,3,3,6,6,6,9,9,9,14,14,14,17,17,17,20,20,20,24,24,24,28,28,28,31,31,31,34,34,34,38,38,38,45,45,45,0,0,0,3,3,3,7,7,7,9,9,9,13,13,13,13,13,16,16,16,20,20,20,24,24,24,28,28,28,34,34,34,41,41,41,50,50,50)
#y-data for three groups (t1, t2, t3):
t1 <-
c
(41,35,38,43,47,46,50,41,47,74,61,58,72,87,84,89,114,93,113,99,105,106,98,200,170,145,153,187,202,195,198,177,195,288,274,233,320,348,428,545,520,549
)
t2 <-
c
(44,40,38,57,42,47,42,57,40,71,75,64,70,83,88,115,95,106,132,145,115,192,172,190,273,300,250,403,392,343,400,402,444
,NA,NA,NA,NA,NA,NA,430,430,459)
t3 <-
c
(41,35,38,39,43,47,58,33,45,67,60,70,64,78,78,67,62,78,79,71,93,86,82,105,96,102,131,126,124,112,100,133,170,161,163,154,178,185
)
group <- as.factor(c(rep(1, times=length(t1)), rep(2,
times=length(t2)), rep(3, times=length(t3))))
data <- data.frame(group, x=x, y=c(t1,t2,t3))
#Groupwise plot of datapoints vs time:
scatterplot(y~x*group, data=data, smooth=FALSE, reg.line=FALSE,
legend.coords="topleft")
#Regressions for group 1 with 1 breakpoints (red, dashed line)
reg_1.1 <- lm(data[which(data$x<=30 & data$group==1),
3]~data[which(data$x<=30 & data$group==1),2])
abline(reg_1.1, col="black")
reg_1.2 <- lm(data[which(data$x>=30 & data$group==1),
3]~data[which(data$x>=30 & data$group==1),2])
abline(reg_1.2, col="black")
#Regressions for group 2 with 2 breakpoints (black, solid line)
reg_2.1 <- lm(data[which(data$x<=14 & data$group==2),
3]~data[which(data$x<=14 & data$group==2),2])
abline(reg_2.1, col="red", lty=2)
reg_2.2 <- lm(data[which(data$x>=14 & data$x<45 & data$group==2),
3]~data[which(data$x>=14 & data$x<45 & data$group==2),2])
abline(reg_2.2, col="red", lty=2)
reg_2.3 <- lm(data[which(data$x>=31 & data$x<=45 & data
$group==2),3]~data[which(data$x>=31 & data$x<=45 & data$group==2),2])
abline(reg_2.3, col="red", lty=2)
#Regression for group 3 without breakpoints (green, solid line)
reg_3.1 <- lm(data[which(data$group==3),3]~data[which(data
$group==3),2])
abline(reg_3.1, col="green", lty=1)
#Segmented package to estimate breakpoints and determine slopes for
group2 and 3:
subset_data <- subset(data, subset=data$group!=3)
names(subset_data) <- c("sub_group", "sub_x", "sub_y")
attach(subset_data)
X <- model.matrix(~0+sub_group)*sub_x
time.1 <- X[,1]
time.2 <- X[,2]
time.3 <- X[,3]
olm <- lm(sub_y~0 + sub_group + time.1 + time.2)
os <- segmented(olm, seg.Z= ~ time.1 + time.2, psi=list(time.
1=c(14,30), time.2=40))
#End of code
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.