Some developments with confusions. I tried the spline method and dummy
variable approach to do it. But their results are very different. See
following.
library(splines)
#Spline methods-Method1
m.glm<-glm(mark~x_trans+poly(elevation_trans,2)+bs(distance_trans,
degree=1,knots=c(13,25))+bs(y_trans,degree=1,knots=c(-0.45,-0.3
)),family=binomial(logit),data=point)
library(Epi)
ROC(test=m.glm$linear.predictors,stat=mark,plot='ROC',PV=FALSE,MX=FALSE,MI=FALSE,AUC=TRUE,col='red')
#AUC=0.853
> summary(m.glm)
Call:
glm(formula = mark ~ x_trans + poly(elevation_trans, 2) + bs(distance_trans,
degree = 1, knots = c(13, 25)) + bs(y_trans, degree = 1,
knots = c(-0.45, -0.3)), family = binomial(logit), data = point)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.52197 -0.77223 -0.05298 0.73068 2.45377
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.7940
2.6398 3.331 0.000865 ***
x_trans 4.1734
1.7192 2.428 0.015200 *
poly(elevation_trans, 2)1 4.7058
4.2105 1.118 0.263719
poly(elevation_trans, 2)2 -6.9522
3.3176 -2.096 0.036124 *
bs(distance_trans, degree = 1, knots = c(13, 25))1 0.3754 1.2402
0.303 0.762105
bs(distance_trans, degree = 1, knots = c(13, 25))2 -7.7295 1.6894
-4.575 4.75e-06 ***
bs(distance_trans, degree = 1, knots = c(13, 25))3 2.8312 4.9072
0.577 0.563972
bs(y_trans, degree = 1, knots = c(-0.45, -0.3))1 -4.9294 1.4862
-3.317 0.000910 ***
bs(y_trans, degree = 1, knots = c(-0.45, -0.3))2 -3.6187 1.5001
-2.412 0.015853 *
bs(y_trans, degree = 1, knots = c(-0.45, -0.3))3 -9.2488
2.5567
-3.617 0.000298 ***
---
#-Method2--Dummy variable approach: I use the dummy variable methods to fit
the piecewise linear functions
#========I generate the dummy variablse in SAS,because i didnot do it
successfully in R for my poor skill=========#
data gam;
set a;
if distance_trans<=13 then d_dum1=1; else d_dum1=0;
if 13<distance_trans<25 then d_dum2=1; else d_dum2=0;
if distance_trans>=25 then d_dum3=1; else d_dum3=0;
if y_trans<=-0.45 then y_dum1=1; else y_dum1=0;
if -0.45<y_trans<-0.3 then y_dum2=1; else y_dum2=0;
if y_trans>=-0.3 then y_dum3=1; else y_dum3=0;
;
run;
#==============================================================================================================#
point$d1<-d_dum1*distance_trans
point$d2<-d_dum2*distance_trans
point$d3<-d_dum3*distance_trans
point$y1<-y_dum1*y_trans
point$y2<-y_dum2*y_trans
point$y3<-y_dum3*y_trans
attach(point)
m.glm
<-glm(mark~x_trans+poly(elevation_trans,2)+d1+d2+d3+y1+y2+y3,family=binomial(logit),data=point)
summary(m.glm)
library(Epi)
ROC(test=m.glm$linear.predictors,stat=mark,plot='ROC',PV=FALSE,MX=FALSE,MI=FALSE,AUC=TRUE,col='red')
#AUC=0.820
Call:
glm(formula = mark ~ x_trans + poly(elevation_trans, 2) + d1 +
d2 + d3 + y1 + y2 + y3, family = binomial(logit), data = point)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.05273 -0.82265 -0.05687 0.94546 2.30962
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 4.5162 2.1511 2.099 0.035775 *
x_trans 0.9174 1.4650 0.626 0.531197
poly(elevation_trans, 2)1 5.0001 4.2240 1.184 0.236517
poly(elevation_trans, 2)2 -6.8030 3.3235 -2.047 0.040663 *
d1 -0.4993 0.1518 -3.289 0.001007 **
d2 -0.4263 0.1104 -3.860 0.000114 ***
d3 -0.2275 0.1215 -1.873 0.061121 .
y1 -5.2667 3.1645 -1.664 0.096051 .
y2 -8.6002 4.4686 -1.925 0.054280 .
y3 -12.2985 6.1452 -2.001 0.045360 *
Q1: Why are these two methods so different for the results, e.g. the
coefficients?
Q2: The spline method is useful for piecewise linear functions, e.g.
bs(distance_trans,degree=1,knots=c(13,25)),
but how should i do if i want to fit a linear function for the case the
distance_trans<13,and quadratic curve when distance_trans>=13?
"bs(distance_trans,degree=c(1,2),knots=13)" cannot works. And even for more
than three parts. <13,13~25, >25.
Q3:"fit <- glm( y ~ pmax(x,20)+pmin(x,20), family=binomial)" is good. But if
i divide x into three or more parts, how should i specify it in this way?
Hope somone can help.Thanks a lot.
On Jan 2, 2008 11:58 PM, Thomas Lumley <[EMAIL PROTECTED]> wrote:
> On Tue, 1 Jan 2008, Charles C. Berry wrote:
> > On Tue, 1 Jan 2008, zhijie zhang wrote:
> >
> >> Dear all,
> >> I have two variables, y and x. It seems that the relationship between
> them
> >> is Piecewise Linear Functions. The cutpoint is 20. That is, when x<20,
> there
> >> is a linear relationship between y and x; while x>=20, there is another
> >> different linear relationship between them.
> >> How can i specify their relationships in R correctly?
> >> # glm(y~I(x<20)+I(x>=20),family = binomial, data = point) something
> like
> >> this?
> >
> > Try this:
> >
> >> library(splines)
> >> fit <- glm( y ~ bs( x, deg=1, knots=20 ), family=binomial)
> >
>
> In the linear case I would actually argue that there is a benefit from
> constructing the spline basis by hand, so that you know what the
> coefficients mean. (For quadratic and higher order splines I agree that
> pre-existing code for the B-spline basis makes a lot more sense).
>
> For example, in
> fit <- glm( y ~ pmax(x,20)+pmin(x,20), family=binomial)
> the coefficients are the slope when is < 20 and the slope when x>20.
>
> -thomas
>
>
> Thomas Lumley Assoc. Professor, Biostatistics
> [EMAIL PROTECTED] University of Washington, Seattle
>
--
With Kind Regards,
oooO:::::::::
(..):::::::::
:\.(:::Oooo::
::\_)::(..)::
:::::::)./:::
::::::(_/::::
:::::::::::::
[***********************************************************************]
Zhi Jie,Zhang ,PHD
Tel:+86-21-54237149
Dept. of Epidemiology,School of Public Health,Fudan University
Address:No. 138 Yi Xue Yuan Road,Shanghai,China
Postcode:200032
Email:[EMAIL PROTECTED]
Website: www.statABC.com
[***********************************************************************]
oooO:::::::::
(..):::::::::
:\.(:::Oooo::
::\_)::(..)::
:::::::)./:::
::::::(_/::::
:::::::::::::
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.