Some developments with confusions. I tried the spline method and dummy variable approach to do it. But their results are very different. See following.
library(splines) #Spline methods-Method1 m.glm<-glm(mark~x_trans+poly(elevation_trans,2)+bs(distance_trans, degree=1,knots=c(13,25))+bs(y_trans,degree=1,knots=c(-0.45,-0.3 )),family=binomial(logit),data=point) library(Epi) ROC(test=m.glm$linear.predictors,stat=mark,plot='ROC',PV=FALSE,MX=FALSE,MI=FALSE,AUC=TRUE,col='red') #AUC=0.853 > summary(m.glm) Call: glm(formula = mark ~ x_trans + poly(elevation_trans, 2) + bs(distance_trans, degree = 1, knots = c(13, 25)) + bs(y_trans, degree = 1, knots = c(-0.45, -0.3)), family = binomial(logit), data = point) Deviance Residuals: Min 1Q Median 3Q Max -2.52197 -0.77223 -0.05298 0.73068 2.45377 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 8.7940 2.6398 3.331 0.000865 *** x_trans 4.1734 1.7192 2.428 0.015200 * poly(elevation_trans, 2)1 4.7058 4.2105 1.118 0.263719 poly(elevation_trans, 2)2 -6.9522 3.3176 -2.096 0.036124 * bs(distance_trans, degree = 1, knots = c(13, 25))1 0.3754 1.2402 0.303 0.762105 bs(distance_trans, degree = 1, knots = c(13, 25))2 -7.7295 1.6894 -4.575 4.75e-06 *** bs(distance_trans, degree = 1, knots = c(13, 25))3 2.8312 4.9072 0.577 0.563972 bs(y_trans, degree = 1, knots = c(-0.45, -0.3))1 -4.9294 1.4862 -3.317 0.000910 *** bs(y_trans, degree = 1, knots = c(-0.45, -0.3))2 -3.6187 1.5001 -2.412 0.015853 * bs(y_trans, degree = 1, knots = c(-0.45, -0.3))3 -9.2488 2.5567 -3.617 0.000298 *** --- #-Method2--Dummy variable approach: I use the dummy variable methods to fit the piecewise linear functions #========I generate the dummy variablse in SAS,because i didnot do it successfully in R for my poor skill=========# data gam; set a; if distance_trans<=13 then d_dum1=1; else d_dum1=0; if 13<distance_trans<25 then d_dum2=1; else d_dum2=0; if distance_trans>=25 then d_dum3=1; else d_dum3=0; if y_trans<=-0.45 then y_dum1=1; else y_dum1=0; if -0.45<y_trans<-0.3 then y_dum2=1; else y_dum2=0; if y_trans>=-0.3 then y_dum3=1; else y_dum3=0; ; run; #==============================================================================================================# point$d1<-d_dum1*distance_trans point$d2<-d_dum2*distance_trans point$d3<-d_dum3*distance_trans point$y1<-y_dum1*y_trans point$y2<-y_dum2*y_trans point$y3<-y_dum3*y_trans attach(point) m.glm <-glm(mark~x_trans+poly(elevation_trans,2)+d1+d2+d3+y1+y2+y3,family=binomial(logit),data=point) summary(m.glm) library(Epi) ROC(test=m.glm$linear.predictors,stat=mark,plot='ROC',PV=FALSE,MX=FALSE,MI=FALSE,AUC=TRUE,col='red') #AUC=0.820 Call: glm(formula = mark ~ x_trans + poly(elevation_trans, 2) + d1 + d2 + d3 + y1 + y2 + y3, family = binomial(logit), data = point) Deviance Residuals: Min 1Q Median 3Q Max -2.05273 -0.82265 -0.05687 0.94546 2.30962 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 4.5162 2.1511 2.099 0.035775 * x_trans 0.9174 1.4650 0.626 0.531197 poly(elevation_trans, 2)1 5.0001 4.2240 1.184 0.236517 poly(elevation_trans, 2)2 -6.8030 3.3235 -2.047 0.040663 * d1 -0.4993 0.1518 -3.289 0.001007 ** d2 -0.4263 0.1104 -3.860 0.000114 *** d3 -0.2275 0.1215 -1.873 0.061121 . y1 -5.2667 3.1645 -1.664 0.096051 . y2 -8.6002 4.4686 -1.925 0.054280 . y3 -12.2985 6.1452 -2.001 0.045360 * Q1: Why are these two methods so different for the results, e.g. the coefficients? Q2: The spline method is useful for piecewise linear functions, e.g. bs(distance_trans,degree=1,knots=c(13,25)), but how should i do if i want to fit a linear function for the case the distance_trans<13,and quadratic curve when distance_trans>=13? "bs(distance_trans,degree=c(1,2),knots=13)" cannot works. And even for more than three parts. <13,13~25, >25. Q3:"fit <- glm( y ~ pmax(x,20)+pmin(x,20), family=binomial)" is good. But if i divide x into three or more parts, how should i specify it in this way? Hope somone can help.Thanks a lot. On Jan 2, 2008 11:58 PM, Thomas Lumley <[EMAIL PROTECTED]> wrote: > On Tue, 1 Jan 2008, Charles C. Berry wrote: > > On Tue, 1 Jan 2008, zhijie zhang wrote: > > > >> Dear all, > >> I have two variables, y and x. It seems that the relationship between > them > >> is Piecewise Linear Functions. The cutpoint is 20. That is, when x<20, > there > >> is a linear relationship between y and x; while x>=20, there is another > >> different linear relationship between them. > >> How can i specify their relationships in R correctly? > >> # glm(y~I(x<20)+I(x>=20),family = binomial, data = point) something > like > >> this? > > > > Try this: > > > >> library(splines) > >> fit <- glm( y ~ bs( x, deg=1, knots=20 ), family=binomial) > > > > In the linear case I would actually argue that there is a benefit from > constructing the spline basis by hand, so that you know what the > coefficients mean. (For quadratic and higher order splines I agree that > pre-existing code for the B-spline basis makes a lot more sense). > > For example, in > fit <- glm( y ~ pmax(x,20)+pmin(x,20), family=binomial) > the coefficients are the slope when is < 20 and the slope when x>20. > > -thomas > > > Thomas Lumley Assoc. Professor, Biostatistics > [EMAIL PROTECTED] University of Washington, Seattle > -- With Kind Regards, oooO::::::::: (..)::::::::: :\.(:::Oooo:: ::\_)::(..):: :::::::)./::: ::::::(_/:::: ::::::::::::: [***********************************************************************] Zhi Jie,Zhang ,PHD Tel:+86-21-54237149 Dept. of Epidemiology,School of Public Health,Fudan University Address:No. 138 Yi Xue Yuan Road,Shanghai,China Postcode:200032 Email:[EMAIL PROTECTED] Website: www.statABC.com [***********************************************************************] oooO::::::::: (..)::::::::: :\.(:::Oooo:: ::\_)::(..):: :::::::)./::: ::::::(_/:::: ::::::::::::: [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.