Dear all, I am using R 3.4.3 on Windows 10. I am writing code to use in a forthcoming teaching session. As part of the workshop the students are using breast cancer data made available by Patrick Royston and available from http://www.statapress.com/data/fpsaus.html (I didn't pick the dataset by the way). I would like the students to visualise linear, fractional polynomial and spline transformations of the "node" variable using a flexible parametric model with 3 knots for the baseline hazard. I can do this using the "predict" option within stpm2 as follows:
flex_nodes_lin <- stpm2(Surv(rfs/12,rfsi)~nodes, data=Practical_Rott_dev,df=3) haz_lin <- predict(flex_nodes_lin,type="hazard") flex_nodes_fp <- stpm2(Surv(rfs/12,rfsi)~log(nodes),data=Practical_Rott_dev,df=3) haz_fp <- predict(flex_nodes_fp,type="hazard") spline3 <- stpm2(Surv(rfs/12,rfsi)~1, data=Practical_Rott_dev,df=3) haz_spline3 <- predict(spline3,type="hazard") data_part9 <- data.frame(nodes,haz_lin[nodes],haz_spline3[nodes],haz_fp[nodes]) data_part9_m <- melt(data_part9,id.vars='nodes',factorsAsStrings=F) plot_part9 <- ggplot(data_part9_m,aes(nodes,value,colour=variable))+geom_line()+scale_colour_manual(labels=c("Linear","FP1","Spline 3 knots"),values=c("green","red","blue"))+theme_bw() plot_part9 + labs(x="Number of positive nodes",y="",color="") + theme(legend.position=c(0.8,0.8)) However, to my mind using "hazard" (or "survival") leads to a plot which do not help to understand the different functional form of "nodes". Therefore, I would prefer to do this using the linear predictor for each model instead. I've written the following code to do this: lp_nodes_lin <- flex_nodes_lin@lm$fitted.values lp_nodes_spline <- flex_nodes_spline@lm$fitted.values lp_nodes_fp <- flex_nodes_fp@lm$fitted.values data_part9 <- data.frame(flex_nodes_lin@lm$model$nodes,lp_nodes_lin,lp_nodes_spline,lp_nodes_fp) colnames(data_part9)[1] <- "nodes" data_part9_m <- melt(data_part9,id.vars='nodes') plot_part9 <- ggplot(data_part9_m,aes(nodes,value,colour=variable))+geom_line()+scale_colour_manual(labels=c("Linear","Spline (3 knots)", "FP1"),values=c("green","red","blue"))+theme_bw() plot_part9 + labs(x="Number of positive nodes",y="Prediction",color="") + theme(legend.position=c(0.8,0.8)) I have 2 concerns over this: 1. The plots are still not the shape I would expect them to be i.e. a line along the 45 degree line for the linear transformation, and a curve for each of the spline and FP transformations. 2. This code is really complicated - there must be an easier way?! Any help gratefully received! Kind regards, Laura P.S. If I was doing this in the logistic regression the code would be relatively simple: age_mod <- glm(DAY30~AGE,family="binomial") lp_age_lin <- predict(age_mod) agefp1_mod <- mfp(DAY30~fp(AGE,df=2,alpha=1),family="binomial") lp_agefp1 <- predict(agefp1_mod) age3_mod <- glm(DAY30~age3_spline,family="binomial") lp_age3 <- predict(age3_mod) data_part8 <- data.frame(AGE,lp_age_lin,lp_agefp1,lp_age3) data_part8_m <- melt(data_part8,id.vars='AGE') plot_part8 <- ggplot(data_part8_m,aes(AGE,value,colour=variable))+geom_line()+scale_colour_manual(labels=c("Linear","FP1","Spline 3 knots"),values=c("green","blue","red"))+theme_bw() plot_part8 + labs(x="Age (years)",y="Linear Predictor (log odds)",color="") + theme(legend.position=c(0.2,0.8)) [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.