Dear R-help -

Thanks to those who replied yesterday (Christos H. and Thomas L.) regarding my question on coxph and model formulas. The answers worked perfectly.
My new question involves the following. I want to run several coxph models (package survival) on the same dataset, but on different subsets of that dataset. I have found a way to do this, described below in functions subwrap1 and subwrap2. These do not use the coxph "subset" argument, however, as you will see.

My three main questions are:

1) When writing a wrapper like this, should I be using the subset argument in coxph(), or alternatively, doing what I am doing in subwrap1 and subwrap2 below? Is the subset argument in coxph more of a convenience tool for interactive use than for programs?

2) If the approach in subwrap1 and subwrap2 is fine, is there a preference for using 'expressions' or 'strings'? Eventually, my program will create these subset conditions programmatically, so I think strings will be the way I have to go, even though I've seen warnings on this list about using the eval(parse()) construct.

3) Is there some approach to do this that I'm overlooking? My goal will be to produce a list of subset conditions (probably a character vector), and then use lapply to run the various Cox regressions.

I can already achieve my goal; I would just like to know more details about how others do things like this. I've simplified my code below to focus on where I feel I'm confused.
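As a rough sketch of the goal stated in question 3 (a character vector of subset conditions run through lapply), something like the following might work. It follows the subset-the-data-first idea of subwrap2 rather than the coxph subset argument. The helper name fit_subset, the data frame name dat, the seed, and the event probability of 0.3 are illustrative assumptions, not part of the code below.

```r
library(survival)  # provides coxph() and Surv()

set.seed(1)  # assumed seed, only so the sketch is reproducible
n <- 200
dat <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.3),  # assumed event rate for illustration
  trt   = rep(c("A", "B"), each = n / 2),
  sex   = factor(rep(c("M", "F"), times = n / 2))
)

# Hypothetical helper: evaluate a condition string inside the data frame,
# keep the matching rows, then fit the model on the reduced data.
fit_subset <- function(cond, data) {
  keep <- eval(parse(text = cond), data)
  coxph(Surv(times, event) ~ trt, data = data[keep, ])
}

# One condition per model; lapply returns a list of fitted coxph objects.
conds <- c("sex == 'F'", "sex == 'M'")
fits <- lapply(conds, fit_subset, data = dat)
names(fits) <- conds
```

Each fit can then be pulled out by its condition string, for example fits[["sex == 'F'"]].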
Here is some code along with comments:

#### BEGIN R SAMPLE CODE

# Function for producing test data
makeTestDF <- function(n) {
  times <- sample(1:200, n, replace = TRUE)
  event <- rbinom(n, 1, prob = .1)
  trt <- rep(c("A","B"), each = n/2)
  sex <- factor(c("M","F"))
  sex <- rep(sex, times = n/2)
  testdf <- data.frame(times,event,trt,sex)
}

# Make test data, n = 200
testdf <- makeTestDF(200)

# Cox wrapper function with subset, this one works
# Takes subset as expression
subwrap1 <- function(x, sb) {
  sb <- eval(substitute(sb), x)
  x <- x[sb,]
  coxph(Surv(times,event)~trt, data = x)
}

subwrap1(testdf, sex == 'F')

# This next one also works, but uses a character variable
# instead of an expression as the subset argument
subwrap2 <- function(x, sb) {
  sb <- eval(parse(text = sb), x)
  x <- x[sb,]
  coxph(Surv(times,event)~trt, data = x)
}

subwrap2(testdf, "sex == 'F'")

# Neither of the above use the coxph subset argument
# If I try using that, I get stuck with expressions,
# I've tried many different things in the subset
# argument, but none seem to do the trick. Is using
# this argument in a program even advisable?
subwrap3 <- function(x, sb) {
  coxph(Surv(times,event)~trt, data = x,
        subset = eval(substitute(sb), x))
}

subwrap3(testdf, sex == 'F') #does not work

# Using a string, this works, however.
subwrap4 <- function(x, sb) {
  coxph(Surv(times,event)~trt, data = x,
        subset = eval(parse(text=sb)))
}

subwrap4(testdf, "sex == 'F'")

### END R SAMPLE CODE

Thanks so much,
Erik Iverson
[EMAIL PROTECTED]

> sessionInfo()
R version 2.5.1 (2007-06-27)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "grDevices" "datasets"  "tcltk"     "splines"   "graphics"  "utils"
[7] "stats"     "methods"   "base"

other attached packages:
   debug mvbutils SPLOTS_1.2-6    Hmisc    chron survival
 "1.1.0"  "1.1.1"      "1.2-6"  "3.4-2" "2.3-13"   "2.32"

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.