On 06/08/2010 05:29 AM, Mark Seeto wrote:
On 06/06/2010 10:49 PM, Mark Seeto wrote:
Hello,
I have a couple of questions about the ols function in Frank Harrell's rms package.
Is there any way to specify variables by their column number in the data
frame rather than by the variable name?
For example,
library(rms)
x1 <- rnorm(100, 0, 1)
x2 <- rnorm(100, 0, 1)
x3 <- rnorm(100, 0, 1)
y <- x2 + x3 + rnorm(100, 0, 5)
d <- data.frame(x1, x2, x3, y)
rm(x1, x2, x3, y)
lm(y ~ d[,2] + d[,3], data = d) # This works
ols(y ~ d[,2] + d[,3], data = d) # Gives error
Error in if (!length(fname) || !any(fname == zname)) { :
missing value where TRUE/FALSE needed
However, this works:
ols(y ~ x2 + d[,3], data = d)
The reason I want to do this is to program variable selection for
bootstrap model validation.
A related question: does ols allow "y ~ ." notation?
lm(y ~ ., data = d[, 2:4]) # This works
ols(y ~ ., data = d[, 2:4]) # Gives error
Error in terms.formula(formula) : '.' in formula and no 'data' argument
Thanks for any help you can give.
Regards,
Mark
Hi Mark,
It appears that you answered the questions yourself. rms wants real
variables or transformations of them; it makes certain assumptions
about the names of model terms. The y ~ . form should work, though;
I'll have a look at that sometime.
But these are small questions compared with what you really want. Why
do you need variable selection at all, i.e., what is wrong with keeping
insignificant variables in a model? If you do need variable selection,
see whether backwards stepdown works for you. It is built into the
rms bootstrap validation and calibration functions.
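For example (a minimal sketch; B = 200 is an arbitrary choice, and
x = TRUE, y = TRUE are needed so validate can resample):

library(rms)
fit <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)
validate(fit, method = "boot", B = 200, bw = TRUE)  # bootstrap validation with backward stepdown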
Frank
Thank you for your reply, Frank. I would have reached the conclusion
that rms only accepts real variables had this not worked:
ols(y ~ x2 + d[,3], data = d)
Hi Mark - that probably worked by accident.
The reason I want to program variable selection is so that I can use the
bootstrap to check the performance of a model-selection method. My
co-workers and I have used a variable selection method which combines
forward selection, backward elimination, and best subsets (the forward and
backward methods were run using different software).
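In R, that combination would look roughly like this (a sketch only; as
noted, we actually ran the forward and backward steps in other software):

# forward selection starting from the null model
null <- lm(y ~ 1, data = d)
fwd  <- step(null, scope = ~ x1 + x2 + x3, direction = "forward")
# backward elimination starting from the full model
full <- lm(y ~ x1 + x2 + x3, data = d)
bwd  <- step(full, direction = "backward")
# best subsets, using the leaps package
library(leaps)
best <- summary(regsubsets(y ~ x1 + x2 + x3, data = d))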
I want to do bootstrap validation to (1) check the over-optimism in R^2,
and (2) justify using a different approach, if R^2 turns out to be very
over-optimistic. The different approach would probably be data reduction
using variable clustering, as you describe in your book.
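Something along these lines, using varclus from the Hmisc package (an
untested sketch, with the toy data above standing in for ours):

library(Hmisc)
vc <- varclus(~ x1 + x2 + x3, data = d, similarity = "spearman")
plot(vc)  # inspect the cluster tree before choosing representatives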
The validate.ols function, which calls the predab.resample function,
may give you some code to start with. Note, however, that the
performance of the approach you are suggesting has already been shown
to be poor in many cases. You might run the following in parallel:
full model fits, and penalized least squares with the penalty selected
by AIC (using special arguments to ols along with the pentrace function).
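Roughly (a sketch; the penalty grid is arbitrary):

library(rms)
fit <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)
p <- pentrace(fit, penalty = seq(0, 20, by = 0.5))  # choose the penalty by AIC
fit.pen <- update(fit, penalty = p$penalty)         # refit with the chosen penalty
fit.pen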
Frank
Regards,
Mark
--
Frank E Harrell Jr, Professor and Chairman
Department of Biostatistics, School of Medicine, Vanderbilt University