CZ-

How exactly are you using the regression output?  Are you using the
regression parameter estimates?  Or are you using the predicted (fitted)
values of the response variable?

Your model matrix (sometimes called the X matrix) has dimension 60 x 2000,
so it has more columns than rows and its rank is at most 60.  That means
there is not a unique least squares solution for the regression parameters.
Least squares solutions do exist, but there are infinitely many of them.  So,
if you are using the regression parameters in the later stages of your
analysis, the fact that you got favorable results may simply mean you got
lucky.  (SAS PROC REG is simply giving you one of infinitely many solutions.)
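A small simulated sketch of this point, using toy data rather than your
microarray data (the dimensions n = 12, p = 8, rank 4 are arbitrary
assumptions).  It uses MASS::ginv() to get the minimum-norm least squares
solution, builds a second solution by adding a null-space vector, and shows
that lm() returns yet another one with NA for the aliased coefficients:

```r
# Toy illustration of a rank-deficient least squares problem.
# Assumed dimensions (arbitrary): n = 12 samples, p = 8 predictors, rank 4.
library(MASS)  # ginv(): Moore-Penrose pseudoinverse

set.seed(1)
n <- 12; p <- 8; r <- 4
# Build X as a product so that it has rank r < p: every column is a
# linear combination of r underlying "basis" columns.
X <- matrix(rnorm(n * r), n, r) %*% matrix(rnorm(r * p), r, p)
y <- rnorm(n)

# One least squares solution: the minimum-norm solution.
b1 <- ginv(X) %*% y

# (I - X^+ X) projects onto the null space of X, so adding any vector
# from that space leaves X %*% b unchanged.
null_proj <- diag(p) - ginv(X) %*% X
b2 <- b1 + null_proj %*% rnorm(p)

rss <- function(b) sum((y - X %*% b)^2)
rss(b1); rss(b2)     # same residual sum of squares ...
max(abs(b1 - b2))    # ... but clearly different coefficients

# lm() picks yet another solution: its pivoted QR keeps only as many
# columns as the rank allows and reports the aliased coefficients as NA.
coef(lm(y ~ X))
```

Any of these coefficient vectors is "a" least squares answer; which one you
get depends on the algorithm, not on the data.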

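If you are using the predicted response values, then there is no
less-than-full-rank issue.  You can proceed because the fitted values X b are
the same for every least squares solution b (they are the projection of y
onto the column space of X), so they are not affected by a
less-than-full-rank model matrix.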
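A toy check of that claim, again on simulated data with assumed dimensions
(a 12 x 8 matrix of rank 4): three different least squares coefficient
vectors, from MASS::ginv(), a null-space shift of it, and lm(), all give the
same fitted values, so both comparisons should return TRUE:

```r
# Toy setup (assumed dimensions, simulated data):
# X is 12 x 8 but has rank 4, so least squares coefficients are not unique.
library(MASS)  # ginv()

set.seed(2)
n <- 12; p <- 8; r <- 4
X <- matrix(rnorm(n * r), n, r) %*% matrix(rnorm(r * p), r, p)
y <- rnorm(n)

b_min  <- ginv(X) %*% y                                   # minimum-norm solution
b_alt  <- b_min + (diag(p) - ginv(X) %*% X) %*% rnorm(p)  # another LS solution
fit_lm <- unname(fitted(lm(y ~ X - 1)))                   # lm()'s pivoted-QR solution

# Different coefficient vectors, one set of fitted values: for every least
# squares solution b, X %*% b is the projection of y onto the column space of X.
all.equal(as.vector(X %*% b_min), as.vector(X %*% b_alt))
all.equal(as.vector(X %*% b_min), fit_lm)
```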

I guess the point of my rambling is this: you need to be more specific about
what you want.  Do you want predicted response values?

Of course, all this may be moot.  Have you looked at

install.packages("subselect")
help(leaps,package="subselect")

?
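As an aside: each candidate model in your search has only two genes plus an
intercept, so each individual fit is full rank unless the two genes are
perfectly correlated.  With 2000 genes there are choose(2000, 2) = 1,999,000
pairs, few enough for an exhaustive search.  The sketch below is such a
brute-force alternative, not Leaps and Bounds and not what subselect or SAS
do; `best_pairs` is a hypothetical helper name.  It ranks pairs by
linear-regression R^2 (equivalently by residual sum of squares, since the
subset size is fixed), computed from correlations via
R^2 = (r1^2 + r2^2 - 2 r1 r2 r12) / (1 - r12^2).  It covers only the
PROC REG (linear regression) half of the question:

```r
# Hypothetical helper: rank every two-gene linear-regression model by R^2.
# X: n x p numeric matrix (samples in rows, genes in columns)
# y: numeric response (e.g. 0/1 for normal/cancer)
# K: number of best pairs to return
best_pairs <- function(X, y, K = 200) {
  ry <- as.vector(cor(X, y))  # cor(gene_j, y) for every gene
  R  <- cor(X)                # gene-gene correlations (p x p)
  # R^2 of y ~ x_i + x_j (with intercept), from the correlations alone:
  #   (r_i^2 + r_j^2 - 2 r_i r_j r_ij) / (1 - r_ij^2)
  num <- outer(ry^2, ry^2, "+") - 2 * outer(ry, ry) * R
  r2  <- num / (1 - R^2)
  r2[lower.tri(r2, diag = TRUE)] <- NA  # keep each unordered pair once
  # Perfectly collinear pairs give 0/0 or x/0; drop them rather than rank them.
  # (Nearly collinear pairs may still be numerically unstable.)
  r2[!is.finite(r2)] <- NA
  idx  <- which(!is.na(r2), arr.ind = TRUE)
  vals <- r2[idx]
  top  <- order(vals, decreasing = TRUE)[seq_len(min(K, length(vals)))]
  data.frame(gene1 = idx[top, 1], gene2 = idx[top, 2], r.squared = vals[top])
}

# Usage on toy data (assumed sizes; 200 "genes" keeps the example quick)
set.seed(3)
X <- matrix(rnorm(60 * 200), 60, 200)  # 60 samples, 200 genes
y <- rbinom(60, 1, 2/3)                # toy 0/1 response, roughly 2:1
res <- best_pairs(X, y, K = 5)
res
# Cross-check the best pair against an explicit lm() fit
summary(lm(y ~ X[, res$gene1[1]] + X[, res$gene2[1]]))$r.squared
```

For the full 2000-gene matrix the p x p intermediates are about 32 MB each,
which should be manageable on most machines.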

-tgs

On Thu, Oct 7, 2010 at 1:52 PM, CZ <cxzh...@ualr.edu> wrote:

>
> Hi, Josh,
>
> What we are doing is, we have a microarray data set with 2000 genes and
> roughly 60 samples split 2:1 cancer:normal.  So we essentially have one
> binary response and 2000 continuous predictors. We want to use this to
> develop an ensemble-based classifier method in which the members of the
> ensemble are all gene pairs.  To this end, we want to use the Leaps and
> Bounds algorithm to obtain the K=200, 500, or 1000 best-performing subsets
> of Size=2 Genes to feed into our ensemble.  We had partial success doing
> this in SAS, as follows:
>
> 1.  The SAS Logistic Procedure (the natural choice for our binary outcome,
> because it does logistic regression) would include only the first 60 genes
> into the Leaps and Bounds search, and print for each of the remaining genes
> a message saying it was a linear combination of the first 60 genes & was
> therefore being excluded.
>
> 2.  However, the SAS Reg Procedure (not the natural choice for our binary
> outcome, because it does linear regression) would include all 2000 genes
> into the Leaps and Bounds search, and not be bothered by the linear
> dependencies.  And it gave results that held up quite well in subsequent
> analyses.
>
> So, first we want to replicate in R what we did in SAS with the linear
> regression, i.e., use the Leaps and Bounds algorithm to obtain the K=200,
> 500, or 1000 best-performing linear-regression models of Size=2 Genes from
> our list of 2000 genes, and not have it exclude genes for being a linear
> combination of the basis set.  Then we want to use R to try and do what SAS
> could not: get logistic regression to do the same thing and not have it
> exclude genes for being a linear combination of the basis set.
>
> Thanks.
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Does-R-have-function-package-works-similar-to-SAS-s-PROC-REG-tp2965657p2967295.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
