On Fri, Aug 5, 2011 at 8:45 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
> Note the following: As soon as you use "categorical predictors",
> i.e., factors, and particularly when these have many levels (instead of just
> being binary), the resulting model matrix is often sparse,
> i.e. contains many zeros.
> When the matrix is "really sparse", say,
>     #{zeros} / #{non-zeros} >= 10
> it can pay much to use the sparse matrices that the 'Matrix'
> package provides (you have 'Matrix' as part of your R
> installation).
>
> For exactly this reason, 'glmnet'
> has supported the use of sparse matrices for a long time,
> and we have provided the convenience function
>     sparse.model.matrix()    {package 'Matrix'}
> for easy construction of such matrices.
>
> There's also a very small extension package 'MatrixModels'
> which goes one step further, with its function
>     model.Matrix(..... sparse = TRUE/FALSE)
> but you would not need that for using the sparseMatrix in
> 'glmnet'.
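
For readers following the thread, here is a minimal sketch of the approach quoted above. The data frame `d`, the response `y`, and the sizes are made up for illustration and are not the data discussed here. The formula `~ . - 1` is just one possible choice; it drops the intercept column on the assumption that glmnet fits its own intercept. The glmnet call only runs if that package happens to be installed.

```r
library(Matrix)

set.seed(1)
n <- 1000

# Hypothetical data: three 5-level factors plus one numeric predictor
d <- data.frame(
  f1 = factor(sample(letters[1:5], n, replace = TRUE)),
  f2 = factor(sample(letters[1:5], n, replace = TRUE)),
  f3 = factor(sample(letters[1:5], n, replace = TRUE)),
  x  = rnorm(n)
)
y <- rnorm(n)

# Build the design matrix directly in sparse form (no dense intermediate)
X <- sparse.model.matrix(~ . - 1, data = d)

# The quoted rule of thumb: #{zeros} / #{non-zeros}
n_nonzero <- nnzero(X)
n_zero    <- prod(dim(X)) - n_nonzero
ratio     <- n_zero / n_nonzero
cat("zeros / non-zeros:", ratio, "\n")

# glmnet accepts the sparse matrix as-is; skipped if glmnet is not installed
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(X, y)
}
```

With only a few 5-level factors the ratio stays well below the quoted threshold of 10. It grows as the number of levels per factor increases.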

Thanks, Martin.

In my case, the number of potential predictors is high, and many of them are factors with 5 categories. With sparse.model.matrix(), I am getting the following error:

    «Error: C stack usage is too close to the limit»

I realize that my sparse matrix is huge, and that the error given by sparse.model.matrix() is perfectly justified, but I wonder whether this problem could be overcome by having sparse.model.matrix() use dynamically allocated memory instead of static memory.

Paul