I'm running into an unexpected error using the glmnet and Matrix packages.

I have a matrix that is 8 million rows by 100 columns with 75% of the
entries being zero. When I run a vanilla glmnet logistic model on my server
with 300 GB of RAM, the task completes in 20 minutes:

> x  # 8,000,000 x 100 dense matrix
> model1 <- glmnet(x, y, family = "binomial", alpha = 1)  # run time ~20 minutes

But if I convert the matrix to a sparse matrix using the Matrix package,
the model fails immediately with an error:

> x2 <- Matrix(x, sparse = TRUE)  # 75% of entries are zero
> model2 <- glmnet(x2, y, family = "binomial", alpha = 1)  # error
Error in array(0, c(n, p)) : 'dim' specifies too large an array

This is the opposite of what I expected: the dense data runs fine, but the
sparse representation, which should need less memory, fails because it is
"too large". Is this a glmnet issue or an R memory issue, and is there a
workaround within glmnet?
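Per the posting guide, here is a scaled-down sketch of what I am doing, with
simulated data in place of my real matrix (at this reduced size both calls
succeed for me; the error only appears at the full 8-million-row scale, so
the dimensions below are illustrative assumptions, not a direct reproduction):

```r
library(glmnet)
library(Matrix)

set.seed(1)
n <- 1e4; p <- 100                              # scaled down from 8e6 x 100
x <- matrix(rnorm(n * p), n, p)
x[sample(length(x), 0.75 * length(x))] <- 0     # make ~75% of entries zero
y <- rbinom(n, 1, 0.5)                          # binary response

model1 <- glmnet(x, y, family = "binomial", alpha = 1)   # dense: works

x2 <- Matrix(x, sparse = TRUE)                  # dgCMatrix, ~75% sparse
model2 <- glmnet(x2, y, family = "binomial", alpha = 1)  # sparse path
```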

--Nathan


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
