I'm running into an unexpected error using the glmnet and Matrix packages. I have a matrix that is 8 million rows by 100 columns with 75% of the entries being zero. When I run a vanilla glmnet logistic model on my server with 300 GB of RAM, the task completes in 20 minutes:
> x                                               # 8 million x 100 matrix
> model1 <- glmnet(x, y, 'binomial', alpha = 1)   # run time: 20 minutes

But if I convert the matrix to a sparse matrix using the Matrix package, the model does not run at all:

> x2 <- Matrix(x, sparse = TRUE)                  # 75% sparse
> model2 <- glmnet(x2, y, 'binomial', alpha = 1)  # error
Error in array(0, c(n, p)) : 'dim' specifies too large an array

This is the opposite of what I expected: the dense data runs fine, but the sparse data fails because it is "too large". Is this a glmnet issue or an R memory issue? Is there a way to work around it in glmnet?

--Nathan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
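P.S. Here is a scaled-down, self-contained sketch of what I am running. The dimensions and the simulated data are made up so it fits in a mail (1,000 rows instead of 8 million); at this size both fits succeed on my machine, and the error only appears at the full 8-million-row scale:

```r
## Minimal reproducible sketch -- dimensions and data are hypothetical,
## shrunk from the real 8,000,000 x 100 problem.
library(glmnet)
library(Matrix)

set.seed(1)
n <- 1000; p <- 100
x <- matrix(rnorm(n * p), n, p)
x[sample(length(x), 0.75 * length(x))] <- 0   # make ~75% of entries zero
y <- rbinom(n, 1, 0.5)                        # binary response

## Dense fit: works at full scale (20 minutes on the server)
model1 <- glmnet(x, y, 'binomial', alpha = 1)

## Sparse fit: same call on a dgCMatrix; this is the path that fails
## with "'dim' specifies too large an array" at 8 million rows
x2 <- Matrix(x, sparse = TRUE)
model2 <- glmnet(x2, y, 'binomial', alpha = 1)
```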