On 07/06/2021 at 09:00, Dario Strbenac wrote:
Good day,

I notice that summing rows of a large dgTMatrix fails.

library(Matrix)
aMatrix <- new("dgTMatrix",
               i = as.integer(sample(200000, 10000) - 1),
               j = as.integer(sample(50000, 10000) - 1),
               x = rnorm(10000),
               Dim = c(200000L, 50000L))
totals <- rowSums(aMatrix == 0)  # Segmentation fault.

On my R v4.1 (Ubuntu 18) I don't get a segfault, but I do get an error message:

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'rowSums': cannot allocate vector of size 372.5 Gb

And the reason for this is quite clear: the intermediate logical matrix 'aMatrix == 0' is almost dense, with 200000L*50000L - 10000L non-zero entries. That is a little bit too much ;) for my modest laptop. So I can propose a workaround:
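For scale, a rough back-of-envelope (the exact 372.5 Gb figure in the error depends on R's intermediate allocations, so this is only an order-of-magnitude sketch):

```r
# Stored entries in the intermediate 'aMatrix == 0':
nnz <- 200000 * 50000 - 10000        # 9,999,990,000 stored entries, ~1e10
# A triplet (TsparseMatrix) representation keeps i, j and x per entry,
# so even at 4 bytes per slot the intermediate runs to hundreds of GiB:
gib <- nnz * (4 + 4 + 4) / 2^30      # ~112 GiB just for the i/j/x slots
stopifnot(nnz == 9999990000)
```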

    totals <- ncol(aMatrix) - rowSums(aMatrix != 0)   # here ncol(aMatrix) is 50000
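A quick sanity check of the workaround on a small matrix (made-up dimensions, mirroring the construction above):

```r
library(Matrix)
set.seed(42)
# Small dgTMatrix with distinct (i, j) pairs and non-zero values:
small <- new("dgTMatrix",
             i = as.integer(sample(20, 10) - 1),
             j = as.integer(sample(15, 10) - 1),
             x = rnorm(10),
             Dim = c(20L, 15L))
# Workaround: zeros per row = columns minus non-zeros per row.
# 'small != 0' stays sparse, so no huge intermediate is formed.
zeros_sparse <- ncol(small) - rowSums(small != 0)
# Reference: the dense computation
zeros_dense <- rowSums(as.matrix(small) == 0)
stopifnot(all(zeros_sparse == zeros_dense))
```

Since `!=` compares the stored values, an explicitly stored zero in the x slot would still be counted as a zero, so the identity should hold in that case as well.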

Hoping it helps.

Best,
Serguei.


The server has 768 GB of RAM, and memory usage never came close to that.
Converting it to an ordinary matrix works fine.

big <- as.matrix(aMatrix)
totals <- rowSums(big == 0)  # Uses more RAM, but there is no segmentation fault and a result is returned.

Could rowSums() be made more robust for dgTMatrix?

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
