On 07/06/2021 at 09:00, Dario Strbenac wrote:
Good day,
I notice that summing rows of a large dgTMatrix fails.
library(Matrix)
# 200000 x 50000 sparse triplet matrix with 10000 random nonzero entries
# (the i and j slots of a dgTMatrix are 0-based).
aMatrix <- new("dgTMatrix",
               i = as.integer(sample(200000, 10000) - 1),
               j = as.integer(sample(50000, 10000) - 1),
               x = rnorm(10000),
               Dim = c(200000L, 50000L))
totals <- rowSums(aMatrix == 0) # Segmentation fault.
On my R 4.1 (Ubuntu 18), I don't get a segfault, but I do get an error
message:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for
function 'rowSums': cannot allocate vector of size 372.5 Gb
The reason for this is quite clear: the intermediate logical matrix
'aMatrix == 0' is almost dense, having 200000 * 50000 - 10000 (about 1e10)
nonzero entries. That is a little bit too much ;) for my modest laptop.
So I can propose a workaround:
totals <- 50000 - rowSums(aMatrix != 0)  # zeros per row = ncol - nonzeros per row
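To sanity-check that identity on something small, here is a minimal sketch
(the tiny 5 x 4 matrix is made up for illustration, and it assumes a Matrix
version where sparseMatrix() accepts the repr = "T" argument):

library(Matrix)
# Tiny triplet matrix: 5 rows, 4 columns, 3 nonzero entries.
m <- sparseMatrix(i = c(1, 3, 5), j = c(2, 4, 1), x = c(1.5, -2, 3),
                  dims = c(5, 4), repr = "T")
zeros_dense <- rowSums(as.matrix(m) == 0)   # dense reference
zeros_sparse <- ncol(m) - rowSums(m != 0)   # sparse-friendly identity
stopifnot(all(zeros_dense == zeros_sparse))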
Hope it helps.
Best,
Serguei.
The server has 768 GB of RAM, and it was never close to being consumed by this.
Converting it to an ordinary matrix works fine.
big <- as.matrix(aMatrix)    # dense 200000 x 50000 double matrix, ~74.5 GiB
totals <- rowSums(big == 0)  # Uses more RAM, but there is no segmentation
                             # fault and a result is returned.
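For completeness, the zeros per row can also be counted without building any
large intermediate at all, by tabulating the row indices stored in the triplet
slots. This is only a sketch, not anything the Matrix package does itself, and
it assumes aMatrix has no duplicate (i, j) entries and no explicitly stored
zeros in x:

# @i holds the 0-based row indices of the nonzero entries; count them per
# row and subtract from the column count to get the zeros per row.
nnz_per_row <- tabulate(aMatrix@i + 1L, nbins = nrow(aMatrix))
totals <- ncol(aMatrix) - nnz_per_row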
Could this be made more robust for dgTMatrix?
--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel