On Thu, 9 May 2024, Sameh Abdulah wrote:
> Hi,
> I need to serialize and save a 20K x 20K matrix as a binary file. This process
> is significantly slower in R than in Python (about 4x slower).
> I'm not sure how best to optimize my code. Is it possible to parallelize
> the serialization function to enhance performance?
Parallelization should not help - a single CPU thread should be able to
saturate your disk or your network, assuming you have a typical computer.
The bottleneck may be the conversion to text; writing the data in binary
form should be much faster.
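A minimal base-R sketch of writing the matrix as raw binary instead of text. The helper names `write_matrix_bin`, `read_matrix_bin` and `chunk_cols`, and the file layout (two integer dimensions followed by column-major doubles in native byte order), are hypothetical choices, not a standard format. Writes and reads are split into blocks of whole columns because `writeBin()` is documented to handle at most 2^31 - 1 bytes per call, and a 20K x 20K double matrix is about 3.2 GB.

```r
# Hypothetical helpers (not from RMVL or any package): store a double
# matrix as a small header (nrow, ncol as 4-byte integers) followed by
# the values as 8-byte doubles in column-major, native byte order.

# Columns per block so that each writeBin()/readBin() call stays
# under 2^31 - 1 bytes.
chunk_cols <- function(nr) max(1, floor((2^31 - 1) / (8 * nr)))

write_matrix_bin <- function(A, path) {
  stopifnot(is.matrix(A), is.double(A), all(dim(A) > 0))
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeBin(as.integer(dim(A)), con)       # header: nrow, ncol
  step <- chunk_cols(nrow(A))
  for (start in seq(1, ncol(A), by = step)) {
    cols <- start:min(start + step - 1, ncol(A))
    writeBin(as.vector(A[, cols]), con)   # one block of whole columns
  }
  invisible(path)
}

read_matrix_bin <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  d <- readBin(con, what = "integer", n = 2L)   # nrow, ncol
  A <- matrix(0, nrow = d[1], ncol = d[2])      # preallocate the result
  step <- chunk_cols(d[1])
  for (start in seq(1, d[2], by = step)) {
    cols <- start:min(start + step - 1, d[2])
    A[, cols] <- readBin(con, what = "double", n = d[1] * length(cols))
  }
  A
}

# Small round trip on a temporary file
A <- matrix(runif(12), nrow = 3)
f <- tempfile(fileext = ".bin")
write_matrix_bin(A, f)
B <- read_matrix_bin(f)
print(identical(A, B))
```

For the 20K x 20K case the same call writes about 3.2 GB, and the timing will mostly reflect disk speed. Base R's `saveRDS(A, file, compress = FALSE)` is another binary option; `saveRDS()` compresses by default, which may account for much of its cost on a matrix of random doubles.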
To add to the other suggestions, you might want to try my package "RMVL":
aside from fast writes, it also gives you the ability to share data between
end users of the package.
best
Vladimir Dergachev
PS Example:
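library("RMVL")
# Open (or create) the MVL file for writing
M <- mvl_open("test1.mvl", append = TRUE, create = TRUE)
m <- 20000          # matrix dimension (20K x 20K)
n <- m^2
cat("Generating matrices ... ")
INI.TIME <- proc.time()
A <- matrix(runif(n), ncol = m)
END_GEN.TIME <- proc.time()
# Write the matrix in binary form under the name "A"
mvl_write(M, A, name = "A")
mvl_close(M)
END_SER.TIME <- proc.time()
cat("done\n")
print(END_GEN.TIME - INI.TIME)      # time spent generating
print(END_SER.TIME - END_GEN.TIME)  # time spent writing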
# Use in another script:
library("RMVL")
M2<-mvl_open("test1.mvl")
print(M2$A[1:10, 1:10])
______________________________________________
R-package-devel mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel