Hello,

R version 2.4.1 (2006-12-18) 
i386-pc-mingw32

Calling serialize() with a NULL connection serializes the object to a raw
vector. However, when the object is large, this takes a very long time
(system.time() reports user, system and elapsed time in seconds):

> system.time( serialize(matrix(0, 1000, 1000), NULL) )
[1] 38.25 40.73 81.54    NA    NA

> system.time( serialize(matrix(0, 2000, 2000), NULL) )
[1]  609.72  664.75 1318.57      NA      NA
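
For reference, here is a minimal sketch of the raw-vector form on a small matrix, where the call returns quickly. The names x, bytes and y are only illustrative, and the sketch assumes unserialize() accepts the raw vector directly:

```r
# Small matrix, so the in-memory path finishes quickly.
x <- matrix(0, 3, 3)

# A NULL connection makes serialize() return the bytes as a raw vector.
bytes <- serialize(x, NULL)

# The raw vector can be turned back into the original object.
y <- unserialize(bytes)

is.raw(bytes)
identical(x, y)
```

The timings above are for exactly this call, only with much larger matrices.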

I ran into this with Rmpi, where a call on the cluster returned a large matrix.
Serializing the very same matrix to a file or a socket is very fast, however,
so I wrote this function, which runs much faster:

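.mpi.quick.serialize <- function (object)
{
    fname <- tempfile("Rmpi")
    stream <- file(fname, "wb")
    # On exit, close whichever connection is current and delete the temp file.
    on.exit({
        close(stream)
        file.remove(fname)
    })
    # Write the serialized object to the temp file.
    serialize(object, stream)
    close(stream)
    # Reopen the file and read its full contents back as a raw vector.
    size <- file.info(fname)$size
    stream <- file(fname, "rb")
    return(readBin(stream, "raw", n = size))
}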

> system.time( .mpi.quick.serialize(matrix(0, 1000, 1000) ) )
[1] 0.2500000000000000 0.0499999999999545 0.3000000000001819
[4]                 NA                 NA

> system.time( .mpi.quick.serialize(matrix(0, 2000, 2000) ) )
[1] 1.059999999999945 0.220000000000027 1.289999999999964
[4]                NA                NA
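
A quick sanity check of the temp-file approach might look like the sketch below. It uses a compact, hypothetical variant named quick_serialize_sketch (not part of Rmpi) and assumes readBin() may be given the file name directly instead of an open connection:

```r
# Hypothetical compact variant of the temp-file workaround, for checking only.
quick_serialize_sketch <- function(object) {
    fname <- tempfile("Rmpi")
    on.exit(unlink(fname))          # remove the temp file when done
    stream <- file(fname, "wb")
    # Close the connection even if serialize() fails.
    tryCatch(serialize(object, stream), finally = close(stream))
    # Read the whole file back as a raw vector.
    readBin(fname, "raw", n = file.info(fname)$size)
}

x <- matrix(0, 100, 100)
bytes <- quick_serialize_sketch(x)

# The bytes should unserialize back to the original matrix.
stopifnot(is.raw(bytes), identical(unserialize(bytes), x))
```

This only confirms that the result round-trips through unserialize(); it says nothing about why the NULL-connection path is slow.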

Does anyone have an idea why the performance difference is so 
large? Also, I was wondering if there is a better way -- the 
above solution feels like a quick fix rather than a correct
approach.

Regards,
ashish

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
