In principle, yes (that's what Rserve serialization does), but as far as I recall we don't have the infrastructure in place for that. But then you may as well serialize to a connection instead. To be honest, I don't see why you would serialize anything big to a vector - you can't really do anything useful with it that you couldn't do with the streaming version.
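A minimal sketch of the "serialize to a connection instead" suggestion above: streaming straight to a file means the serialized bytes are never held in an R raw vector at all, so there is no in-memory duplicate of the data. (The small data frame here stands in for the 50e6-row example further down; `tempfile()` is used just for illustration.)

```r
# Serialize directly to a binary file connection - the bytes stream to
# disk instead of accumulating in a raw vector in memory.
df <- data.frame(x = runif(1e5, 1, 10))

tmp <- tempfile()
con <- file(tmp, "wb")
serialize(df, con)        # no intermediate raw vector is allocated
close(con)

# Round-trip check: read it back from the connection.
con <- file(tmp, "rb")
df2 <- unserialize(con)
close(con)
unlink(tmp)

identical(df, df2)  # TRUE
```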
Sent from my iPhone

> On Mar 17, 2015, at 17:48, Michael Lawrence <lawrence.mich...@gene.com> wrote:
>
> Presumably one could stream over the data twice, the first to get the size,
> without storing the data. Slower but more memory efficient, unless I'm
> missing something.
>
> Michael
>
>> On Tue, Mar 17, 2015 at 2:03 PM, Simon Urbanek <simon.urba...@r-project.org> wrote:
>>
>> Jorge,
>>
>> what you propose is not possible because the size of the output is unknown,
>> that's why a dynamically growing PStream buffer is used - it cannot be
>> pre-allocated.
>>
>> Cheers,
>> Simon
>>
>> > On Mar 17, 2015, at 1:37 PM, Martinez de Salinas, Jorge <jorge.martinez-de-sali...@hp.com> wrote:
>> >
>> > Hi,
>> >
>> > I've been doing some tests using serialize() to a raw vector:
>> >
>> >     df <- data.frame(runif(50e6,1,10))
>> >     ser <- serialize(df, NULL)
>> >
>> > In this example the data frame and the serialized raw vector occupy ~400MB
>> > each, for a total of ~800MB. However, the memory peak during serialize() is
>> > ~1.2GB:
>> >
>> >     $ cat /proc/15155/status | grep Vm
>> >     ...
>> >     VmHWM:  1207792 kB
>> >     VmRSS:   817272 kB
>> >
>> > We work with very large data frames, and in many cases this is killing R
>> > with an "out of memory" error.
>> >
>> > This is the relevant code in R 3.1.3, at src/main/serialize.c:2494:
>> >
>> >     InitMemOutPStream(&out, &mbs, type, version, hook, fun);
>> >     R_Serialize(object, &out);
>> >     val = CloseMemOutPStream(&out);
>> >
>> > The serialized object is stored in a buffer pointed to by out.data.
>> > Then, in CloseMemOutPStream(), R copies the whole buffer into a newly
>> > allocated SEXP object (the raw vector that stores the final result):
>> >
>> >     PROTECT(val = allocVector(RAWSXP, mb->count));
>> >     memcpy(RAW(val), mb->buf, mb->count);
>> >     free_mem_buffer(mb);
>> >     UNPROTECT(1);
>> >
>> > Before calling free_mem_buffer(), the process is using ~1.2GB (the original
>> > data frame + the serialization buffer + the final serialized raw vector).
>> >
>> > One possible solution would be to allocate a buffer for the final raw
>> > vector and store the serialization result directly into that buffer. This
>> > would bring the memory peak down from ~1.2GB to ~800MB.
>> >
>> > Thanks,
>> > -Jorge

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
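Michael's two-pass idea can be approximated at the R level, which may help make the trade-off concrete. This sketch is only analogous to the C-level proposal (the thread is about `serialize.c` internals): it uses a temporary file as the "first pass" to learn the exact serialized size, after which a buffer of precisely that size can be filled - the pattern Jorge proposes doing natively inside `CloseMemOutPStream()`. The temp file stands in for a hypothetical counting stream.

```r
# Two-pass sketch: pass 1 streams the object out only to measure its
# serialized size; pass 2 fills a buffer pre-allocated to exactly that size.
obj <- data.frame(x = runif(1e5, 1, 10))

# Pass 1: stream to a scratch file purely to discover the byte count.
tmp <- tempfile()
con <- file(tmp, "wb")
serialize(obj, con)
close(con)
size <- file.size(tmp)

# Pass 2: with the size known up front, read into a buffer of exactly
# `size` bytes - no dynamically growing buffer, no final copy.
raw_buf <- readBin(tmp, what = "raw", n = size)
unlink(tmp)

length(raw_buf) == size  # TRUE
```

The cost is serializing (or scanning) the object twice, which is Simon's objection; the benefit is that the peak holds only the object plus one copy of the serialized bytes, rather than two.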