Hi, I am developing a tool for converting a large data frame stored in an uncompressed binary (XDR) RData file to a delimited text file. The data frame is too large to load() and extract rows from on a typical PC. I'm looking to parse through the file and extract individual entries without loading the whole thing into memory.
In terms of the C source functions: instead of calling RestoreToEnv(R_Unserialize(connection)), which is essentially what load() does, I'm looking for the documentation I would need to build a function "SaveToCSV()" so that I could call SaveToCSV(R_Unserialize(connection)).

Where can I get documentation on the RData file format? Does a spec document exist? See details below.

Thanks,
Ian

Ian Cook | Advanced Micro Devices, Inc. | [EMAIL PROTECTED]

-------------------------
Additional details:

I've browsed through the relevant source code (saveload.c, serialize.c) for ideas. Here's a demo of the problem I'm looking to solve:

# create a sample data frame
ds <- data.frame(row1=c(1,2,3),row2=c('a','b','c'))
# save into an uncompressed binary R dataset
save(ds,file="ds.rdata",compress=FALSE)
rm(ds)

# Then load() can be simulated like this:
# create and open a file connection
con <- file("ds.rdata",open="rb")
# read the first 5 characters
readChar(con,5)
# unserialize the remainder and restore to the environment
ds <- unserialize(con,NULL)[["ds"]]
close(con)

But this takes up too much memory if the data set is too big. I can read the file in character by character, i.e. using readChar(), but it's obvious that the file format is not trivial. readChar(con,10000) for this demo yields:

[EMAIL PROTECTED]@\b\0\0\0\0\0\0\0\0\003\r\0\0\0\003\0\0\0\001\0\0\0\002\0\0\0\003\0\0\004\002\0\0\0\001\0\0\020\t\0\0\0\006levels\0\0\0\020\0\0\0\003\0\0\0\t\0\0\0\001a\0\0\0\t\0\0\0\001b\0\0\0\t\0\0\0\001c\0\0\004\002\0\0\0\001\0\0\020\t\0\0\0\005class\0\0\0\020\0\0\0\001\0\0\0\t\0\0\0\006factor\0\0\0þ\0\0\004\002\0\0\0\001\0\0\020\t\0\0\0\005names\0\0\0\020\0\0\0\002\0\0\0\t\0\0\0\004row1\0\0\0\t\0\0\0\004row2\0\0\004\002\0\0\0\001\0\0\020\t\0\0\0\trow.names\0\0\0\r\0\0\0\002€\0\0\0\0\0\0\003\0\0\004\002\0\0\003ÿ\0\0\0\020\0\0\0\001\0\0\0\t\0\0\0\ndata.frame\0\0\0þ\0\0\0þ

This would be parseable if I had a file spec.

Thanks.

Ian Cook | Advanced Micro Devices, Inc.
| [EMAIL PROTECTED]
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
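For illustration, here is a minimal sketch of walking the start of such a file piece by piece with readBin() instead of unserialize(). It is based on a reading of serialize.c, not on any spec, and assumes the version 2 XDR layout that the dump above appears to follow: a 5-byte magic string, a 2-byte format marker, three big-endian 4-byte integers, then items that each begin with a 4-byte flags word whose low 8 bits hold the type code. The helpers read_int() and read_flags() are hypothetical names, and version = 2 is passed to save() explicitly because recent R releases default to version 3.

```r
# Hypothetical helpers (illustrative names, not part of R).
# XDR integers are 4 bytes, big-endian.
read_int <- function(con) {
  readBin(con, "integer", n = 1L, size = 4L, endian = "big")
}

# Split a flags word into the fields used below (assumed layout from serialize.c).
read_flags <- function(con) {
  flags <- read_int(con)
  list(
    type      = bitwAnd(flags, 255L),         # low 8 bits: type code
    is_object = bitwAnd(flags, 256L) != 0L,   # bit 8
    has_attr  = bitwAnd(flags, 512L) != 0L,   # bit 9
    has_tag   = bitwAnd(flags, 1024L) != 0L   # bit 10
  )
}

# Recreate the demo file in a temporary location.
path <- tempfile(fileext = ".rdata")
ds <- data.frame(row1 = c(1, 2, 3), row2 = c("a", "b", "c"))
save(ds, file = path, compress = FALSE, version = 2)
rm(ds)

con <- file(path, open = "rb")

magic      <- rawToChar(readBin(con, "raw", n = 5L))  # "RDX2\n"
fmt        <- rawToChar(readBin(con, "raw", n = 2L))  # "X\n" marks XDR
version    <- read_int(con)   # serialization version
writer     <- read_int(con)   # R version that wrote the file
min_reader <- read_int(con)   # minimal R version able to read it

# load() restores a tagged pairlist: one node per saved object.
node <- read_flags(con)       # pairlist node (type 2) carrying a tag
sym  <- read_flags(con)       # the tag is a symbol (type 1) ...
chr  <- read_flags(con)       # ... whose name is a character item (type 9)
tag_name <- rawToChar(readBin(con, "raw", n = read_int(con)))

# The value: the data frame itself, a generic vector (type 19).
value <- read_flags(con)
ncols <- read_int(con)

# First column: a double vector (type 14) followed by its length and data.
col  <- read_flags(con)
len  <- read_int(con)
vals <- readBin(con, "double", n = len, size = 8L, endian = "big")

close(con)
unlink(path)

print(tag_name)
print(vals)
```

In a real converter the readBin() call for the column data could be issued in fixed-size chunks, so that only a slice of each column is in memory at a time. The sketch ignores attributes, reference items, other vector types, and the alternative length encoding used for very long vectors, all of which a full parser would have to handle.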