On Mon, 17 Mar 2014, Duncan Murdoch wrote:

On 14-03-17 6:22 PM, Mike Miller wrote:

Thanks!  Another thing I've figured out:  Use of "drop0trailing=TRUE" in
format() fixes the .00000 stuff that I didn't like:

write.table(format(data[1:10,], digits=5, trim=TRUE, drop0trailing=TRUE),
            row.names=FALSE, col.names=FALSE, quote=FALSE)
[snip]

I still have more to figure out, but for most smaller table-writing jobs, I think something like the last command above will be my usual approach. In real life, I would use a tab delimiter, though.
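
For reference, the tab-delimited variant of that command might look like the sketch below. The data frame `dat` and its columns are made up for illustration (the real data is not shown in this thread), and the output goes to a temporary file.

```r
# Hypothetical example data; the poster's actual data frame is not shown.
dat <- data.frame(id = c("a", "b"),
                  n  = c(1L, 20L),
                  x  = c(1.5, 2.123456),
                  stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")

# Same format() call as above, with a tab as the field separator.
write.table(format(dat, digits = 5, trim = TRUE, drop0trailing = TRUE),
            file = out, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read the file back as plain lines to inspect what was written.
lines_written <- readLines(out)
```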

I'm still unsure about the best way for dealing with very large data frames, though. There's probably a good way to stream data into a file so that it doesn't have to be written as an additional large object in memory. There must be a way to make a connection and then just pipe the formatted data into it. Maybe something related to sprintf() will work.
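
One possible way to do that is to open a file connection once and write the data frame to it in row chunks, so that only one formatted chunk exists in memory at a time. This is only a sketch: `write_chunked` and `chunk_rows` are hypothetical names, not functions or arguments from base R.

```r
# Hypothetical helper (not from the thread): format and write a data frame
# in row chunks through a single open file connection.
write_chunked <- function(df, path, chunk_rows = 10000L) {
  con <- file(path, open = "w")
  on.exit(close(con))
  n <- nrow(df)
  for (i in seq_len(ceiling(n / chunk_rows))) {
    first <- (i - 1L) * chunk_rows + 1L
    rows  <- first:min(first + chunk_rows - 1L, n)
    # Only this chunk is converted to character, not the whole data frame.
    chunk <- format(df[rows, , drop = FALSE], digits = 5, trim = TRUE,
                    drop0trailing = TRUE)
    # Writing to an already open connection continues where the
    # previous chunk ended.
    write.table(chunk, file = con, sep = "\t",
                row.names = FALSE, col.names = FALSE, quote = FALSE)
  }
  invisible(n)
}
```

Because format() chooses the number of digits per column within each chunk, the output can differ slightly from formatting the whole data frame in one call.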

You've never explained why you want to write these gigantic text files. Text is a lossy way to store numbers: it takes 15 bytes to store about 8 bytes of information, and you'll probably lose a few bits at the end. Why not write your files in binary, storing exactly what you have in memory? It'll be a lot faster to write and to read, you won't need to duplicate the data before writing, etc.
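
A small sketch of the binary route, using base R's writeBin() and readBin() on made-up values. It assumes the reader knows the layout (plain doubles in the machine's native byte order), and it contrasts the exact round trip with a text round trip at about 5 significant digits.

```r
x <- c(pi, 1/3, 1e-10, 123456.789)   # made-up values for illustration

bin_path <- tempfile(fileext = ".bin")

# Write the doubles exactly as stored in memory (8 bytes each by default).
con <- file(bin_path, open = "wb")
writeBin(x, con)
close(con)

# Read them back; the values are bit-for-bit identical to the originals.
con <- file(bin_path, open = "rb")
y <- readBin(con, what = "double", n = length(x))
close(con)

bin_bytes <- file.size(bin_path)

# For comparison: a round trip through text at 5 significant digits.
z <- as.numeric(formatC(x, digits = 5, format = "g"))
```

Here identical(x, y) is TRUE while identical(x, z) is not, since the text version keeps only the leading digits.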


Thanks for asking, Duncan. A typical problem is that I am running 12 processes at once on a 12-core machine with 32 GB of RAM, so each process has to be limited to about 2.5 GB total. Then I try to load as much data as I can within that limitation. The output data does not always need to be in text format, but it usually does because it has to be read by other programs.

I was hoping I could read a line from a data frame and format it like this:

sprintf(c(rep("%s",2), rep("%d",2), rep("%.4f",4)), data[1,1:8])

But sprintf reads vectors, so they have to be of a single type.
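
A possible workaround, sketched with a made-up data frame: sprintf() is vectorized over its arguments, so each column can be passed as a separate argument with do.call(), using one format string that covers the whole row. The column names and values in `dat` are hypothetical.

```r
# Hypothetical data with two character, two integer and four numeric columns.
dat <- data.frame(id  = c("a", "b"),
                  grp = c("x", "y"),
                  n1  = c(1L, 2L),
                  n2  = c(10L, 20L),
                  v1  = c(0.5, 1),
                  v2  = c(1.25, 2),
                  v3  = c(2, 3),
                  v4  = c(pi, exp(1)),
                  stringsAsFactors = FALSE)

# One tab-separated format string for a full row.
fmt <- paste(c(rep("%s", 2), rep("%d", 2), rep("%.4f", 4)), collapse = "\t")

# Pass every column as its own argument; unname() keeps a column name from
# being matched to sprintf's 'fmt' argument by accident.
row_lines <- do.call(sprintf, c(list(fmt = fmt), unname(as.list(dat[1:8]))))

# row_lines is a character vector with one formatted line per row; it could
# be written with writeLines(row_lines, con) to an open connection.
```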

Thanks for your help.

Mike

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
