I'm sure this has gotten some attention before, but I have two CSV files generated from vmstat and free that are roughly 6-8 MB (about 80,000 lines) each. When I try to use read.csv(), R allocates all available memory (about 4.9 GB) when loading the files, which is over 300 times the size of the raw data. Here are the scripts used to generate the CSV files as well as the R code:
Scripts (run for roughly a 24-hour period):

    vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$6,$7,$12,$13,$14,$15,$16,$17;}' >> ~/vmstat_20100118_133845.o

    free -ms 1 | awk '$0 ~ /Mem\:/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$2,$3,$4,$5,$6,$7}' >> ~/memfree_20100118_140845.o

R code:

    infile.vms <- "~/vmstat_20100118_133845.o"
    infile.mem <- "~/memfree_20100118_140845.o"

    vms.colnames <- c("time","r","b","swpd","free","inact","active","si","so","bi","bo","in","cs","us","sy","id","wa","st")
    vms.colclass <- c("character",rep("integer",length(vms.colnames)-1))
    mem.colnames <- c("time","total","used","free","shared","buffers","cached")
    mem.colclass <- c("character",rep("integer",length(mem.colnames)-1))

    vmsdf <- read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames)
    memdf <- read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames)

I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux version 2.6.27.41-170.2.117.fc10.x86_64) with 6 GB of memory. There are no other significant programs running, and `rm()` followed by `gc()` successfully frees the memory (followed by swap-ins as other programs try to use previously cached information that was swapped to disk).

I've incorporated the memory-saving suggestions in the `read.csv()` manual page, excluding the limit on the lines read (which shouldn't really be necessary here, since we're only talking about < 20 MB of raw data).

Any suggestions, or is the read.csv() code known to have memory leak or overcommit issues?

Thanks
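
One thing that may be worth ruling out first is a mismatch between the files and the column definitions: the vmstat awk command above prints 9 fields per line (the timestamp plus 8 values), while vms.colnames lists 18 names. A minimal shell sketch for checking what is actually in the files follows. The helper name field_counts is made up for illustration, and the sketch assumes the timestamp never contains a comma.

```shell
# Print "<number of fields> <number of lines>" for each distinct field count
# found in a comma-separated file. field_counts is a hypothetical helper name.
# Usage: field_counts FILE
field_counts() {
    awk -F, '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Paths are the ones used in the post; files that are missing are skipped.
for f in ~/vmstat_20100118_133845.o ~/memfree_20100118_140845.o; do
    [ -r "$f" ] || continue
    echo "$f: $(wc -l < "$f") lines"
    field_counts "$f"
done
```

If every line reports the same field count, that count can be compared directly with length(vms.colnames) and length(mem.colnames); more than one distinct count would point to malformed lines in the logs.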