On Aug 28, 2013, at 12:17 PM, Hadley Wickham wrote:

> Hi all,
>
> I've been trying to learn more about memory profiling in R and I've
> been trying memory profiling out on read.table. I'm getting a bit of a
> strange result, and I hope that someone might be able to explain why.
>
> After running
>
> Rprof("read-table.prof", memory.profiling = TRUE, line.profiling = TRUE,
>   gc.profiling = TRUE, interval = interval)
> diamonds <- read.table("diamonds.csv", sep = ",", header = TRUE)
> Rprof(NULL)
>
> and doing a lot of data manipulation, I end up with a table that
> displays the total memory (in megabytes) allocated and released (by
> gc) from each line of (a local copy of) read.table:
>
>           file line  alloc release
> 1 read-table.r  122 1.9797  1.1435
> 2 read-table.r  165 1.1148  0.6511
> 3 read-table.r  221 0.0763  0.0321
> 4 read-table.r  222 0.4922  1.5057
>
> Lines 122 and 165 are where I expect to see big allocations and
> releases - they're calling scan and convert.type respectively. Lines
> 221 and 222 are more of a mystery:
>
> class(data) <- "data.frame"
> attr(data, "row.names") <- row.names
>
> Why do those lines need any allocations? I thought class<- and attr<-
> were primitives, and hence would modify in place.
.. but only if there is no other reference to the data (i.e. NAMED < 2). If there are two references, they have to copy, because modifying in place would change the other copy. Here, however, the object already has NAMED = 2 because of

data <- data[keep]

If you remove that line and reverse the order of the class()<- and attr()<- assignments, you get 0 copies. (A small tracemem() sketch illustrating this is appended after the quoted text below.)

Cheers,
Simon

PS: if you are loading any sizable data, the one thing you don't want to do is to use read.table() ;)

> Re-running with gctorture(TRUE) yields roughly similar numbers,
> although there is no memory release because gc is called earlier, and
> the assignment of allocations to lines is probably more accurate given
> that gctorture runs the code about 20x slower:
>
>             file line    alloc  release
> 25 read-table.r  221 0.387299 0.00e+00
> 26 read-table.r  222 0.362964 0.00e+00
>
> The whole object, when loaded, is ~4 meg, so those allocations
> represent fairly sizeable chunks of the total.
>
> Any suggestions would be greatly appreciated. Thanks!
>
> Hadley
>
> --
> Chief Scientist, RStudio
> http://had.co.nz/
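A minimal sketch of the copy-on-modify rule described above (the variable names are illustrative, not read.table's internals; it needs an R build with memory profiling enabled, and the exact copies reported depend on the R version, NAMED counting in older releases versus reference counting from R 4.0.0 on):

# tracemem() prints a message each time R duplicates the traced object.
x <- runif(10)
tracemem(x)

# A single reference: the primitive replacement functions class<- and
# attr<- can modify x in place, so no duplication should be reported.
class(x) <- "myclass"
attr(x, "note") <- "no copy expected"

# A second reference, analogous to the shared object left behind by
# data <- data[keep] inside read.table.
y <- x

# x is now (possibly) shared, so class<- must duplicate before modifying,
# otherwise y would change as well; tracemem reports the copy.
class(x) <- "otherclass"

untracemem(x)

In read.table itself the extra reference comes from data <- data[keep], which is why class(data) <- "data.frame" and attr(data, "row.names") <- row.names show up in the memory profile at all.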