On Jan 17, 2012, at 4:50 PM, Thomas Lumley wrote:

> On Tue, Jan 17, 2012 at 9:11 PM, Matthew Dowle <mdo...@mdowle.plus.com> wrote:
>> Hi,
>> 
>> $ R --vanilla
>> R version 2.14.1 (2011-12-22)
>> Platform: i686-pc-linux-gnu (32-bit)
>>> DF = data.frame(a=1:3,b=4:6)
>>> DF
>>  a b
>> 1 1 4
>> 2 2 5
>> 3 3 6
>>> tracemem(DF)
>> [1] "<0x8898098>"
>>> names(DF)[2]="B"
>> tracemem[0x8898098 -> 0x8763e18]:
>> tracemem[0x8763e18 -> 0x8766be8]:
>> tracemem[0x8766be8 -> 0x8766b68]:
>>> DF
>>  a B
>> 1 1 4
>> 2 2 5
>> 3 3 6
>>> 
>> 
>> Are those 3 copies really taking place?
>> 
> 
> tracemem() isn't likely to give false positives.  Since you're on
> Linux, you could check by running under gdb and setting a breakpoint
> on memtrace_report, which is the function that prints the message.
> That would show where the duplicates are happening.
> 

My gut feeling is that the extra copy comes from the additional recursion in the 
subset assignment, which needs DF to be dragged around one level deeper (I'm too 
lazy to actually check, so I may be wrong). As expected, you get fewer copies if 
you set the names directly:

> DF = data.frame(a=1:3,b=4:6)
> tracemem(DF)
[1] "<0x100c82628>"
> n = names(DF)
> n[2]="B"
> names(DF) = n
tracemem[0x100c82628 -> 0x100c82778]: 
tracemem[0x100c82778 -> 0x100c712b0]: 

and as we discussed here earlier, using the assignment primitive directly makes 
just one copy:

> DF = data.frame(a=1:3,b=4:6)
> tracemem(DF)
[1] "<0x1029a3c68>"
> n = names(DF)
> n[2]="B"
> DF = `names<-`(DF, n)
tracemem[0x1029a3c68 -> 0x1029a3b18]: 
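
For anyone who wants to rerun the comparison, the three variants above can be put side by side in one script. This is only a sketch: make_df() is a helper name made up here, the capabilities("profmem") guard is there because tracemem() is only available when R was built with memory profiling, and the number of tracemem lines you see will depend on your R version, so no expected output is shown.

```r
# Sketch only: make_df() is a hypothetical helper, not part of the thread.
# tracemem() requires an R build with memory profiling enabled, so every
# call to it is guarded; copy counts vary between R versions.
make_df <- function() data.frame(a = 1:3, b = 4:6)
can_trace <- capabilities("profmem")

# 1. Subset assignment on names(): the form from the original question
DF1 <- make_df()
if (can_trace) tracemem(DF1)
names(DF1)[2] <- "B"

# 2. Modify a copy of the names, then assign the whole vector back
DF2 <- make_df()
if (can_trace) tracemem(DF2)
n <- names(DF2)
n[2] <- "B"
names(DF2) <- n

# 3. Call the assignment primitive directly
DF3 <- make_df()
if (can_trace) tracemem(DF3)
n <- names(DF3)
n[2] <- "B"
DF3 <- `names<-`(DF3, n)

# All three routes should end with the same data frame;
# only the amount of copying along the way differs.
stopifnot(identical(DF1, DF2), identical(DF2, DF3))
```

Each variant starts from a fresh data frame so that the tracemem lines printed for one do not get mixed up with the others.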

Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
