On Dec 12, 2010, at 07:49, Niels Richard Hansen wrote:

> Peter, thanks for looking into this. I know very little about the R
> implementation at the level you are talking about. For the record
> it is pretty easy to avoid the warning by checking and _not_
> doing the inefficient subsetting of an empty data frame ...
> 
> - Niels

It's not something that you'd be expected to know about, but this is r-devel 
and sometimes we think aloud, hoping that it rings a bell with some other 
reader...

I don't think it is the empty data frame per se that tickles the bug. Rather, 
it seems to be a matter of having created so many character constants that 
there are nontrivial hash chains, plus maybe the fact that you are using 
identical() to compare language-level objects. The attributes in question 
were 

(gdb) p Rf_PrintValue(ax)
<CHARSXP: "NA.43436">

(gdb) p Rf_PrintValue(ay)
<CHARSXP: "NA.64694">

and the (sub-) objects that were being compared at the time were

(gdb) p Rf_PrintValue(y)
<CHARSXP: "...">

(gdb) p Rf_PrintValue(x)
<CHARSXP: "a">

"a" and "..." are from argument lists of the functions that you are comparing, 
and I would assume that the "NA.43436" and "NA.64694" come from the rownames of 
the million-row data frame that you were creating.


(For the uninitiated: a hash table is used for by-name lookup. It works by 
computing a numerical "hash-index" based on the name, hoping to replace a 
linear search by a simple indexed lookup. If two or more names have the same 
hash index, a final linear search through a chained list of names is 
necessary.) 
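
As a rough illustration of the parenthetical above, here is a minimal sketch of 
a chained hash table in Python. It is not R's actual implementation: the class 
name ChainedTable, the toy hash function (sum of character codes modulo the 
table size), and the bucket count are all assumptions made for the example. It 
only shows how two names with the same hash index end up in one chain that has 
to be searched linearly.

```python
class ChainedTable:
    """Toy hash table with separate chaining (illustrative only, not R's code)."""

    def __init__(self, n_buckets=8):
        # One list ("chain") per hash index.
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, name):
        # Hypothetical toy hash: sum of character codes modulo table size.
        return sum(ord(c) for c in name) % len(self.buckets)

    def insert(self, name, value):
        chain = self.buckets[self._index(name)]
        for i, (key, _) in enumerate(chain):
            if key == name:
                chain[i] = (name, value)  # name already present: overwrite
                return
        chain.append((name, value))  # new name: extend the chain

    def lookup(self, name):
        # Indexed jump to the bucket, then a linear search along the chain.
        for key, value in self.buckets[self._index(name)]:
            if key == name:
                return value
        raise KeyError(name)

    def chain_length(self, name):
        # Number of entries sharing this name's hash index.
        return len(self.buckets[self._index(name)])


table = ChainedTable(n_buckets=4)
table.insert("ab", 1)
table.insert("ba", 2)  # same character sum as "ab", so same hash index
```

With this toy hash, "ab" and "ba" collide, so table.chain_length("ab") is 2 and 
a lookup of either name has to walk the shared chain. With very many distinct 
strings and a fixed number of buckets, such chains inevitably become longer.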

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
