On Wed, 13 Jan 2010, Benjamin Tyner wrote:

> The MKsetup() in unique.c throws an error if the vector to be hashed is longer than (2^32)/8:
>
>   if(n < 0 || n > 536870912) /* protect against overflow to -ve */
>       error(_("length %d is too large for hashing"), n);
>
> I occasionally work with vectors longer than this on 64-bit builds. Would it be too much to ask that R can take advantage of all 64 bits for hashing when compiled as such?

'All 64 bits' of what? All systems we use have 64-bit integer types, but there are good reasons not to use them where not needed, and 'int' is not 64-bit on any R platform. I don't see the connection to 64-bit pointers, which is what is most often meant by a '64-bit build'.

Efficiency would be a major consideration with such long vectors. What type(s) are you contemplating, and are they full of duplicates? If the latter, we could simply allow K=29. Otherwise a new approach would likely be needed.
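
To make the overflow concern concrete, here is a minimal sketch, assuming a 32-bit 'int' and assuming the table size is chosen as the smallest power of two M = 2^K with M >= 2n. That sizing rule and the name table_bits are assumptions for illustration, not a copy of unique.c.

```c
/* Hypothetical helper, not taken from unique.c.
 * Returns K such that M = 2^K is the smallest power of two
 * (starting from M = 2) with M >= 2*n, or -1 if n falls outside
 * the range accepted by the quoted check. Assumes a 32-bit int. */
int table_bits(int n)
{
    /* 536870912 == 2^29. For larger n, 2*n exceeds 2^30, and the
     * next power of two, 2^31, does not fit in a signed 32-bit int. */
    if (n < 0 || n > 536870912)
        return -1;

    int n2 = 2 * n;   /* at most 2^30, so no overflow here */
    int M = 2;        /* candidate table size */
    int K = 1;        /* log2 of the table size */

    while (M < n2) {  /* M never exceeds 2^30 inside this loop */
        M *= 2;
        K += 1;
    }
    return K;
}
```

Under these assumptions table_bits(5) gives 4 (a table of 16 slots), and any n above 536870912 is rejected before the doubling loop could overflow.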

I think the way forward is for you to do some experiments and submit proposed code changes with supporting evidence. (It seems only you are interested.)

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel