Hi, I am trying to read in a rather large list of transactions using the arules package. It seems that the coercion to a dgCMatrix calls unique() somewhere along the way. The check in unique.c throws an error when n > 536870912 (2^29). However, when 4*n was changed to 2*n in 2004, shouldn't the overflow protection have changed from 2^29 to 2^30? If so, how would I change it in my copy? Do I have to recompile everything?
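The limit can be worked out from the sizing rule in the MKsetup comment quoted further down. Write c for the multiplier (4 before the Dec 2004 change, 2 after). Assume, as that comment implies, that the table size M must fit in a signed 32-bit int:

```latex
\begin{aligned}
M &= 2^{K} = \min\{\,2^{k} : k \ge 1,\ 2^{k} \ge c\,n\,\}, \qquad c = 4 \text{ (before Dec 2004)},\ c = 2 \text{ (after)} \\
M_{\max} &= 2^{30} \quad \text{(largest power of 2 not exceeding } 2^{31}-1\text{)} \\
M \le 2^{30} &\iff c\,n \le 2^{30} \iff n \le 2^{30}/c \\
c = 4 &: \; n \le 2^{28} = 268435456 \\
c = 2 &: \; n \le 2^{29} = 536870912
\end{aligned}
```

Under that assumption, the 2*n rule by itself gives n <= 2^29. The older 4*n rule would have allowed only n <= 2^28.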
Thanks,
Patrick McCann

Here is a simple, reproducible example:

> runif(2^29+5)->a
> sum(unique(a))->b
Error in unique.default(a) :
  length 536870917 is too large for hashing
> traceback()
3: unique.default(a)
2: unique(a)
1: unique(a)
> unique.default
function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    z <- .Internal(unique(x, incomparables, fromLast))
    if (is.factor(x))
        factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
            ordered = is.ordered(x))
    else if (inherits(x, "POSIXct"))
        structure(z, class = class(x), tzone = attr(x, "tzone"))
    else if (inherits(x, "Date"))
        structure(z, class = class(x))
    else z
}
<environment: namespace:base>

From http://svn.r-project.org/R/trunk/src/main/unique.c I see:

/*
  Choose M to be the smallest power of 2
  not less than 2*n and set K = log2(M).
  Need K >= 1 and hence M >= 2, and 2^M <= 2^31 -1, hence n <= 2^29.

  Dec 2004: modified from 4*n to 2*n, since in the worst case we have
  a 50% full table, and that is still rather efficient -- see
  R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
*/
static void MKsetup(int n, HashData *d)
{
    int n4 = 2 * n;

    if(n < 0 || n > 536870912) /* protect against overflow to -ve */
        error(_("length %d is too large for hashing"), n);
    d->M = 2;
    d->K = 1;
    while (d->M < n4) {
        d->M *= 2;
        d->K += 1;
    }
}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
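To check the MKsetup arithmetic numerically, here is a small C sketch. It replays the sizing loop (start at M = 2, K = 1 and double while M < mult * n) in 64-bit integers, so the loop itself cannot overflow. It then finds the largest n whose M still fits in a signed 32-bit int. The helper names (mk_table_size, mk_fits_int32, mk_max_n, mk_report) are hypothetical. The code also assumes that d->M is a 32-bit int, since the HashData definition is not shown in the quoted source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper mirroring the MKsetup loop: M is the smallest
 * power of 2 not less than mult * n, starting from M = 2, K = 1.
 * Uses int64_t so that the doubling never overflows here. */
static int64_t mk_table_size(int64_t n, int64_t mult, int *k_out)
{
    assert(n >= 0 && mult > 0);
    int64_t target = mult * n;
    int64_t m = 2;
    int k = 1;
    while (m < target) {
        m *= 2;
        k += 1;
    }
    if (k_out != NULL)
        *k_out = k;
    return m;
}

/* 1 if the table size for n fits in a signed 32-bit int
 * (the assumed type of d->M), else 0. */
static int mk_fits_int32(int64_t n, int64_t mult)
{
    return mk_table_size(n, mult, NULL) <= INT32_MAX;
}

/* Largest n (n is an int in MKsetup) whose table size still fits,
 * found by binary search; fitting is monotone in n. */
static int64_t mk_max_n(int64_t mult)
{
    int64_t lo = 0, hi = INT32_MAX;
    while (lo < hi) {
        int64_t mid = lo + (hi - lo + 1) / 2;
        if (mk_fits_int32(mid, mult))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Print the limit for the old (4*n) and current (2*n) multipliers. */
static void mk_report(void)
{
    int64_t mults[] = {4, 2};
    for (size_t i = 0; i < sizeof mults / sizeof mults[0]; i++)
        printf("mult = %lld: max n = %lld\n",
               (long long)mults[i], (long long)mk_max_n(mults[i]));
}
```

Calling mk_report() from a main function prints the limit for each multiplier. For the length in the example above (n = 536870917 with the 2*n rule), mk_table_size gives M = 2^31 and K = 31, which is one doubling past what a signed 32-bit int can hold.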