Re: [R] duplicated() with long vectors

Prof Brian Ripley Wed, 05 Dec 2012 14:24:51 -0800

On 05/12/2012 21:08, Sarah Goslee wrote:

Hi,


duplicated() doesn't just look at consecutive values, but anywhere in
the object. Your 12320-element vector has only 48 distinct values,
and all of them occur before the last 30 elements, so duplicated()
returns TRUE for every one of those last 30.
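
A small sketch of that behaviour, using a made-up toy vector `x` (not the
poster's data; all names here are hypothetical):

```r
# Toy vector, invented for illustration only.
x <- c(1, 1, 2, 2, 1, 1)

# duplicated() flags every element that has appeared anywhere earlier,
# not just elements equal to their immediate predecessor.
dup_all <- duplicated(x)

# Whole vector first, then keep the last two flags:
# both trailing values were already seen, so both are TRUE.
tail_of_dup <- tail(dup_all, 2)

# Last two elements first, then duplicated():
# within that subset the first element is new, so it is FALSE.
dup_of_tail <- duplicated(tail(x, 2))

print(dup_all)
print(tail_of_dup)
print(dup_of_tail)
```

This is the same difference as between the two data.frame() calls in the
original question: the order of tail() and duplicated() changes which
earlier elements are visible to duplicated().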

You might be looking for something involving rle(). What are you
trying to accomplish?
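
If the goal is to flag elements that repeat the element immediately before
them, one possible approach with rle() might look like the following. This
is only a sketch on a toy vector; it assumes "first element of each run is
FALSE, the rest TRUE" is what is wanted, which the thread does not confirm.

```r
# Toy vector, invented for illustration only.
x <- c(1, 1, 2, 2, 1, 1)

# rle() collapses the vector into runs of equal consecutive values.
runs <- rle(x)

# sequence() numbers the positions within each run (1, 2, ..., length),
# so any position greater than 1 repeats the preceding element.
same_as_previous <- sequence(runs$lengths) > 1

print(runs$values)
print(runs$lengths)
print(same_as_previous)
```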

And BTW, 'long vector' is a technical term in R: not 12,000, but more
than 2 billion elements. You will hear it a lot more in the run-up to
the next 'minor' release of R (currently R-devel, maybe 2.16.0-to-be,
which is the only version from which that quote comes that I am aware
of).
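
To put a number on that threshold: a rough check, assuming the limit in
question is the largest value an R integer can hold (2^31 - 1). The
variable names are hypothetical.

```r
# Largest length an ordinary (non-long) vector can have: 2^31 - 1.
limit <- .Machine$integer.max

# Length of the vector discussed in this thread.
n <- 12320

# A vector only counts as 'long' if its length exceeds that limit.
is_long <- n > limit

print(limit)
print(is_long)
```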

The posting guide asked for 'at a minimum' information: if you are using
an unreleased development version of R you really must tell us (and
should not be reporting to the R-help list).


Sarah

On Wed, Dec 5, 2012 at 3:53 PM, Stephen Politzer-Ahles
<politzerahl...@gmail.com> wrote:

Hello,

duplicated() does not seem to work for a long vector. For example, if
you download the data from
https://docs.google.com/open?id=0B6-m45Jvl3ZmNmpaSlJWMXo5bmc (a vector
with about 12,000 numbers) and then run the following code which does
duplicated() over the whole vector but just shows the last 30
elements:

data.frame( tail(verylong, 30), tail(duplicated(verylong), 30) )

you'll see that at the end of the very long vector everything is
listed as a duplicate of the preceding element (even though it
shouldn't be). On the other hand, if you run the following code which
just takes out the last 30 elements of the vector and does duplicated
on them:

data.frame( tail(verylong, 30), duplicated(tail(verylong, 30)) )

you get the correct results (FALSE shows up wherever the value in the
first column changes). Does anyone know why this happens, and if
there's a fix? I notice the documentation for duplicated() says: "Long
vectors are supported for the default method of duplicated, but may
only be usable if nmax is supplied."  But I've tried running this with
a high value of nmax given, and it still gives me the same problem.

So far the only way I've figured out to get this duplicated()-like
vector is to use a for loop going through one item at a time, but that
takes about a minute to run.
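
For comparison, a vectorized sketch of "is this element equal to the one
before it?" that avoids the explicit loop. It assumes a non-empty vector
without NA values (== returns NA for those), and the toy data and names
are made up for illustration.

```r
# Toy vector, invented for illustration only.
x <- c(3, 3, 3, 7, 7, 3)

# Compare each element (from the second on) with its predecessor.
# The first element has no predecessor, so it is FALSE by definition.
same_as_previous <- c(FALSE, x[-1] == x[-length(x)])

print(data.frame(x, same_as_previous))
```

Because the comparison is a single vector operation, it should scale to
12,000 elements without the per-element overhead of a for loop.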

Best,
Steve Politzer-Ahles



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
