No anomaly; you just need to know what it is for before trying to
use it.
Basically, duplicated() works by looking up entries in a hash table (for which
there is a substantial literature, just google it). This will be somewhat more
efficient if you know the number of unique values
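
To make the point about nmax concrete, here is a minimal sketch, assuming the bound you supply is at least the true number of unique values. The vector x and the bound of 3 are made-up illustration values, not taken from the thread. As far as I understand it, the bound only affects how the hash table is sized, so the results should match the defaults.

```r
# Sketch only: x and the bound of 3 are hypothetical illustration values.
set.seed(1)
x <- sample(c("a", "b", "c"), 1e5, replace = TRUE)

# nmax is an upper bound on the number of unique values in x.
# Here the true count is 3, so nmax = 3 is a valid bound.
d_default <- duplicated(x)
d_bounded <- duplicated(x, nmax = 3)

# factor() passes nmax on to unique(), as noted elsewhere in this thread.
f_default <- factor(x)
f_bounded <- factor(x, nmax = 3)

# With a valid bound, the answers should be the same as without it.
stopifnot(identical(d_default, d_bounded))
stopifnot(identical(f_default, f_bounded))
```

If the bound is smaller than the true number of unique values, all bets are off, so it is safer to leave nmax at its default unless you really do know the bound.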
I'll go just a bit "fer-er." It appears the anomaly -- I hesitate to
call it a bug -- is in the C code for duplicated.default():
> duplicated(letters[1:10],nmax=10)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> duplicated(letters[1:10],nmax=9)
[1] FALSE FALSE FALSE FALSE FAL
Well, you won't like this, but it is kind of wimpily (is that a word?)
documented:
If you check the code of factor(), you will see that nmax appears as
an argument in a call to unique(). ?unique says for nmax, "... see
duplicated". And ?duplicated says:
"If nmax is set too small there is liable
I have been trying to understand how the argument 'nmax' works in
'factor' function. R-Documentation states - "Since factors typically
have quite a small number of levels, for large vectors x it is helpful
to supply nmax as an upper bound on the number of unique values."
In the code below what is