Re: [R] bug? in stats::cor for use=complete.obs with NAs

Peter Ehlers Thu, 10 Jun 2010 16:03:49 -0700

I don't think that this would be considered a bug. The
reason for the discrepancy between use="complete.obs"
and use="pairwise.complete.obs" for the case of the
Spearman correlation of two vectors x, y is this:


"pairwise" does complete.cases(x,y) and then ranks;
this is also what's done in cor.test().

"complete" ranks first (keeping NAs via the
na.last="keep" argument to rank()) and then does
complete.cases(ranked.x,ranked.y) on the ranked data.
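This can lead to a different set of ranks being
correlated than for "pairwise": a value whose partner
is NA still takes part in ranking its own vector (and
so can shift the ranks of tied or larger values)
before that case is deleted.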
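The two orders of operations can be written out with base R's rank(), complete.cases() and cor() alone. This is a minimal sketch; spearman_pairwise and spearman_rank_first are hypothetical helper names for illustration, not functions in stats. Because they only apply the default Pearson cor() to ranks that are already computed, their results should not depend on how a given R version implements method = "spearman".

```r
# Hypothetical helper: drop incomplete cases first, then rank
# (the "pairwise" order, also used by cor.test()).
spearman_pairwise <- function(x, y) {
  ok <- complete.cases(x, y)
  cor(rank(x[ok]), rank(y[ok]))
}

# Hypothetical helper: rank each vector first, keeping NAs in place,
# then drop incomplete cases from the ranked data (the "complete" order).
spearman_rank_first <- function(x, y) {
  rx <- rank(x, na.last = "keep")
  ry <- rank(y, na.last = "keep")
  ok <- complete.cases(rx, ry)
  cor(rx[ok], ry[ok])
}

a <- c(1, 3, NA, 1, 2)
b <- c(1, 2, 1, 1, 4)
spearman_pairwise(a, b)    # 0.7777778
spearman_rank_first(a, b)  # 0.8164966
```

These reproduce the two values reported further down in this thread.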

I must admit that I wasn't aware that R does this
and I don't know the rationale for it. The help page
says:

   If use is "complete.obs" then missing values are
   handled by casewise deletion ...

which is not clear on the order of ranking and
deletion, but further down the page:

   Note that "spearman" basically computes cor(R(x), R(y))
   (or cov(.,.)) where R(u) := rank(u, na.last="keep").
   In the case of missing values, the ranks are calculated
   depending on the value of use, either based on complete
   observations, or based on pairwise completeness with
   reranking for each pair.

I guess that this implies that, for "complete", the ranking
occurs before the casewise deletion (else why the
na.last="keep"?).
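To see concretely why the order matters, the help page's R(u) can be applied to the vectors from the example further down. The ranks of a come out the same either way, because its own NA is simply kept aside by na.last = "keep". For b, however, ranking first lets the 1 in position 3 (whose partner in a is NA) join the tie among the 1s, so the remaining 1s get rank 2 instead of 1.5 and the larger values move up by one.

```r
# R(u) as defined in the help-page note
R <- function(u) rank(u, na.last = "keep")

a <- c(1, 3, NA, 1, 2)
b <- c(1, 2, 1, 1, 4)
ok <- complete.cases(a, b)   # TRUE TRUE FALSE TRUE TRUE

# Ranks of a: identical in both orders
R(a)[ok]     # 1.5 4.0 1.5 3.0
R(a[ok])     # 1.5 4.0 1.5 3.0

# Ranks of b: rank first, then drop the incomplete case ("complete")
R(b)[ok]     # 2 4 2 5
# Ranks of b: drop the incomplete case first, then rank ("pairwise")
R(b[ok])     # 1.5 3.0 1.5 4.0
```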

If anyone knows the rationale and/or can give a reference,
I'd be glad to receive such.

  -Peter Ehlers


On 2010-06-09 11:36, hugh.ge...@thomsonreuters.com wrote:

Arrrrr,

I think I've found a bug in the behavior of the stats::cor function when
NAs are present, but in case I'm missing something, could you look over
this example and let me know what you think:

a = c(1,3,NA,1,2)
b = c(1,2,1,1,4)
cor(a,b,method="spearman", use="complete.obs")

[1] 0.8164966

cor(a,b,method="spearman", use="pairwise.complete.obs")

[1] 0.7777778

My understanding is that, when the inputs are vectors (but not
necessarily when they're matrices), the "complete.obs" and
"pairwise.complete.obs" settings of use should give identical Spearman
correlations.  The above example clearly shows they do not in my version
of R (2.11.1).  However, in cor.test, they do:

cor.test(a,b,method="spearman", use="complete.obs")


         Spearman's rank correlation rho

data:  a and b
S = 2.2222, p-value = 0.2222
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho
0.7777778
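The S statistic in this output can be checked against rho. This sketch assumes that cor.test() defines S for Spearman's test as S = (n^3 - n)(1 - rho)/6; with n = 4 complete pairs and rho = 7/9 this gives 20/9, which matches the printed S = 2.2222.

```r
n   <- 4      # complete pairs left after dropping the case with the NA
rho <- 7/9    # 0.7777778, the estimate reported by cor.test()
S   <- (n^3 - n) * (1 - rho) / 6   # assumed definition of S
S             # 2.222222, i.e. 20/9
```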


So cor and cor.test do not agree, which seems very likely to be a bug.
When calculating by hand, I also get 0.7777778.  Additionally, when
using an old version of R (2.5.0), both the complete.obs and
pairwise.complete.obs versions give 0.7777778.  This strongly suggests
that either 2.5.0 or 2.11.1 has a bug.  Is this a bug?  If so, has it
already been reported?  (I found a related but confusing email thread
from 2004 in the R archives, but I did not find the resolution to that
bug report).
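If the intended estimate is the pairwise one (the 0.7777778 from cor.test() and the hand calculation), one version-independent workaround is to drop incomplete cases explicitly before calling cor(), so that no NAs reach it and the use setting no longer matters. This is a minimal sketch; exact = FALSE is passed to cor.test() only to avoid the warning about ties in the exact p-value and does not change the estimate.

```r
a <- c(1, 3, NA, 1, 2)
b <- c(1, 2, 1, 1, 4)

# Remove every case with an NA in either vector before any ranking
ok <- complete.cases(a, b)
rho_cor  <- cor(a[ok], b[ok], method = "spearman")
rho_test <- unname(cor.test(a[ok], b[ok], method = "spearman",
                            exact = FALSE)$estimate)

rho_cor    # 0.7777778
rho_test   # 0.7777778
```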


Additional info:
Platform = Windows XP

sessionInfo()

R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


loaded via a namespace (and not attached):
[1] tools_2.11.1

Sys.getlocale()

[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Thanks,

--Hugh


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
