>>>>> Martin Maechler <maech...@stat.math.ethz.ch>
>>>>>     on Tue, 10 May 2016 16:08:39 +0200 writes:

> This is an RFC / announcement related to the 2nd part of PR#16885
>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
> about complex NA's.
>
> The (somewhat rare) incompatibility in R 3.3.0's match() behavior for the
> case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
> patched in the meantime} triggered some more comprehensive "research".
>
> I found that we have had a long-standing inconsistency, at least between the
> documented and the real behavior. I am claiming that the documented
> behavior is desirable and hence R's current "real" behavior is buggy, and
> I am proposing to change it, in R-devel (to be 3.4.0) for now.

After the "roaring unanimous" assent (one private msg
encouraging me to go forward, no dissenting voice, hence an
"odds ratio" of +Inf in favor ;-)
I have now committed my proposal to R-devel (svn rev. 70597), and
some of us will be seeing the effect in package space within a
day or so, in the CRAN checks against R-devel (not for
Bioconductor AFAIK; their checks use R-devel only when it is less
than ca. 6 months from release).

It's still worthwhile to discuss the issue if you come late
to it, notably as ---paraphrasing Dirk on the R-package-devel list---
the release of 3.4.0 is almost a year away, and so now is the
best time to tinker with the API; in other words, to consider
breaking rarely used legacy APIs.

Martin

> In help(match) we have been saying
>
>  | Exactly what matches what is to some extent a matter of definition.
>  | For all types, \code{NA} matches \code{NA} and no other value.
>  | For real and complex values, \code{NaN} values are regarded
>  | as matching any other \code{NaN} value, but not matching \code{NA}.
>
> for at least 10 years. But we don't do that at all in the
> complex case (and AFAIK never got a bug report about it).
>
> Also, e.g., print(.) or format(.) do simply use "NA" for all
> the different complex NA-containing numbers, where OTOH,
> non-NA NaN's { <=> !is.nan(z) & is.na(z) }
> in format() or print() do show the NaN in real and/or imaginary
> parts; for an example, look at the "format" column of the matrix
> below, after 'print(cbind' ...
>
> The current match()---and duplicated(), unique() which are based on the same
> C code---*do* distinguish almost all complex NA / NaN's, which is
> NOT according to documentation. I have found that this is just because
> our hashing function for the complex case, chash() in R/src/main/unique.c,
> is bogus in the sense that it is not compatible with the above documentation
> and also not with the cequal() function (in the same file unique.c) for checking
> equality of complex numbers.
>
> As I have found, a *simplified* version of the chash() function
> to make it compatible with cequal() does solve all the problems I've
> indicated, and the current plan is to commit that change --- after some
> discussion time, here on R-devel --- to the code base.
>
> My change passes 'make check-all' fine, but I'm 100% sure that there will
> be effects in package-space. ... one reason for this posting.
>
> As mentioned above, note that the chash() function has been in
> use for all three functions
>     match()
>     duplicated()
>     unique()
> and the change will affect all three --- but just for the case of complex
> vectors with NA or NaN's.
>
> To show more, a small R session -- using my version of R-devel
> == the proposition:
> The R script ('complex-NA-short.R') for (a bit more than) the
> session is attached {{you can attach text/plain easily}}:
>
>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
>> ##           --- = NA_real_ but that does not exist e.g., in R 2.3.1
>> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
>> (z <- z[is.na(z)])
>  [1]       NA  NaN+ 0i       NA  NaN+ 1i       NA       NA       NA       NA
>  [9]   0+NaNi   1+NaNi       NA NaN+NaNi
>> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
> +     r <- matrix( , length(x), length(y))
> +     for(i in seq(along=x))
> +         for(j in seq(along=y))
> +             r[i,j] <- identical(x[i], y[j], ...)
> +     r
> + }
>> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
>> ## a version that works in older versions of R, where identical() had fewer arguments!
>> outerID.picky <- function(x,y) {
> +     nF <- length(formals(identical)) - 2
> +     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
> + }
>> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
>> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
>  [1,] | . . . . . . . . . . .
>  [2,] . | . . . . . . . . . .
>  [3,] . . | . . . . . . . . .
>  [4,] . . . | . . . . . . . .
>  [5,] . . . . | . . . . . . .
>  [6,] . . . . . | . . . . . .
>  [7,] . . . . . . | . . . . .
>  [8,] . . . . . . . | . . . .
>  [9,] . . . . . . . . | . . .
> [10,] . . . . . . . . . | . .
> [11,] . . . . . . . . . . | .
> [12,] . . . . . . . . . . . |
>> try(# for older R versions
> +     stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
> + )
>> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
>  [1] 1 2 1 2 1 1 1 1 2 2 1 2
>> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
>> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
>       format   Re   Im   mz
>  [1,] NA       <NA> 0    1
>  [2,] NaN+ 0i  NaN  0    2
>  [3,] NA       <NA> 1    1
>  [4,] NaN+ 1i  NaN  1    2
>  [5,] NA       0    <NA> 1
>  [6,] NA       1    <NA> 1
>  [7,] NA       <NA> <NA> 1
>  [8,] NA       NaN  <NA> 1
>  [9,] 0+NaNi   0    NaN  2
> [10,] 1+NaNi   1    NaN  2
> [11,] NA       <NA> NaN  1
> [12,] NaN+NaNi NaN  NaN  2
>>
> -------------------------------
>
> Note that 'mz <- match(z, z)' and hence the last column of the matrix above
> are very different in current R,
> distinguishing most kinds of NA / NaN against the documentation (and the
> real/numeric case).
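
For comparison, the real/numeric case referred to above can be checked directly. What follows is only a minimal sketch using base R; the vector 'x' is made up for illustration and is not part of the original session.

```r
## Real (double) vector containing both kinds of missing value.
## 'x' is an illustrative example, not taken from the session above.
x <- c(NA, NaN, 1, NA, NaN)

(m <- match(x, x))      # NA matches only NA; NaN matches only NaN
(d <- duplicated(x))    # the second NA and the second NaN are duplicates
(u <- unique(x))        # one NA, one NaN and the value 1 remain

stopifnot(identical(m, c(1L, 2L, 3L, 1L, 2L)),
          identical(d, c(FALSE, FALSE, FALSE, TRUE, TRUE)),
          length(u) == 3L,
          identical(is.nan(u), c(FALSE, TRUE, FALSE)))
```

This is the behavior that help(match) documents for real and complex values alike, and it is the pattern the proposed change brings to the complex case.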
>
> Martin Maechler
> R Core Team
>
> ### Basically a shortened version of the PR#16885 -- complex part b)
> ### of R/tests/reg-tests-1c.R
>
> ## b) complex 'x' with different kinds of NaN
> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> ##           --- = NA_real_ but that does not exist e.g., in R 2.3.1
> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
> (z <- z[is.na(z)])
> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
>     r <- matrix( , length(x), length(y))
>     for(i in seq(along=x))
>         for(j in seq(along=y))
>             r[i,j] <- identical(x[i], y[j], ...)
>     r
> }
> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
> ## a version that works in older versions of R, where identical() had fewer arguments!
> outerID.picky <- function(x,y) {
>     nF <- length(formals(identical)) - 2
>     do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
> }
> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
> try(# for older R versions
>     stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
> )
> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
>
> ## compute match(z[i], z) , for i = 1,2,..,12 :
> (m1z <- sapply(z, match, table = z))
> ## 1 2 1 2 2 2 1 2 2 2 1 2   # R 1.2.3  (2001-04-26)
> ## 1 2 3 4 1 3 7 8 2 4 8 7   # R 1.4.1  (2002-01-30)
> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.5.1  (2002-06-17)
> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 1.8.1  (2003-11-21)
> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.0.1  (2004-11-15)
> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.1.1  (2005-06-20)
> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.3.1  (2006-06-01)
> ## 1 2 3 4 1 3 7 8 2 4 8 12  # R 2.5.1  (2007-06-27)
> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 2.10.1 (2009-12-14)
> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.1.1  (2014-07-10)
> ## 1 2 3 4 1 3 7 4 2 4 4 12  # R 3.2.5 -- and 3.3.0 patched
> ## 1 2 1 2 1 1 1 1 2 2 1 2   # <<-- Martin's R-devel and proposed future R
>
> if(!exists("anyNA", mode="function")) anyNA <- function(x) any(is.na(x))
> stopifnot(apply(zRI, 2, anyNA)) # *all* are NA *or* NaN (or both)
> is.NA <- function(.) is.na(.) & !is.nan(.)
> (iNaN <- apply(zRI, 2, function(.) any(is.nan(.))))
> (iNA <- apply(zRI, 2, function(.) any(is.NA (.)))) # has non-NaN NA's
> ## In Martin's version of R-devel :
> stopifnot(identical(m1z == 1, iNA),
>           identical(m1z == 2, !iNA))
> ## m1z uses match(x, *) with length(x) == 1 and failed in R 3.3.0
> stopifnot(identical(m1z, mz))
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
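
The point about chash() having to be compatible with cequal() can be illustrated with a toy hashed lookup in plain R. This is only a sketch under stated assumptions: it is not the C code from unique.c, the helpers klass(), equalDoc(), hashPicky(), hashCompat() and toyMatch() are hypothetical names invented here, and each "complex number" is represented as a pair c(re, im) of doubles so that the example does not depend on how any R version treats complex NA's.

```r
## Toy illustration only: NOT the chash()/cequal() code from unique.c.
## The same 12 (Re, Im) patterns as in the session's table, as plain doubles.
re <- c(NA, NaN, NA, NaN, 0, 1, NA, NaN, 0, 1, NA, NaN)
im <- c(0, 0, 1, 1, NA, NA, NA, NA, NaN, NaN, NaN, NaN)
zparts <- Map(c, re, im)                      # list of pairs c(re, im)

is.NA <- function(v) is.na(v) & !is.nan(v)    # as in the attached script

## Class of a pair in the documented sense (hypothetical helper):
## "NA" if any part is NA, else "NaN" if any part is NaN, else the value itself.
klass <- function(p) {
    if (any(is.NA(p))) "NA" else if (any(is.nan(p))) "NaN" else paste(p, collapse = ",")
}

equalDoc   <- function(p, q) identical(klass(p), klass(q))  # documented equality
hashPicky  <- function(p) paste(p, collapse = ",")          # hashes both parts literally
hashCompat <- function(p) klass(p)                          # agrees with equalDoc()

## Hashed lookup: only elements that share a bucket are ever compared.
toyMatch <- function(x, table, hash, equal) {
    buckets <- split(seq_along(table), vapply(table, hash, ""))
    vapply(x, function(xi) {
        cand <- buckets[[hash(xi)]]           # indices in the same bucket
        hit  <- cand[vapply(cand, function(k) equal(xi, table[[k]]), NA)]
        if (length(hit)) hit[1L] else NA_integer_
    }, 1L)
}

(mPicky  <- toyMatch(zparts, zparts, hashPicky,  equalDoc))
(mCompat <- toyMatch(zparts, zparts, hashCompat, equalDoc))
```

With the picky hash every element lands in its own bucket, so equalDoc() is never asked about elements it would regard as equal, and all 12 come out distinct. With the compatible hash the result has the same 1/2 pattern as 'mz' in the session. The picky toy hash is not meant to reproduce the exact results listed for older R versions.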