On 9/25/23 07:05, Martin Maechler wrote: >>>>>> Hervé Pagès >>>>>> on Sat, 23 Sep 2023 16:52:21 -0700 writes: > > Hi Martin, > > On 9/23/23 06:43, Martin Maechler wrote: > >>>>>>> Hervé Pagès > >>>>>>> on Fri, 22 Sep 2023 16:55:05 -0700 writes: > >> > The problem is that you have things that are > >> > **semantically** different but look exactly the same: > >> > >> > They look the same: > >> > >> >> x > >> > [1] NA > >> >> y > >> > [1] NA > >> >> z > >> > [1] NA > >> > >> >> is.na(x) > >> > [1] TRUE > >> >> is.na(y) > >> > [1] TRUE > >> >> is.na(z) > >> > [1] TRUE > >> > >> >> str(x) > >> > cplx NA > >> >> str(y) > >> > num NA > >> >> str(z) > >> > cplx NA > >> > >> > but they are semantically different e.g. > >> > >> >> Re(x) > >> > [1] NA > >> >> Re(y) > >> > [1] -0.5 # surprise! > >> > >> >> Im(x) # surprise! > >> > [1] 2 > >> >> Im(z) > >> > [1] NA > >> > >> > so any expression involving Re() or Im() will produce > >> > different results on input that look the same on the > >> > surface. > >> > >> > You can address this either by normalizing the internal > >> > representation of complex NA to always be complex(r=NaN, > >> > i=NA_real_), like for NA_complex_, or by allowing the > >> > infinite variations that are currently allowed and at the > >> > same time making sure that both Re() and Im() always > >> > return NA_real_ on a complex NA. > >> > >> > My point is that the behavior of complex NA should be > >> > predictable. Right now it's not. Once it's predictable > >> > (with Re() and Im() both returning NA_real_ regardless of > >> > internal representation), then it no longer matters what > >> > kind of complex NA is returned by as.complex(NA_real_), > >> > because they are no onger distinguishable. > >> > >> > H. > >> > >> > On 9/22/23 13:43, Duncan Murdoch wrote: > >> >> Since the result of is.na(x) is the same on each of > >> >> those, I don't see a problem. As long as that is > >> >> consistent, I don't see a problem. You shouldn't be using > >> >> any other test for NA-ness. You should never be > >> >> expecting identical() to treat different types as the > >> >> same (e.g. identical(NA, NA_real_) is FALSE, as it > >> >> should be). If you are using a different test, that's > >> >> user error. > >> >> > >> >> Duncan Murdoch > >> >> > >> >> On 22/09/2023 2:41 p.m., Hervé Pagès wrote: > >> >>> We could also question the value of having an infinite > >> >>> number of NA representations in the complex space. For > >> >>> example all these complex values are displayed the same > >> >>> way (as NA), are considered NAs by is.na(), but are not > >> >>> identical or semantically equivalent (from an Re() or > >> >>> Im() point of view): > >> >>> > >> >>> NA_real_ + 0i > >> >>> > >> >>> complex(r=NA_real_, i=Inf) > >> >>> > >> >>> complex(r=2, i=NA_real_) > >> >>> > >> >>> complex(r=NaN, i=NA_real_) > >> >>> > >> >>> In other words, using a single representation for > >> >>> complex NA (i.e. complex(r=NA_real_, i=NA_real_)) would > >> >>> avoid a lot of unnecessary complications and surprises. > >> >>> > >> >>> Once you do that, whether as.complex(NA_real_) should > >> >>> return complex(r=NA_real_, i=0) or complex(r=NA_real_, > >> >>> i=NA_real_) becomes a moot point. > >> >>> > >> >>> Best, > >> >>> > >> >>> H. > >> > >> Thank you, Hervé. > >> Your proposition is yet another one, > >> to declare that all complex NA's should be treated as identical > >> (almost/fully?) everywhere. > >> > >> This would be a possibility, but I think a drastic one. > >> > >> I think there are too many cases, where I want to keep the > >> information of the real part independent of the values of the > >> imaginary part (e.g. think of the Riemann hypothesis), and > >> typically vice versa. > > Use NaN for that, not NA. > > Aa..h, *that* is your point. > > Well, I was on exactly this line till a few years ago. > > However, very *sadly* to me, note how example(complex) > nowadays ends : > > ##---------------------------------------------------------------------------- > showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z), Im(z))) > > ## The exact result of this *depends* on the platform, compiler, > math-library: > (NpNA <- NaN + NA_complex_) ; str(NpNA) # *behaves* as 'cplx NA' .. > stopifnot(is.na(NpNA), is.na(NA_complex_), is.na(Re(NA_complex_)), > is.na(Im(NA_complex_))) > showC(NpNA)# but does not always show '(R = NaN, I = NA)' > ## and this is not TRUE everywhere: > identical(NpNA, NA_complex_) > showC(NA_complex_) # always == (R = NA, I = NA) > ##---------------------------------------------------------------------------- > > Unfortunately --- notably by the appearance of the new (M1, M1 pro, M2, ...) > processors, but not only --- > I (and others, but the real experts) have wrongly assumed that > NA {which on the C-level is *one* of the many possible internal NaN's} > would be preserved in computations, as they are on the R level > -- well, typically, and as long as we've used intel-compatible > chips and gcc-compilers. > > But modern speed optimizations (also seen in accelerated > Blas/Lapack ..) have noticed that no official C standard > requires such preservations (i.e., in our case of NA, *the* special NaN), > and -- for speed reasons -- now on these accelerated platforms, > R-level NA's "suddenly" turn into R-level NaN's (all are NaN on > the C level but "with different payload") from quite "trivial" computations. > > Consequently, the strict distinction between NA and NaN > even when they are so important for us statisticians / careful data analysts, > nowadays will tend to have to be dismissed eventually.
I see. Thanks for pointing that out. This would actually be an argument in favor of preserving the current as.complex(NA_real_) behavior. If the difference between NA and NaN is fading away then there's no reason to change as.complex(NA_real_) to make it behave radically differently from as.complex(NaN). > > ... and as I have mentioned also mentioned earlier in this thread, > I believe we should also print the complex values of z > fulfilling is.na(z) by their Re & Im, i.e., e.g. > > NA+iNA (or NaN+iNA or NA+iNaN or NaN+iNaN > NA+0i, NaN+1i, 3+iNaN, 4+iNA etc oops, I should have paid more attention to your first post in this thread, sorry for that. And yes, I totally agree with improving the printing of complexes when Re and/or Im is NA Best, H. > > but note that the exact printing itself should *not* become the topic of this > thread unless by mentioning that I strongly believe the print()ing > of complex vectors in R should change anway *and* for that reason, > the printing / "looks the same as" / ... should not be strong > reasons in my view for deciding how *coercion*, > notably as.complex(.) should work. > > Martin > > > >> With your proposal, for a (potentially large) vector of complex > numbers, > >> after > >> Re(z) <- 1/2 > >> > >> I could no longer rely on Re(z) == 1/2, > >> because it would be wrong for those z where (the imaginary part/ the > number) > >> was NA/NaN. > > > My proposal is to do this only if the Re and/or Im parts are NAs, not > if > > they are NaNs. > > > BTW the difference between how NAs and NaNs are treated in complex > > vectors is another issue that adds to the confusion: > > > > complex(r=NA, i=2) > > [1] NA > > > complex(r=NaN, i=2) > > [1] NaN+2i > > > Not displaying the real + imaginary parts in the NA case kind of > > suggests that somehow they are gone i.e. that Re(z) and Im(z) are both > NA. > > > Note that my proposal is not to change the display but to change Re() > > and Im() to make them consistent with the display. > > > In your Re(z) <- 1/2 example (which seems to be theoretical only > because > > I don't see `Re<-` in base R), any NA in 'z' would be replaced with > > complex(r=NaN, i=1/2), so you could rely on Re(z) == 1/2. > > >> Also, in a similar case, a > >> > >> Im(z) <- NA > >> > >> would have to "destroy" all real parts Re(z); > >> not really typically in memory, but effectively for the user, Re(z) > >> would be all NA/NaN. > > Yes, setting a value to NA destroys it beyond repair in the sense that > > there's no way you can retrieve any original parts of it. I'm fine with > > that. I'm not fine with an NA being used to store hidden information. > >> > >> And I think there are quite a few other situations > >> where looking at Re() and Im() separately makes a lot of sense. > > Still doable if the Re or Im parts contain NaNs. > >> > >> Spencer also made a remark in this direction. > >> > >> All in all I'd be very reluctant to move in this direction; > >> but yes, I'm just one person ... let's continue musing and > >> considering ! > > > I understand the reluctance since this would not be a light move, but > > thanks for considering. > > > Best, > > > H. > > >> > >> Martin > >> > >> >>> On 9/22/23 03:38, Martin Maechler wrote: > >> >>>>>>>>> Mikael Jagan on Thu, 21 Sep 2023 00:47:39 > >> >>>>>>>>> -0400 writes: > >> >>>> > Revisiting this thread from April: > >> >>>> > >> >>>> >https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html > >> >>>> > >> >>>> > where the decision (not yet backported) was > >> >>>> made for > as.complex(NA_real_) to give > >> >>>> NA_complex_ instead of > complex(r=NA_real_, > >> >>>> i=0), to be consistent with > help("as.complex") > >> >>>> and as.complex(NA) and as.complex(NA_integer_). > >> >>>> > >> >>>> > Was any consideration given to the alternative? > >> >>>> > That is, to changing as.complex(NA) and > >> >>>> as.complex(NA_integer_) to > give > >> >>>> complex(r=NA_real_, i=0), consistent with > > >> >>>> as.complex(NA_real_), then amending help("as.complex") > >> >>>> > accordingly? > >> >>>> > >> >>>> Hmm, as, from R-core, mostly I was involved, I admit to > >> >>>> say "no", to my knowledge the (above) alternative > >> >>>> wasn't considered. > >> >>>> > >> >>>> > The principle that > > >> >>>> Im(as.complex(<real=(double|integer|logical)>)) should > >> >>>> be zero > is quite fundamental, in my view, hence > >> >>>> the "new" behaviour > seems to really violate the > >> >>>> principle of least surprise ... > >> >>>> > >> >>>> of course "least surprise" is somewhat subjective. > >> >>>> Still, I clearly agree that the above would be one > >> >>>> desirable property. > >> >>>> > >> >>>> I think that any solution will lead to *some* surprise > >> >>>> for some cases, I think primarily because there are > >> >>>> *many* different values z for which is.na(z) is > >> >>>> true, and in any case NA_complex_ is only of the > >> >>>> many. > >> >>>> > >> >>>> I also agree with Mikael that we should reconsider the > >> >>>> issue that was raised by Davis Vaughan here ("on > >> >>>> R-devel") last April. > >> >>>> > >> >>>> > Another (but maybe weaker) argument is that > >> >>>> > double->complex coercions happen more often > >> >>>> than > logical->complex and integer->complex > >> >>>> ones. Changing the > behaviour of the more > >> >>>> frequently performed coercion is > more likely to > >> >>>> affect code "out there". > >> >>>> > >> >>>> > Yet another argument is that one expects > >> >>>> > >> >>>> > identical(as.complex(NA_real_), NA_real_ + > >> >>>> (0+0i)) > >> >>>> > >> >>>> > to be TRUE, i.e., that coercing from double to > >> >>>> complex is > equivalent to adding a complex > >> >>>> zero. The new behaviour > makes the above FALSE, > >> >>>> since NA_real_ + (0+0i) gives > > >> >>>> complex(r=NA_real_, i=0). > >> >>>> > >> >>>> No! --- To my own surprise (!) --- in current R-devel > >> >>>> the above is TRUE, and NA_real_ + (0+0i) , the > >> >>>> same as NA_real_ + 0i , really gives > >> >>>> complex(r=NA, i=NA) : > >> >>>> > >> >>>> Using showC() from ?complex > >> >>>> > >> >>>> showC <- function(z) noquote(sprintf("(R = %g, I = > >> >>>> %g)", Re(z), Im(z))) > >> >>>> > >> >>>> we see (in R-devel) quite consistently > >> >>>> > >> >>>>> showC(NA_real_ + 0i) > >> >>>> [1] (R = NA, I = NA) > >> >>>>> showC(NA + 0i) # NA is 'logical' > >> >>>> [1] (R = NA, I = NA) where as in R 4.3.1 and > >> >>>> "R-patched" -- *in*consistently > >> >>>> > >> >>>>> showC(NA_real_ + 0i) > >> >>>> [1] (R = NA, I = 0) > >> >>>>> showC(NA + 0i) > >> >>>> [1] (R = NA, I = NA) .... and honestly, I do not see > >> >>>> *where* (and when) we changed the underlying code (in > >> >>>> arithmetic.c !?) in R-devel to *also* produce > >> >>>> NA_complex_ in such complex *arithmetic* > >> >>>> > >> >>>> > >> >>>> > Having said that, one might also (but more > >> >>>> naively) expect > >> >>>> > >> >>>> > > >> >>>> identical(as.complex(as.double(NA_complex_)), > >> >>>> NA_complex_) > >> >>>> > >> >>>> > to be TRUE. > >> >>>> > >> >>>> as in current R-devel > >> >>>> > >> >>>> > Under my proposal it continues to be FALSE. > >> >>>> > >> >>>> as in "R-release" > >> >>>> > >> >>>> > Well, I'd prefer if it gave FALSE with a > >> >>>> warning > "imaginary parts discarded in > >> >>>> coercion", but it seems that > > >> >>>> as.double(complex(r=a, i=b)) never warns when either of > >> >>>> > 'a' and 'b' is NA_real_ or NaN, even where > >> >>>> "information" > {nonzero 'b'} is clearly lost ... > >> >>>> > >> >>>> The question of *warning* here is related indeed, but I > >> >>>> think we should try to look at it only *secondary* to > >> >>>> your first proposal. > >> >>>> > >> >>>> > Whatever decision is made about > >> >>>> as.complex(NA_real_), > maybe these points should > >> >>>> be weighed before it becomes part of > R-release > >> >>>> ... > >> >>>> > >> >>>> > Mikael > >> >>>> > >> >>>> Indeed. > >> >>>> > >> >>>> Can we please get other opinions / ideas here? > >> >>>> > >> >>>> Thank you in advance for your thoughts! Martin > >> >>>> > >> >>>> --- > >> >>>> > >> >>>> PS: > >> >>>> > >> >>>> Our *print()*ing of complex NA's ("NA" here meaning > >> >>>> NA or NaN) is also unsatisfactory, e.g. in the case > >> >>>> where all entries of a vector are NA in the sense of > >> >>>> is.na(.), but their Re() and Im() are not all NA: > >> >>>> showC <- function(z) noquote(sprintf("(R = %g, I = > >> >>>> %g)", Re(z), Im(z))) z <- complex(, c(11, NA, NA), > >> >>>> c(NA, 99, NA)) z showC(z) > >> >>>> > >> >>>> gives > >> >>>> > >> >>>> > z [1] NA NA NA > showC(z) [1] (R = > >> >>>> 11, I = NA) (R = NA, I = 99) (R = NA, I = NA) > >> >>>> > >> >>>> but that (printing of complex) *is* another issue, in > >> >>>> which we have the re-opened bugzilla PR#16752 > >> >>>> ==>https://bugs.r-project.org/show_bug.cgi?id=16752 > >> >>>> > >> >>>> on which we also worked during the R Sprint in Warwick > >> >>>> three weeks ago, and where I want to commit changes in > >> >>>> any case {but think we should change even a bit more > >> >>>> than we got to during the Sprint}. > >> >>>> > >> >>>> ______________________________________________ > >> >>>>R-devel@r-project.org mailing list > >> >>>>https://stat.ethz.ch/mailman/listinfo/r-devel > >> >>> > >> >> > >> > -- > >> > Hervé Pagès > >> > >> > Bioconductor coreteamhpages.on.git...@gmail.com > >> > >> > >> > > -- > > Hervé Pagès > > > Bioconductor Core Team > >hpages.on.git...@gmail.com > > > [[alternative HTML version deleted]] > -- Hervé Pagès Bioconductor Core Team hpages.on.git...@gmail.com [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel