On 02/10/14, 19:07, Duncan Murdoch wrote:
> On 10/02/2014 10:21 AM, Tim Hesterberg wrote:
>> This isn't quite what you were asking, but might inform your choice.
>>
>> R doesn't try to maintain the distinction between NA and NaN when
>> doing calculations, e.g.:
>> > NA + NaN
>> [1] NA
>> > NaN + NA
>> [1] NaN
>> So for the aggregate package, I didn't attempt to treat them differently.
>
> This looks like a bug to me. In 32 bit 3.0.2 and R-patched I see
>
> > NA + NaN
> [1] NA
> > NaN + NA
> [1] NA

But under 3.0.2 patched 64 bit on Mavericks:

> version
               _
platform       x86_64-apple-darwin10.8.0
arch           x86_64
os             darwin10.8.0
system         x86_64, darwin10.8.0
status         Patched
major          3
minor          0.2
year           2014
month          01
day            07
svn rev        64692
language       R
version.string R version 3.0.2 Patched (2014-01-07 r64692)
nickname       Frisbee Sailing
> NA+NaN
[1] NA
> NaN+NA
[1] NaN

>
> This seems more reasonable to me. NA should propagate. (I can see an
> argument for NaN for the answer here, as I can't think of any possible
> non-missing value that would give anything else when added to NaN, but
> the answer should not depend on the order of operands.)
>
> However, I get the same as you in 64 bit 3.0.2. All calculations I've
> shown are on 64 bit Windows 7.
>
> Duncan Murdoch
>
>
>>
>> The aggregate package is available at
>> http://www.timhesterberg.net/r-packages
>>
>> Here is the inst/doc/missingValues.txt file from that package:
>>
>> --------------------------------------------------
>> Copyright 2012 Google Inc. All Rights Reserved.
>> Author: Tim Hesterberg <roc...@google.com>
>> Distributed under GPL 2 or later.
>>
>>
>> Handling of missing values and not-a-numbers.
>>
>>
>> Here I'll note how this package handles missing values.
>> I do it the way R handles them, rather than the more strict way that
>> S+ does.
>>
>> First, for terminology,
>>   NaN = "not-a-number", e.g. the result of 0/0
>>   NA = "missing value" or "true missing value", e.g. survey
>>        non-response
>>   xx = I'll use this for the union of those, or "missing value of
>>        any kind".
>>
>> For background, at the hardware level there is an IEEE standard that
>> specifies that certain bit patterns are NaN, and specifies that
>> operations involving an NaN result in another NaN.
>>
>> That standard doesn't say anything about missing values, which are
>> important in statistics.
>>
>> So what R and S+ do is to pick one of the bit patterns and declare
>> that to be a NA.
>> In other words, the NA bit pattern is a subset of
>> the NaN bit patterns.
>>
>> At the user level, the reverse seems to hold.
>> You can assign either NA or NaN to an object.
>> But:
>>   is.na(x) returns TRUE for both
>>   is.nan(x) returns TRUE for NaN and FALSE for NA
>> Based on that, you'd think that NaN is a subset of NA.
>> To tell whether something is a true missing value do:
>>   (is.na(x) & !is.nan(x))
>>
>> The S+ convention is that any operation involving NA results in an NA;
>> otherwise any operation involving NaN results in NaN.
>>
>> The R convention is that any operation involving xx results in an xx;
>> a missing value of any kind results in another missing value of any
>> kind. R considers NA and NaN equivalent for testing purposes:
>>   all.equal(NA_real_, NaN)
>> gives TRUE.
>>
>> Some R functions follow the S+ convention, e.g. the Math2 functions
>> in src/main/arithmetic.c use this macro:
>>   #define if_NA_Math2_set(y,a,b) \
>>       if (ISNA (a) || ISNA (b)) y = NA_REAL; \
>>       else if (ISNAN(a) || ISNAN(b)) y = R_NaN;
>>
>> Other R functions, like the basic arithmetic operations +-/*^,
>> do not (search for PLUSOP in src/main/arithmetic.c).
>> They just let the hardware do the calculations.
>> As a result, you can get odd results like
>>   > is.nan(NA_real_ + NaN)
>>   [1] FALSE
>>   > is.nan(NaN + NA_real_)
>>   [1] TRUE
>>
>> The R help files help(is.na) and help(is.nan) suggest that
>> computations involving NA and NaN are indeterminate.
>>
>> It is faster to use the R convention; most operations are just
>> handled by the hardware, without extra work.
>>
>> In cases like sum(x, na.rm=TRUE), the help file specifies that both NA
>> and NaN are removed.
>>
>>
>>
>>
>> >There is one NA but multiple NaNs.
>> >
>> >And please re-read 'man memcmp': your cast is wrong.
>> >
>> >On 10/02/2014 06:52, Kevin Ushey wrote:
>> >> Hi R-devel,
>> >>
>> >> I have a question about the differentiation between NA and NaN values
>> >> as implemented in R.
>> >> In arithmetic.c, we have
>> >>
>> >> int R_IsNA(double x)
>> >> {
>> >>     if (isnan(x)) {
>> >>         ieee_double y;
>> >>         y.value = x;
>> >>         return (y.word[lw] == 1954);
>> >>     }
>> >>     return 0;
>> >> }
>> >>
>> >> ieee_double is just used for type punning so we can check the final
>> >> bits and see if they're equal to 1954; if they are, x is NA, if
>> >> they're not, x is NaN (as defined for R_IsNaN).
>> >>
>> >> My question is -- I can see a substantial increase in speed (on my
>> >> computer, in certain cases) if I replace this check with
>> >>
>> >> int R_IsNA(double x)
>> >> {
>> >>     return memcmp(
>> >>         (char*)(&x),
>> >>         (char*)(&NA_REAL),
>> >>         sizeof(double)
>> >>     ) == 0;
>> >> }
>> >>
>> >> IIUC, there is only one bit pattern used to encode R NA values, so
>> >> this should be safe. But I would like to be sure:
>> >>
>> >> Is there any guarantee that the different functions in R would return
>> >> NA as identical to the bit pattern defined for NA_REAL, for a given
>> >> architecture? Similarly for NaN value(s) and R_NaN?
>> >>
>> >> My guess is that it is possible some functions used internally by R
>> >> might encode NaN values differently; ie, setting the lower word to a
>> >> value different than 1954 (hence being NaN, but potentially not
>> >> identical to R_NaN), or perhaps this is architecture-dependent.
>> >> However, NA should be one specific bit pattern (?). And, I wonder if
>> >> there is any guarantee that the different functions used in R would
>> >> return an NaN value as identical to R_NaN (which appears to be the
>> >> 'IEEE NaN')?
>> >>
>> >> (interested parties can see + run a simple benchmark from the gist at
>> >> https://gist.github.com/kevinushey/8911432)
>> >>
>> >> Thanks,
>> >> Kevin
>> >>
>> >> ______________________________________________
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>
>> >
>> >
>> >--
>> >Brian D. Ripley, rip...@stats.ox.ac.uk
>> >Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> >University of Oxford, Tel: +44 1865 272861 (self)
>> >1 South Parks Road, +44 1865 272866 (PA)
>> >Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel : +33 - (0)9 53 10 27 44
Cell: +33 - (0)6 85 62 59 98
Fax : +33 - (0)9 58 10 27 44

Fax (D): +49 - (0)3 21 21 25 22 44

email: rai...@krugs.de

Skype: RMkrug