On Fri, Jun 24, 2011 at 07:30, Laurent Gautier <lgaut...@gmail.com> wrote:
> On 2011-06-24 13:59, Nathaniel Smith <n...@pobox.com> wrote:
>> On Thu, Jun 23, 2011 at 5:56 PM, Benjamin Root <ben.r...@ou.edu> wrote:
>>> Lastly, I am not entirely familiar with R, so I am also very curious about
>>> what this magical "NA" value is, and how it compares to how NaNs work.
>>> Although, Pierre brought up the very good point that NaNs wouldn't work
>>> anyway with integer arrays (and object arrays, etc.).
>> Since R is designed for statistics, they made the interesting decision
>> that *all* of their core types have a special designated "missing"
>> value. At the R level this is just called "NA". Internally, there are
>> a bunch of different NA values -- for floats it's a particular NaN,
>> for integers it's INT_MIN, for booleans it's 2 (IIRC), etc. (You never
>> notice this, because R will silently cast a NA of one type into NA of
>> another type whenever needed, and they all print the same.)
>>
>> Because any array can contain NA's, all R functions then have to have
>> some way of handling this -- all their integer arithmetic knows that
>> INT_MIN is special, for instance. The rules are basically the same as
>> for NaN's, but NA and NaN are different from each other (because one
>> means "I don't know, could be anything" and the other means "you tried
>> to divide by 0, I *know* that's meaningless").
>>
>> That's basically it.
>>
>> -- Nathaniel
>
> Would the use of R's system for expressing "missing values" be possible
> in numpy through a special flag ?
>
> Any given numpy array could have a boolean flag (say "na_aware")
> indicating that some of the values are representing a missing cell.
>
> If the exact same system is used, interaction with R (through something
> like rpy2) would be simplified and more robust.

The alternative proposal would be to add a few new dtypes that are NA-aware.
E.g. an nafloat64 would reserve a particular NaN value (there are lots of
different NaN bit patterns, we'd just reserve one) that would represent NA. An
naint32 would probably reserve the most negative int32 value (like R does).
Using the NA-aware dtypes signals that you are using NA values; there is no
need for an additional flag.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
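
To make the reserved-bit-pattern idea concrete, here is a rough sketch in plain NumPy. It is only an illustration under stated assumptions: nafloat64 and naint32 do not exist as NumPy dtypes, the names NA_FLOAT64_BITS, NA_INT32, is_na_float64 and is_na_int32 are made up for this example, and the NaN payload is an arbitrary choice rather than a value anyone has agreed on.

```python
import numpy as np

# Hypothetical sentinels for illustration only; NumPy has no
# nafloat64/naint32 dtypes.
# A quiet NaN with an arbitrarily chosen payload stands in for float NA.
NA_FLOAT64_BITS = np.uint64(0x7FF80000000007A2)
# The most negative int32 stands in for integer NA (the choice R makes).
NA_INT32 = np.iinfo(np.int32).min


def is_na_float64(a):
    """True where a float64 array holds exactly the reserved NaN bit pattern."""
    a = np.ascontiguousarray(a, dtype=np.float64)
    # Compare raw bits: NaN != NaN as floats, so we look at the uint64 view.
    return a.view(np.uint64) == NA_FLOAT64_BITS


def is_na_int32(a):
    """True where an int32 array holds the reserved sentinel value."""
    return np.asarray(a, dtype=np.int32) == NA_INT32


# One ordinary value, one ordinary NaN, and one slot that will become NA.
x = np.array([1.0, np.nan, 0.0], dtype=np.float64)
# Write the reserved bit pattern directly through an integer view.
x.view(np.uint64)[2] = NA_FLOAT64_BITS

y = np.array([3, NA_INT32, -1], dtype=np.int32)

print(np.isnan(x))       # both the plain NaN and the NA slot are NaN
print(is_na_float64(x))  # only the reserved pattern counts as NA
print(is_na_int32(y))
```

The point of the sketch is that NA stays distinguishable from an ordinary NaN only by its bits, and that integer code has to check for the sentinel explicitly. Whether the payload survives arithmetic is up to the hardware, which is one reason a real NA-aware dtype would need its own arithmetic loops rather than a helper like this.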