Hi All,

A Twitter user, Mike fc (@coolbutuseless), mentioned today that he was surprised that repeated NAs weren't treated as a run by the rle function.
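The behavior is easy to reproduce with base R alone. The vector x below is just an example; nothing here goes beyond standard base functions.

```r
x <- c(1, 2, 3, NA, NA, NA)

# Comparing two NAs yields NA rather than TRUE, so rle() cannot tell
# whether adjacent NAs are equal and starts a new run for each one.
NA == NA           # NA
rle(x)$lengths     # 1 1 1 1 1 1
rle(x)$values      # 1 2 3 NA NA NA
```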
Now, I know why they are not. NAs represent values which, if they were known, could be the same as or different from each other, so from a purely conceptual standpoint there is no way to tell whether they are equal and thus constitute a run. This conceptual strictness isn't universally observed, though, because we get the following:

    > unique(c(1, 2, 3, NA, NA, NA))
    [1]  1  2  3 NA

This means that rle(sort(x))$values is not guaranteed to be the same as unique(x), which is a little strange (though likely of little practical impact).

It also seems to me that, from a purely data-compression standpoint, it would be valid to collapse those missing values into a single run of NAs, since doing so reduces size in memory and on disk without losing any information.

None of this is to say that I suggest the default behavior be changed (that would surely disrupt a non-trivial amount of existing code), but what do people think of a group.nas argument, defaulting to FALSE, that controls this behavior?

As a final point, there is some precedent here (though obviously not at all binding), as Bioconductor's Rle functionality does group NAs.

Best,
~G

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
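To make the group.nas proposal concrete, here is a minimal sketch of how such an option could behave. It is not an existing R API: rle_na() is a hypothetical name, and its body closely follows the approach base rle() takes (compare neighbours, treat NA comparisons as run boundaries), with one extra step that, when group.nas = TRUE, joins adjacent NAs into a single run.

```r
# Hypothetical variant of rle() with the proposed group.nas argument.
# Not part of base R; the name rle_na is made up for illustration.
rle_na <- function(x, group.nas = FALSE) {
  if (!is.vector(x) && !is.list(x))
    stop("'x' must be a vector of an atomic type")
  n <- length(x)
  if (n == 0L)
    return(structure(list(lengths = integer(), values = x), class = "rle"))

  # y[k] is TRUE when x[k] and x[k + 1] differ, NA when either is NA.
  y <- x[-1L] != x[-n]

  if (group.nas) {
    na_left  <- is.na(x[-n])
    na_right <- is.na(x[-1L])
    # Two adjacent NAs continue a run; an NA next to a non-NA ends one.
    # Note: is.na() is also TRUE for NaN, so this sketch would merge an
    # adjacent NA and NaN into one run.
    y[na_left & na_right] <- FALSE
    y[xor(na_left, na_right)] <- TRUE
  }

  # Any remaining NA comparison is treated as a run boundary, as in rle().
  i <- c(which(y | is.na(y)), n)
  structure(list(lengths = diff(c(0L, i)), values = x[i]), class = "rle")
}

x <- c(1, 2, 3, NA, NA, NA)
rle_na(x)$lengths                     # 1 1 1 1 1 1 (same as rle(x))
r <- rle_na(x, group.nas = TRUE)
r$lengths                             # 1 1 1 3
r$values                              # 1 2 3 NA
identical(inverse.rle(r), x)          # TRUE: nothing is lost

# sort() drops NAs unless na.last is set, so keep them explicitly.
z <- c(3, NA, 1, NA, 3)
s <- sort(z, na.last = TRUE)
rle(s)$values                         # 1 3 NA NA
rle_na(s, group.nas = TRUE)$values    # 1 3 NA
sort(unique(z), na.last = TRUE)       # 1 3 NA
```

Keeping the default at FALSE means existing callers get exactly the current rle() output; only code that opts in sees grouped NAs.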