>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>>     on Thu, 11 Aug 2016 16:19:49 +0000 writes:

> I stand corrected. The part "If set to 'NULL', it implies
> 'useNA="always"'." is even in the documentation in R
> 2.8.0. It was my fault not to check carefully. I wonder
> why "always" was chosen for 'useNA' for exclude = NULL.

me too.  "ifany" would seem more logical, and I am considering
changing to that as a 2nd step (if the 1st step, below, proves
to be feasible).

> Why is exclude = NULL so special? What about another
> 'exclude' of length zero, like character(0) (not c(),
> because c() is NULL)? I thought that, too. But then, I
> have no opinion about making it general.

As mentioned, I entirely agree with that {and you are right about c() !!}.

> It fits my expectation to override 'useNA' only if the
> user doesn't explicitly specify 'useNA'.

> Thank you for looking into this.

you are welcome.  As a first step, I plan to commit the change to (*)

   useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) "always"

as proposed yesterday, and I'll eventually see / be notified
about the effect in CRAN space.
--
(*) slightly more efficiently, I'll be using match() directly instead of %in%

> My points:

> Could the R 2.7.2 behavior of table(<non-factor>, exclude = NULL) be brought back?
> But the R 3.3.1 behavior has been in R since version 2.8.0, which is rather long.

you are right... but then, the places / cases where the behavior
would change back should be quite rare.

> If not, I suggest changing summary(<logical>).
> --------------------------------------------

Thank you for your feedback, Suharto!
Martin

> On Thu, 11/8/16, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
>
>  Subject: Re: [Rd] table(exclude = NULL) always includes NA
>  @r-project.org
>  Cc: "Martin Maechler" <maech...@stat.math.ethz.ch>
>  Date: Thursday, 11 August, 2016, 12:39 AM
>
> >>>>> Martin Maechler <maech...@stat.math.ethz.ch>
> >>>>>     on Tue, 9 Aug 2016 15:35:41 +0200 writes:
>
> > >>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
> > >>>>>     on Sun, 7 Aug 2016 15:32:19 +0000 writes:
>
> > > This is an example from https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html .
> > >
> > > With R 2.7.2:
> > >
> > > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> > > > table(a, b, exclude = NULL)
> > >       b
> > > a      1 2
> > >   1    1 1
> > >   2    2 0
> > >   3    1 0
> > >   <NA> 1 0
> > >
> > > With R 3.3.1:
> > >
> > > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> > > > table(a, b, exclude = NULL)
> > >       b
> > > a      1 2 <NA>
> > >   1    1 1    0
> > >   2    2 0    0
> > >   3    1 0    0
> > >   <NA> 1 0    0
> > > > table(a, b, useNA = "ifany")
> > >       b
> > > a      1 2
> > >   1    1 1
> > >   2    2 0
> > >   3    1 0
> > >   <NA> 1 0
> > > > table(a, b, exclude = NULL, useNA = "ifany")
> > >       b
> > > a      1 2 <NA>
> > >   1    1 1    0
> > >   2    2 0    0
> > >   3    1 0    0
> > >   <NA> 1 0    0
> > >
> > > For the example, in R 3.3.1, the result of 'table' with
> > > exclude = NULL includes NA even if NA is not present. It is
> > > different from R 2.7.2, where the result comes from
> > > factor(exclude = NULL), which includes NA only if NA is present.
>
> > I agree that this (R 3.3.1 behavior) seems undesirable and looks
> > wrong, and the old (<= 2.7.2) behavior for table(a,b,
> > exclude=NULL) seems desirable to me.
>
> > > From R 3.3.1 help on 'table', in "Details" section:
> > > 'useNA' controls if the table includes counts of 'NA' values: the allowed values correspond to never, only if the count is positive and even for zero counts. This is overridden by specifying 'exclude = NULL'.
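The three 'useNA' settings described in the help text just quoted can be compared directly on the vectors from the r-help example. This is a minimal base-R sketch; 'exclude' is deliberately left unspecified so that the output does not depend on how a particular R version lets 'exclude = NULL' override 'useNA'. The names t_no, t_ifany and t_always are illustrative only.

```r
## Vectors from the r-help example quoted above.
a <- c(1, 1, 2, 2, NA, 3)
b <- c(2, 1, 1, 1, 1, 1)

## 'exclude' is left unspecified in all three calls, so only 'useNA' varies.
t_no     <- table(a, b, useNA = "no")      # NA never tabulated
t_ifany  <- table(a, b, useNA = "ifany")   # <NA> level only where an NA occurs ('a')
t_always <- table(a, b, useNA = "always")  # <NA> level for both 'a' and 'b'

dim(t_no)           # 3 2
dim(t_ifany)        # 4 2
dim(t_always)       # 4 3
sum(t_always[, 3])  # 0: the <NA> column of 'b' holds only zero counts
```

Only useNA = "ifany" gives the 4 x 2 shape that R 2.7.2 produced for table(a, b, exclude = NULL) in the quoted output.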
>
> > > Specifying 'exclude = NULL' overrides 'useNA' to what value? The documentation doesn't say. Looking at the code of function 'table', the value is "always".
>
> > Yes, it should be documented what happens for this case,
> > (but read on ...)
>
> and it is *not* true that the documentation does not say: since
> 2013, it has contained
>
>    exclude: levels to remove for all factors in ‘...’. If set to ‘NULL’,
>             it implies ‘useNA = "always"’. See ‘Details’ for its
>             interpretation for non-factor arguments.
>
> > > For the example, in R 3.3.1, the result like in R 2.7.2 can be obtained with useNA = "ifany" and 'exclude' unspecified.
>
> > Yes. What should we do?
> > I currently think that we'd want to change the line
> >
> >   useNA <- if (!missing(exclude) && is.null(exclude)) "always"
> >
> > to
> >
> >   useNA <- if (!missing(exclude) && is.null(exclude)) "ifany" # was "always"
> >
> > which would not even contradict documentation, as indeed you
> > mentioned above, the exact action here had not been documented.
>
> The last part ("which ..") above is wrong, as mentioned earlier.
>
> The above change entails behaviour which looks better to me;
> however, the change *is* "against the current documentation".
> And after experimentation (a "complete factorial design" of
> argument settings), I'm not entirely happy with the result.... and one reason
> is that 'exclude = NULL' and (e.g.) 'exclude = c()'
> are (still) handled differently: From a usual interpretation,
> both should mean
>    "do not exclude any factor entries (and levels) from tabulation"
> but one of the two changes the default of 'useNA' and the other
> does not. If we want a change anyway (and have to update the doc),
> it could be "more logical" to replace the line above by
>
>   useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) "always"
>
> notably, replacing 'useNA' *only* if it has not been specified,
> which seems much closer to "typically expected" behavior..
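The difference between the R 3.3.1 line and the proposed replacement can be sketched with two small stand-alone functions. Both useNA_r331() and useNA_proposed() are hypothetical helpers, not part of base R: they copy only the condition that picks the effective 'useNA', assume that the omitted 'else' branch is match.arg(useNA), and use a fixed default for 'exclude' instead of table()'s own. The proposed rule is written with match() in place of %in%, as in the footnote (*) above.

```r
## Hypothetical helpers (not base R): each reproduces only the line of
## table() that chooses the effective 'useNA'. The 'else' branch and the
## fixed 'exclude' default are assumptions made for this sketch.

## Rule as quoted for R 3.3.1: only exclude = NULL forces "always",
## and it does so even when 'useNA' was given explicitly.
useNA_r331 <- function(exclude = c(NA, NaN),
                       useNA = c("no", "ifany", "always")) {
  if (!missing(exclude) && is.null(exclude)) "always" else match.arg(useNA)
}

## Proposed rule: any explicit 'exclude' that does not contain NA forces
## "always", but only if 'useNA' itself was not specified.
useNA_proposed <- function(exclude = c(NA, NaN),
                           useNA = c("no", "ifany", "always")) {
  if (missing(useNA) && !missing(exclude) && is.na(match(NA, exclude)))
    "always"
  else match.arg(useNA)
}

useNA_r331(exclude = NULL)                       # "always"
useNA_r331(exclude = character(0))               # "no"
useNA_r331(exclude = NULL, useNA = "ifany")      # "always"
useNA_proposed(exclude = NULL)                   # "always"
useNA_proposed(exclude = character(0))           # "always"
useNA_proposed(exclude = NULL, useNA = "ifany")  # "ifany"
useNA_proposed()                                 # "no"
```

With a zero-length 'exclude' such as character(0) (not c(), which is NULL) the two rules disagree, and only the proposed rule leaves an explicitly given 'useNA' untouched.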
>
> > The change above at least does not break any of the standard R
> > tests ('make check-all', i.e., including the recommended
> > packages), which for me confirms that it may be "what is
> > best"...
> >
> > ----
> >
> > Thank you for mentioning the important consequence for summary(<logical>).
> > It can help give insight into what a "probably best" behavior should
> > be for these cases of table().
> >
> > Martin Maechler,
> > ETH Zurich
> >
> > > The result of 'summary' of a logical vector is affected. As mentioned in http://stackoverflow.com/questions/26775501/r-dropping-nas-in-logical-column-levels , in the code of function 'summary.default', for logical, table(object, exclude = NULL) is used.
> > >
> > > With R 2.7.2:
> > >
> > > > log <- c(NA, logical(4), NA, !logical(2), NA)
> > > > summary(log)
> > >    Mode   FALSE    TRUE    NA's
> > > logical       4       2       3
> > > > summary(log[!is.na(log)])
> > >    Mode   FALSE    TRUE
> > > logical       4       2
> > > > summary(TRUE)
> > >    Mode    TRUE
> > > logical       1
> > >
> > > With R 3.3.1:
> > >
> > > > log <- c(NA, logical(4), NA, !logical(2), NA)
> > > > summary(log)
> > >    Mode   FALSE    TRUE    NA's
> > > logical       4       2       3
> > > > summary(log[!is.na(log)])
> > >    Mode   FALSE    TRUE    NA's
> > > logical       4       2       0
> > > > summary(TRUE)
> > >    Mode    TRUE    NA's
> > > logical       1       0
> > >
> > > In R 3.3.1, "NA's" is always in the result of 'summary' of a logical vector. It is unlike 'summary' of a numeric vector.
> > > On the other hand, in R 3.3.1, FALSE is not in the result of 'summary' of a logical vector that doesn't contain FALSE.
> > >
> > > I prefer the result of 'summary' of a logical vector like in R 2.7.2, or, alternatively, the result that always includes all possible values: FALSE, TRUE, NA.
>
> > I tend to agree, and strongly prefer the 'R(<=2.7.2)'-behavior
> > for table() {and hence summary(<logical>)}.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel