>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel@r-project.org>
>>>>>     on Tue, 31 Jan 2017 15:43:53 +0000 writes:

> Function 'aggregate.data.frame' in R has taken a different route. With drop=FALSE, the function is also applied to subset corresponding to combination of grouping variables that doesn't appear in the data (example 2 in https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

Interesting point (I couldn't easily find 'the example 2' though).
However, aggregate.data.frame() is a considerably more sophisticated
function and one goal was to change tapply() as little as possible
for compatibility (and maintenance!) reasons.

> Because 'default' is used only when simplification happens, putting 'default'
> after 'simplify' in the argument list may be more logical.

Yes, from this point of view, you are right; I had thought about
that too; on the other hand, it belongs "closely" to the 'FUN'
and I think that's why I had decided not to change the proposal..

> Anyway, it doesn't affect call to 'tapply' because the argument 'default'
> must be specified by name.

Exactly.. so we keep the order as is.

> With the code using
>   if(missing(default)) ,
> I consider the stated default value of 'default',
>   default = NA ,
> misleading because the code doesn't use it.

I know and I also had thought about it and decided to keep it
in the spirit of "self documentation" because "in spirit", the
default still *is* NA.

> Also,
>   tapply(1:3, 1:3, as.raw)
> is not the same as
>   tapply(1:3, 1:3, as.raw, default = NA) .
> The accurate statement is the code in
>   if(missing(default)) ,
> but it involves the local variable 'ans'.

exactly. But putting that whole expression in there would look
confusing to those using str(tapply), args(tapply) or similar
inspection to quickly get a glimpse of the function user
"interface". That's why we typically don't do that and rather
slightly cheat with the formal default, for the above
"didactical" purposes.

If you are puristic about this, then missing() should almost never
be used when the function argument has a formal default.
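
The trade-off discussed above can be sketched as follows. This is only an illustration under stated assumptions: fill_value() is a hypothetical helper, not the tapply() source. Its formal default is advertised as NA, but when 'default' is missing the body derives a mode-specific fill value from 'ans' instead.

```r
# Hypothetical helper illustrating the if(missing(default)) idiom;
# it is NOT the actual code of tapply().
fill_value <- function(ans, default = NA) {
    if (missing(default))
        ans[0L][1L]   # mode-specific "NA" of 'ans' (raw has no NA, so this gives 00)
    else
        default       # the value the caller passed explicitly
}

fill_value(1:3)               # NA_integer_
fill_value(as.raw(1:3))       # as.raw(0), printed as 00
fill_value(as.raw(1:3), NA)   # logical NA: not the same as leaving 'default' out
```

args(fill_value) still shows 'default = NA', which is the "self documentation" being weighed against strict accuracy.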

I don't have a too strong opinion here, and we do have quite a
few other cases, where the formal default argument is not always
used because of if(missing(.)) clauses.
I think I could be convinced to drop the '= NA' from the formal
argument list..

> As far as I know, the result of function 'array' in is not a classed object and the default method of `[<-` will be used in the 'tapply' code portion.

> As far as I know, the result of 'lapply' is a list without class. So, 'unlist' applied to it uses the default method and the 'unlist' result is a vector or a factor.

You may be right here
((or not: If a package author makes array() into an S3 generic and
  defines S3method(array, *) and she or another make tapply() into a
  generic with methods, are we really sure that this code
  would not be used ??))

still, the as.raw example did not easily work without a warning
when using as.vector() .. or similar.

> With the change, the result of
>   tapply(1:3, 1:3, factor, levels=3:1)
> is of mode "character". The value is from the internal code, not from the factor levels. It is worse than before the change, where it is really the internal code, integer.

I agree that this change is not desirable.
One could argue that it was quite a "lucky coincidence" that the
previous code returned the internal integer codes though..

> In the documentation, the description of argument 'simplify' says: "If 'TRUE' (the default), then if 'FUN' always returns a scalar, 'tapply' returns an array with the mode of the scalar."

> To initialize array, a zero-length vector can also be used.

yes, of course; but my  ans[0L][1L]  had the purpose to get the
correct mode specific version of NA .. which works for raw (by
getting '00' because "raw" has *no* NA!).

So it seems I need an additional  !is.factor(ans)  there ... a bit ugly.

---------

> For 'xtabs', I think that it is better if the result has storage mode
> "integer" if 'sum' results are of storage mode "integer", as in R 3.3.2.

you are right, that *is* preferable

> As 'default' argument for 'tapply', 'xtabs' can use 0L, or use 0L or 0
> depending on storage mode of the summed quantity.

indeed, that will be an improvement there!

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
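
The storage-mode point for empty cells can be illustrated with a small sketch. It assumes an R version in which tapply() has the 'default' argument under discussion (R >= 3.4.0); the data and variable names are made up for illustration, and this is not the xtabs() implementation.

```r
# Made-up integer data with one empty cell (b, v).
x  <- c(2L, 5L, 7L)
g1 <- factor(c("a", "a", "b"), levels = c("a", "b"))
g2 <- factor(c("u", "v", "u"), levels = c("u", "v"))

# An integer default keeps integer sums in an integer array ...
sums_int <- tapply(x, list(g1, g2), sum, default = 0L)
storage.mode(sums_int)   # "integer"

# ... whereas a double default promotes the whole array to double.
sums_dbl <- tapply(x, list(g1, g2), sum, default = 0)
storage.mode(sums_dbl)   # "double"
```

Choosing 0L or 0 according to the storage mode of the summed quantity, as suggested above, avoids the unintended promotion.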