Thanks Martin (sorry about the HTML - GMail and my incompetent use of it; hopefully I've beaten it into submission this time).
I can see the point of view; however, the inconsistency remains unless one either patches the other summary stat functions to work as if given a matrix or squashes all the Summary.data.frame methods as well. More comments in-line.

On 22 August 2014 02:23, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

> >>>>> Gavin Simpson <ucfa...@gmail.com>
> >>>>> on Thu, 21 Aug 2014 12:32:31 -0600 writes:
>
> <snip/>
> >> mean(df)
> > [1] NA Warning message: In mean.default(df) : argument is
> > not numeric or logical: returning NA
>
> I would tend to agree (:-) that mean() should rather give an error here
> (and read on).
>
> > I recall the times where `mean(df)` would give
> > `colMeans(df)` and this behaviour was deemed
> > inconsistent.
>
> > It seems though that the change has removed one
> > inconsistency and replaced it with another.
>
> The whole idea of removing the mean method for data frames was
> that there are many more summary functions, e.g. median, and it
> seems wrong to write a data frame method for each of them; then
> why for *some* of them.
> So we *did* keep the Summary.data.frame group method,
> and that's why min(), max(), sum(),.. work {though sum() will be
> slightly slower than colSums()}.
>

and gives a different answer, unless you meant sum(colSums(df)) == sum(df)?

> When teaching R, the audience should learn to use apply() or
> similar functions, e.g. from the hadleyverse,
> because that is the general approach of dealing with matrix-like
> objects that is indeed how I think users should start thinking
> of data frames.

This actually came up because someone was wanting the mean over all columns (of a dataset where columns represented repeated measures per patient, rows), hence `apply()` is not really suitable here and we've switched the example to do `mean(as.matrix(df))` to get what they wanted. I wasn't suggesting having `mean()` do anything like `colMeans()` or the `mean.data.frame` of old.
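
To make the distinction concrete, here is a minimal sketch of the behaviour under discussion. The data frame `df` and its column names are made up for illustration (they are not the original poster's data), and the results assume an R version in which `mean()` has no data frame method:

```r
# Hypothetical all-numeric data frame: rows are patients,
# columns are repeated measures (names are illustrative only)
df <- data.frame(t1 = c(1, 2, 3), t2 = c(4, 5, 6), t3 = c(7, 8, 9))

# Members of the Summary group generic treat the data frame
# as one block of numbers, as if it were a matrix
max(df)    # 9
sum(df)    # 45

# colSums() works column-wise, so it agrees with sum(df)
# only after the column totals are summed again
col_totals <- colSums(df)
sum(col_totals) == sum(df)    # TRUE

# mean() falls through to mean.default(), which returns NA with a warning
m <- suppressWarnings(mean(df))

# Converting to a matrix first gives the mean over all cells
grand_mean <- mean(as.matrix(df))    # 5
```

So `sum(df)` and `max(df)` already behave "as if given a matrix", while `mean(df)` does not, which is the inconsistency I am pointing at.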
I was wondering why we couldn't gain some semblance of consistency by making *all* (although I didn't mention them) these related functions work on a data frame (with all numeric columns) as if it were a matrix, just like `min()`, `max()`, `range()` etc do now.

> > Am I missing good reasons why there couldn't be a
> > `mean.data.frame()` method which worked like `max()` etc
> > when given a data frame?
>
> yes, see above.
> [ There's no consistent end after that: Why is median() different, why would
> sd(), var(), ... not work ?]

I don't see why they shouldn't if `max()` etc work *for an entirely numeric data frame*.

> > Namely that they return the
> > required statistic *only* when presented with a data frame
> > of all numeric variables? E.g.
>
> <snip />
>
> > I just can't see the sense in having `mean` work the way
> > it does now?
>
> I agree. It would be better to give an error.
> E.g., mean.default could start with
>
>     if(is.object(x))
>         stop("there is no mean() method for ", class(x)[1], " objects")

That would give a nicer error message but wouldn't solve the deeper issue of a lack of consistency, which *is* an issue for people when trying to learn R. So, can't we either kill off the summary group method for data frames or identify a set of functions which should work similarly to the existing summary group method members? Assuming that a patch would be forthcoming with documentation rather than relying on RCore to do this manually?

> > Thanks,
> > Gavin
> >
> > --
> > Gavin Simpson, PhD
> >
> > [[alternative HTML version deleted]]
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ( hmmm... and that on R-devel ... )
>

Yeah, sorry. Hopefully fixed now!

G

--
Gavin Simpson, PhD

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel