Hmm, yes, this is probably wrong. E.g., we are likely to get inconsistencies out of boundary cases like this:

> a <- na.omit(airquality)
> sum(a)
[1] 37495.3
> sum(a[FALSE,])
Error in FUN(X[[i]], ...) :
  only defined on a data frame with all numeric variables

Or, closer to an actual use case:

> sum(subset(a, Ozone>100))
[1] 3330.5
> sum(subset(a, Ozone>200))
Error in FUN(X[[i]], ...) :
  only defined on a data frame with all numeric variables

However, given that numeric summaries generally treat logicals as 0/1, wouldn't it be easiest just to extend the check inside Summary.data.frame with "&& !is.logical(x)"?

> sum(as.matrix(a[FALSE,]))
[1] 0

-pd

> On 17 Oct 2020, at 21:18 , Martin <r...@mb706.com> wrote:
>
> The "Summary" group generics always throw errors for a data.frame with zero
> rows, for example:
>> sum(data.frame(x = numeric(0)))
> #> Error in FUN(X[[i]], ...) :
> #>   only defined on a data frame with all numeric variables
> Same behaviour for min, max, any, all, ... . I believe this is inconsistent
> with what these methods do for other empty objects (vectors, matrices), where
> the return value is chosen to ensure transitivity: sum(numeric(0)) == 0.
>
> The reason for this is that the return type of as.matrix() for empty (no rows
> or no columns) data.frame objects is always a matrix of type "logical". The
> Summary method for data.frame, in turn, throws an error when the data.frame,
> converted to a matrix, is not of numeric type.
>
> I suggest two ways that make sum, min, max, ... more consistent. IMHO it
> would be fitting to implement both of these fixes, because they also make
> other things more consistent.
>
> 1. Make the return type of as.matrix() for zero-row data.frames consistent
> with the type that would have been returned, had the data.frame had more than
> zero rows. "as.matrix(data.frame(x = numeric(0)))" should then be numeric, if
> there is an empty "character" column the return matrix should be a character
> etc.
> This would make subsetting by row and conversion to matrix commute
> (except for row names sometimes):
>> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, , drop = FALSE])
> Furthermore, this change would make as.matrix.data.frame obey the
> documentation, which indicates that the coercion hierarchy is used for the
> return type.
>
> 2. Make the Summary.data.frame method accept data.frames that produce
> non-numeric matrices. Next to the main focus of this message, I believe it
> would e.g. be fitting to have any() and all() work on logical data.frame
> objects. The current behaviour is such that
>> any(data.frame(x = 1))
> #> [1] TRUE
> #> Warning message:
> #> In any(1, na.rm = FALSE) : coercing argument of type 'double' to logical
> and
>> any(data.frame(x = TRUE))
> #> Error in FUN(X[[i]], ...) :
> #>   only defined on a data frame with all numeric variables
> So a numeric data.frame warns about implicit coercion, while a logical
> data.frame (which would not need coercion) does not work at all.
>
> (I feel more strongly about fixing 1. than 2., because I don't know the
> discussion that led to the behaviour described in 2.)
>
> Best,
> Martin
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
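
For illustration only, here is a rough sketch of what the "&& !is.logical(x)" suggestion might amount to. The function name summary_df_sketch is hypothetical and not part of base R; it only mimics the kind of type check discussed in this thread and is not the actual Summary.data.frame source.

```r
# Hypothetical helper (an assumption for illustration, not base R code):
# apply a Summary-group function such as sum, min, max, any or all to a
# data frame, accepting logical matrices in addition to numeric/complex ones.
summary_df_sketch <- function(fun, x, na.rm = FALSE) {
    m <- as.matrix(x)
    # The extra "&& !is.logical(m)" lets logical matrices through the check.
    if (!is.numeric(m) && !is.logical(m) && !is.complex(m))
        stop("only defined on a data frame with numeric or logical variables")
    fun(m, na.rm = na.rm)
}

a <- na.omit(airquality)

# Boundary cases from the examples above: zero-row data frames
summary_df_sketch(sum, a[FALSE, ])
summary_df_sketch(sum, subset(a, Ozone > 200))

# Logical data frame, no coercion needed
summary_df_sketch(any, data.frame(x = TRUE))
```

With this check the two empty cases should sum to 0, matching sum(numeric(0)), and any() on a logical data frame should go through without a coercion warning. Character data frames would still be rejected.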