Hmm, yes, this is probably wrong. E.g., we are likely to get inconsistencies 
out of boundary cases like this:

> a <- na.omit(airquality)
> sum(a)
[1] 37495.3
> sum(a[FALSE,])
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables

Or, closer to an actual use case:

> sum(subset(a, Ozone>100))
[1] 3330.5
> sum(subset(a, Ozone>200))
Error in FUN(X[[i]], ...) : 
  only defined on a data frame with all numeric variables


However, given that numeric summaries generally treat logicals as 0/1, wouldn't 
it be easiest just to extend the check inside Summary.data.frame with 
"&& !is.logical(x)"?

> sum(as.matrix(a[FALSE,]))
[1] 0
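
For concreteness, here is a minimal sketch of what such an extended check could 
look like, written as a standalone helper rather than as a patch to base R. The 
name sum_df_sketch and the reworded error message are made up for illustration, 
and the surrounding structure is only an assumption about how the method is 
organised; the one piece taken from the suggestion above is the added 
is.logical() test.

```r
# Hypothetical helper, not the base R source: convert to a matrix, then
# accept numeric, complex, and (the proposed addition) logical matrices.
sum_df_sketch <- function(df, na.rm = FALSE) {
    x <- as.matrix(df)
    if (!is.numeric(x) && !is.logical(x) && !is.complex(x))
        stop("only defined on a data frame with all numeric or logical variables")
    sum(x, na.rm = na.rm)
}

a <- na.omit(airquality)

sum_df_sketch(a[FALSE, ])               # zero rows: 0 instead of an error
sum_df_sketch(subset(a, Ozone > 200))   # empty subset: also 0
sum_df_sketch(data.frame(x = TRUE))     # logical column counted as 1
```

A data frame with character or factor columns would still be rejected by the 
check, since its matrix form is of type character.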

-pd

> On 17 Oct 2020, at 21:18 , Martin <r...@mb706.com> wrote:
> 
> The "Summary" group generics always throw errors for a data.frame with zero 
> rows, for example:
>> sum(data.frame(x = numeric(0)))
> #> Error in FUN(X[[i]], ...) : 
> #>   only defined on a data frame with all numeric variables
> Same behaviour for min, max, any, all, ... . I believe this is inconsistent 
> with what these methods do for other empty objects (vectors, matrices), where 
> the return value is chosen to ensure transitivity: sum(numeric(0)) == 0.
> 
> The reason for this is that the return type of as.matrix() for empty (no rows 
> or no columns) data.frame objects is always a matrix of type "logical". The 
> Summary method for data.frame, in turn, throws an error when the data.frame, 
> converted to a matrix, is not of numeric type.
> 
> I suggest two ways that make sum, min, max, ... more consistent. IMHO it 
> would be fitting to implement both of these fixes, because they also make 
> other things more consistent.
> 
> 1. Make the return type of as.matrix() for zero-row data.frames consistent 
> with the type that would have been returned, had the data.frame had more than 
> zero rows. "as.matrix(data.frame(x = numeric(0)))" should then be numeric, if 
> there is an empty "character" column the return matrix should be a character 
> etc. This would make subsetting by row and conversion to matrix commute 
> (except for row names sometimes):
>> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, , drop = 
>> FALSE])
> Furthermore, this change would make as.matrix.data.frame obey the 
> documentation, which indicates that the coercion hierarchy is used for the 
> return type.
> 
> 2. Make the Summary.data.frame method accept data.frames that produce 
> non-numeric matrices. Next to the main focus of this message, I believe it 
> would e.g. be fitting to have any() and all() work on logical data.frame 
> objects. The current behaviour is such that
>> any(data.frame(x = 1))
> #> [1] TRUE
> #> Warning message:
> #> In any(1, na.rm = FALSE) : coercing argument of type 'double' to logical
> and
>> any(data.frame(x = TRUE))
> #> Error in FUN(X[[i]], ...) : 
> #>   only defined on a data frame with all numeric variables
> So a numeric data.frame warns about implicit coercion, while a logical 
> data.frame (which would not need coercion) does not work at all.
> 
> (I feel more strongly about fixing 1. than 2., because I don't know the 
> discussion that led to the behaviour described in 2.)
> 
> Best,
> Martin
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
