Hi,

There are two bugs here. The proposed fix to Summary.data.frame() is fine, but it doesn't address the other problem reported by the OP: as.matrix() on a zero-row data.frame doesn't respect the type of its columns the way other column-combining operations do:

  df <- data.frame(a=numeric(0), b=numeric(0))

  typeof(as.matrix(df))
  # [1] "logical"

  typeof(unlist(df))
  # [1] "double"

  typeof(do.call(c, df))
  # [1] "double"

I've run into this myself on a couple of occasions (not in the context of Summary methods) and worked around it with something like:

  as_matrix_data_frame <- function(df) {
    ans <- as.matrix(df)
    if (nrow(df) == 0L)
      storage.mode(ans) <- typeof(unlist(df))
    ans
  }

There is no reason as.matrix.data.frame() couldn't do something similar.

Cheers,
H.

On 10/20/20 09:36, Martin Maechler wrote:
>>>>>> mb706
>>>>>>     on Sun, 18 Oct 2020 22:14:55 +0200 writes:
>
> > From my side: it would be great if you (or R core) could prepare a patch, it would probably take me quite a bit longer than you since I don't have experience creating patches for R.
>
> > Best, Martin
>
> Basically, just
>
> 1. svn co https://svn.r-project.org/R/trunk R-devel
>
> 2. inside the R-devel source tree, find src/library/base/R/dataframe.R
>    make the *minimal* changes there,
>
>    (then also add some regression tests and update the help :-)
>
> 3. inside R-devel, do
>
>      svn diff -x -ubw > mb706.patch
>
> 4. you've got the patch file mb706.patch which you could
>    attach to a bug report on R's bugzilla
>
>    (once you've got an account there ...
>     As you've asked for that *and* as you've proven your good
>     judgment about "true bug" vs. "not what I expected",
>     I'll create such an account for you now, in spite of the
>     fact that I'd still like to know a bit more than "Martin
>     mb706" about you ...)
>
> The changes have been committed to R-devel a quarter of an hour ago.
> We will keep them in R-devel (currently planned to become R
> 4.1.0 in spring 2021), and not port to the R-4.0.z branch, as
> the change is something like an API change, and also because
> nobody had ever reported this as an issue to our knowledge.
>
> Thank you, Martin B706 for bringing the issue up, and Gabe and Peter
> for chiming in !!
>
> Best regards,
> Martin Maechler
> ETH Zurich and R core team
>
>
> > On Sun, Oct 18, 2020, at 21:49, Gabriel Becker wrote:
> >> Peter et al,
> >>
> >> I had the same thought, in particular for any() and all(), which, inasmuch as they should work on data.frames in the first place (which, to be perfectly honest, I do find quite debatable myself), should certainly work on "logical" data.frames if they are going to work on "numeric" ones.
> >>
> >> I can volunteer to prepare a patch if Martin (the reporter) did not want to take a crack at it, and further if it is not already being done within R-core.
> >>
> >> Best,
> >> ~G
> >>
> >> On Sun, Oct 18, 2020 at 12:19 AM peter dalgaard <pda...@gmail.com> wrote:
> >> > Hmm, yes, this is probably wrong. E.g., we are likely to get inconsistencies out of boundary cases like this:
> >> >
> >> > > a <- na.omit(airquality)
> >> > > sum(a)
> >> > [1] 37495.3
> >> > > sum(a[FALSE,])
> >> > Error in FUN(X[[i]], ...) :
> >> >   only defined on a data frame with all numeric variables
> >> >
> >> > Or, closer to an actual use case:
> >> >
> >> > > sum(subset(a, Ozone>100))
> >> > [1] 3330.5
> >> > > sum(subset(a, Ozone>200))
> >> > Error in FUN(X[[i]], ...) :
> >> >   only defined on a data frame with all numeric variables
> >> >
> >> >
> >> > However, given that numeric summaries generally treat logicals as 0/1, wouldn't it be easiest just to extend the check inside Summary.data.frame with "&& !is.logical(x)"?
> >> >
> >> > > sum(as.matrix(a[FALSE,]))
> >> > [1] 0
> >> >
> >> > -pd
> >> >
> >> > > On 17 Oct 2020, at 21:18 , Martin <r...@mb706.com> wrote:
> >> > >
> >> > > The "Summary" group generics always throw errors for a data.frame with zero rows, for example:
> >> > >> sum(data.frame(x = numeric(0)))
> >> > > #> Error in FUN(X[[i]], ...) :
> >> > > #>   only defined on a data frame with all numeric variables
> >> > > Same behaviour for min, max, any, all, ... . I believe this is inconsistent with what these methods do for other empty objects (vectors, matrices), where the return value is chosen to ensure transitivity: sum(numeric(0)) == 0.
> >> > >
> >> > > The reason for this is that the return type of as.matrix() for empty (no rows or no columns) data.frame objects is always a matrix of type "logical". The Summary method for data.frame, in turn, throws an error when the data.frame, converted to a matrix, is not of numeric type.
> >> > >
> >> > > I suggest two ways that make sum, min, max, ... more consistent. IMHO it would be fitting to implement both of these fixes, because they also make other things more consistent.
> >> > >
> >> > > 1. Make the return type of as.matrix() for zero-row data.frames consistent with the type that would have been returned, had the data.frame had more than zero rows. "as.matrix(data.frame(x = numeric(0)))" should then be numeric, if there is an empty "character" column the return matrix should be a character etc. This would make subsetting by row and conversion to matrix commute (except for row names sometimes):
> >> > >> all.equal(as.matrix(df[rows, , drop = FALSE]), as.matrix(df)[rows, , drop = FALSE])
> >> > > Furthermore, this change would make as.matrix.data.frame obey the documentation, which indicates that the coercion hierarchy is used for the return type.
> >> > >
> >> > > 2. Make the Summary.data.frame method accept data.frames that produce non-numeric matrices.
> >> > > Next to the main focus of this message, I believe it would e.g. be fitting to have any() and all() work on logical data.frame objects. The current behaviour is such that
> >> > >> any(data.frame(x = 1))
> >> > > #> [1] TRUE
> >> > > #> Warning message:
> >> > > #> In any(1, na.rm = FALSE) : coercing argument of type 'double' to logical
> >> > > and
> >> > >> any(data.frame(x = TRUE))
> >> > > #> Error in FUN(X[[i]], ...) :
> >> > > #>   only defined on a data frame with all numeric variables
> >> > > So a numeric data.frame warns about implicit coercion, while a logical data.frame (which would not need coercion) does not work at all.
> >> > >
> >> > > (I feel more strongly about fixing 1. than 2., because I don't know the discussion that led to the behaviour described in 2.)
> >> > >
> >> > > Best,
> >> > > Martin
> >> > >
> >> > > ______________________________________________
> >> > > R-devel@r-project.org mailing list
> >> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >
> >> > --
> >> > Peter Dalgaard, Professor,
> >> > Center for Statistics, Copenhagen Business School
> >> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> >> > Phone: (+45)38153501
> >> > Office: A 4.23
> >> > Email: pd....@cbs.dk  Priv: pda...@gmail.com

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
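The two fixes discussed in this thread can be sketched in plain R as stand-alone helpers rather than as a patch to src/library/base/R/dataframe.R. This is an illustration under stated assumptions, not the change that was committed to R-devel: `as_matrix_data_frame()` is the workaround from the top of this message with an extra guard for zero-column input, and `summary_df()` is a hypothetical Summary-style wrapper that accepts logical as well as numeric matrices, in the spirit of Peter's `&& !is.logical(x)` suggestion. Neither name exists in base R.

```r
# Sketch only: as_matrix_data_frame() and summary_df() are hypothetical
# helpers for illustration, not base R functions.

# Workaround from the top of this message: for a zero-row data frame,
# give the matrix the storage mode that unlist() gives the columns.
as_matrix_data_frame <- function(df) {
  ans <- as.matrix(df)
  # Added guard: with zero columns, unlist() returns NULL, so there is no
  # column type to borrow and the matrix is returned unchanged.
  if (nrow(df) == 0L && ncol(df) > 0L)
    storage.mode(ans) <- typeof(unlist(df, use.names = FALSE))
  ans
}

# Hypothetical Summary-style wrapper (pass sum, min, max, any, all, ...)
# that accepts logical as well as numeric (and complex) matrices.
summary_df <- function(f, df, na.rm = FALSE) {
  x <- as_matrix_data_frame(df)
  if (!is.numeric(x) && !is.logical(x) && !is.complex(x))
    stop("only defined on a data frame with all numeric or logical variables")
  f(x, na.rm = na.rm)
}

a <- na.omit(airquality)
summary_df(sum, subset(a, Ozone > 200))  # zero rows: 0 rather than an error
summary_df(any, data.frame(x = TRUE))     # logical column: TRUE
typeof(as_matrix_data_frame(data.frame(a = numeric(0), b = numeric(0))))  # "double"
```

Borrowing the type from `unlist()` reuses the coercion R already applies when combining columns with `c()` or `unlist()`, which is the consistency asked for above. Factor columns are a known gap: `as.matrix()` turns non-empty factor columns into character, but `typeof(unlist(df))` does not report "character" for them.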