Hadley's note on partial matching is what scares me most about the is.null() coding. So the need for a hasName() (or whatever) function seems all the more compelling, as does the case for putting it in base R. Perhaps it should be generic, with a default method that searches the names attribute, so that it can be extended to other classes.
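A minimal sketch of the generic idea above, using S3 dispatch. The name has_name() and the toy "registry" class are hypothetical and chosen only for illustration; nothing here is the interface that base R or tibble actually adopted.

```r
# Hypothetical generic: has_name() is an illustrative name, not an
# existing base R or tibble function.
has_name <- function(x, name, ...) UseMethod("has_name")

# Default method: exact (never partial) match against the names attribute.
has_name.default <- function(x, name, ...) {
  name %in% names(x)
}

# Example extension for a made-up class that keeps its fields one level
# down, in x$fields, so names(x) alone would give the wrong answer.
has_name.registry <- function(x, name, ...) {
  name %in% names(unclass(x)$fields)
}

df  <- data.frame(xyz = 1)
reg <- structure(list(fields = list(alpha = 1)), class = "registry")

has_name(df, "xyz")      # TRUE
has_name(df, "x")        # FALSE: no partial matching
has_name(reg, "alpha")   # TRUE, via the registry method
has_name(reg, "fields")  # FALSE: "fields" is storage, not a member
```

Because the default method goes through names(), it gives an exact answer for plain lists and data frames, and other classes can override it without the caller changing anything.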
Thanks so much, several of you, for your positive and helpful responses.

Russ

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Monday, June 27, 2016 12:50 PM
To: Hadley Wickham <h.wick...@gmail.com>; Lenth, Russell V <russell-le...@uiowa.edu>
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] Absent variables and tibble

On 27/06/2016 1:09 PM, Hadley Wickham wrote:
> The other thing you need to be aware of if you're using the other
> approach is partial matching:
>
>     df <- data.frame(xyz = 1)
>     is.null(df$x)
>     #> [1] FALSE
>
> Duncan - I think that argues for including a has_name() (hasName() ?)
> function in base R. Is that something you'd consider?

Yes, I'd consider it. I think hasName() would be more consistent with
other has*() functions in the R sources. I guess the implementation
should be defined to be equivalent to

    hasName <- function(x, name) name %in% names(x)

though it would make sense to make a faster internal implementation;
!is.null(df$x) is quite a bit faster than "x" %in% names(df).

Duncan Murdoch

>
> Hadley
>
> On Mon, Jun 27, 2016 at 10:05 AM, Lenth, Russell V
> <russell-le...@uiowa.edu> wrote:
> > Thanks, Hadley. I do understand why you'd want more careful checking.
> >
> > If you're going to provide a variable-existing function, may I suggest a
> > short name like 'has'? I.e., has(x, var) returns TRUE if x has var in it.
> >
> > Thanks
> >
> > Russ
> >
> >> On Jun 27, 2016, at 9:47 AM, Hadley Wickham <h.wick...@gmail.com> wrote:
> >>
> >> On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
> >> <murdoch.dun...@gmail.com> wrote:
> >>> On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
> >>>>
> >>>> My package 'lsmeans' is now suddenly broken because of a new
> >>>> provision in the 'tibble' package (loaded by 'dplyr' 0.5.0), whereby the
> >>>> "[[" and "$"
> >>>> methods for 'tbl_df' objects - as documented - throw an error if
> >>>> a variable is not found.
> >>>>
> >>>> The problem is that my code uses tests like this:
> >>>>
> >>>>     if (is.null (x$var)) {...}
> >>>>
> >>>> to see whether 'x' has a variable 'var'. Obviously, I can work
> >>>> around this using
> >>>>
> >>>>     if (!("var" %in% names(x))) {...}
> >>>>
> >>>> but (a) I like the first version better, in terms of the code
> >>>> being understandable; and (b) isn't there a long history whereby
> >>>> we can expect a NULL result when accessing an absent member of a
> >>>> list (and hence a data.frame)? (c) the code base for 'lsmeans'
> >>>> has about 50 instances of such tests.
> >>>>
> >>>> Anyway, I wonder if a lot of other package developers test for
> >>>> absent variables in that first way; if so, they too are in for a
> >>>> rude awakening if their users provide a tbl_df instead of a
> >>>> data.frame. And what is considered the best practice for testing
> >>>> absence of a list member? Apparently, not either of the above;
> >>>> and because of (c), I want to do these many tedious corrections only
> >>>> once.
> >>>>
> >>>> Thanks for any light you can shed.
> >>>
> >>>
> >>> This is why CRAN asks that people test reverse dependencies.
> >>
> >> Which we did do - the problem is that this is actually caused by a
> >> recursive reverse dependency (lsmeans -> dplyr -> tibble), and we
> >> didn't correctly anticipate how much pain this would cause.
> >>
> >>> I think the most defensive thing you can do is to write a small
> >>> function
> >>>
> >>>     name_missing <- function(x, name)
> >>>         !(name %in% names(x))
> >>>
> >>> and use name_missing(x, "var") in your tests. (Pick your own name
> >>> to make your code understandable if you don't like my choice.)
> >>>
> >>> You could suggest to the tibble maintainers that they add a
> >>> function like this.
> >>
> >> We're definitely going to add this.
> >>
> >> And I think we'll make df[["var"]] return NULL too, so at least
> >> there's one easy way to opt out.
> >>
> >> The motivation for this change was that returning NULL + recycling
> >> rules means it's very easy for errors to silently propagate. But I
> >> think this approach might be somewhat too aggressive - I hadn't
> >> considered that people use `is.null()` to check for missing columns.
> >>
> >> We'll try and get an update to tibble out soon after useR.
> >> Thoughts on what we should do are greatly appreciated.
> >>
> >> Hadley
> >>
> >> --
> >> http://hadley.nz
> >
>
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
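The three column-existence tests discussed in this thread can be compared side by side on a plain base R data.frame. This is only a sketch: it assumes base R behaviour and says nothing about how a tbl_df responds, which depends on the tibble version. name_missing() is the helper Duncan proposes in the quoted message.

```r
df <- data.frame(xyz = 1)

# `$` partially matches "x" to the column "xyz", so this test wrongly
# concludes that a column named "x" is present.
is.null(df$x)            # FALSE

# `[[` matches exactly by default, so the absent column gives NULL.
is.null(df[["x"]])       # TRUE

# Duncan's helper from the thread: an exact test against names().
name_missing <- function(x, name) !(name %in% names(x))

name_missing(df, "x")    # TRUE
name_missing(df, "xyz")  # FALSE
```

Only the names()-based test avoids both partial matching and any dependence on what a class's `$` or `[[` method does for an absent member.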