On 27/06/2016 10:15 PM, Lenth, Russell V wrote:
Hadley's note on partial matching has me scared the most concerning the
is.null() coding. So the need for a hasName() (or whatever) function seems all
the more compelling, and that it be in base R. Perhaps it should be generic,
with a default method that searches in the names attribute, potentially
extensible to other classes.
I am thinking of putting it in, but if I do the definition will be
equivalent to the one-liner down below. That's already slower than the
is.null() test; making it generic would slow it down too much.
Duncan Murdoch
Thanks so much, several of you, for your positive and helpful responses.
Russ
-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Monday, June 27, 2016 12:50 PM
To: Hadley Wickham <h.wick...@gmail.com>; Lenth, Russell V
<russell-le...@uiowa.edu>
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] Absent variables and tibble
On 27/06/2016 1:09 PM, Hadley Wickham wrote:
The other thing you need to be aware of if you're using the other
approach is partial matching:
df <- data.frame(xyz = 1)
is.null(df$x)
#> [1] FALSE
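
A minimal sketch of how the three checks differ, assuming a plain data.frame (not a tibble); the variable names are illustrative only:

```r
df <- data.frame(xyz = 1)

# `$` partially matches "x" to the column "xyz", so the column looks present
via_dollar <- !is.null(df$x)

# `[[` with a character index matches exactly by default, so "x" is not found
via_brackets <- !is.null(df[["x"]])

# Comparing against names() is also an exact match
via_names <- "x" %in% names(df)

print(c(dollar = via_dollar, brackets = via_brackets, names = via_names))
```

Only the `$` form reports the absent column "x" as present.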
Duncan - I think that argues for including a has_name() (hasName() ?)
function in base R. Is that something you'd consider?
Yes, I'd consider it. I think hasName() would be more consistent with other
has*() functions in the R sources.
I guess the implementation should be defined to be equivalent to
hasName <- function(x, name)
name %in% names(x)
though it would make sense to make a faster internal implementation;
!is.null(df$x) is quite a bit faster than "x" %in% names(df).
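
A rough way to compare the two checks on your own machine. This is a sketch only: has_name_sketch() is a hypothetical stand-in for the one-liner above, and the timings depend on the R version and on the class of the object, so no result is claimed here.

```r
# Hypothetical stand-in for the proposed one-liner
has_name_sketch <- function(x, name)
  name %in% names(x)

df <- data.frame(xyz = 1)
n <- 10000

# Time n repetitions of each check; "elapsed" is wall-clock seconds
t_null <- system.time(for (i in seq_len(n)) !is.null(df$xyz))[["elapsed"]]
t_in   <- system.time(for (i in seq_len(n)) has_name_sketch(df, "xyz"))[["elapsed"]]

print(c(is_null = t_null, in_names = t_in))
```

Note that the two checks are not interchangeable: the `$` form is subject to partial matching, while the names() form is exact.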
Duncan Murdoch
Hadley
On Mon, Jun 27, 2016 at 10:05 AM, Lenth, Russell V
<russell-le...@uiowa.edu> wrote:
Thanks, Hadley. I do understand why you'd want more careful checking.
If you're going to provide a variable-existence function, may I suggest a short
name like 'has'? I.e., has(x, var) returns TRUE if x has var in it.
Thanks
Russ
On Jun 27, 2016, at 9:47 AM, Hadley Wickham <h.wick...@gmail.com> wrote:
On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
<murdoch.dun...@gmail.com> wrote:
On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
My package 'lsmeans' is now suddenly broken because of a new
provision in the 'tibble' package (loaded by 'dplyr' 0.5.0), whereby the "[[" and "$"
methods for 'tbl_df' objects - as documented - throw an error if
a variable is not found.
The problem is that my code uses tests like this:
if (is.null (x$var)) {...}
to see whether 'x' has a variable 'var'. Obviously, I can work
around this using
if (!("var" %in% names(x))) {...}
but (a) I like the first version better, in terms of the code
being understandable; and (b) isn't there a long history whereby
we can expect a NULL result when accessing an absent member of a
list (and hence a data.frame)? (c) the code base for 'lsmeans'
has about 50 instances of such tests.
Anyway, I wonder if a lot of other package developers test for
absent variables in that first way; if so, they too are in for a
rude awakening if their users provide a tbl_df instead of a
data.frame. And what is considered the best practice for testing
absence of a list member? Apparently, not either of the above;
and because of (c), I want to do these many tedious corrections only once.
Thanks for any light you can shed.
This is why CRAN asks that people test reverse dependencies.
Which we did do - the problem is that this is actually caused by a
recursive reverse dependency (lsmeans -> dplyr -> tibble), and we
didn't correctly anticipate how much pain this would cause.
I think the most defensive thing you can do is to write a small
function
name_missing <- function(x, name)
!(name %in% names(x))
and use name_missing(x, "var") in your tests. (Pick your own name
to make your code understandable if you don't like my choice.)
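
A minimal sketch of how such a helper might be used in place of an is.null(x$var) test. get_or_default() is a hypothetical wrapper added for illustration, not something proposed in this thread:

```r
name_missing <- function(x, name)
  !(name %in% names(x))

# Hypothetical helper: return a default when the element is absent
get_or_default <- function(x, name, default = NA) {
  if (name_missing(x, name)) default else x[[name]]
}

df <- data.frame(xyz = 1)
get_or_default(df, "xyz")  # the column itself
get_or_default(df, "x")    # the default, since "x" is not an exact name
```

Because the helper consults only names() and indexes only elements known to exist, it should not depend on what `$` or `[[` do for an absent name.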
You could suggest to the tibble maintainers that they add a
function like this.
We're definitely going to add this.
And I think we'll make df[["var"]] return NULL too, so at least
there's one easy way to opt out.
The motivation for this change was that returning NULL + recycling
rules means it's very easy for errors to silently propagate. But I
think this approach might be somewhat too aggressive - I hadn't
considered that people use `is.null()` to check for missing columns.
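
A small sketch of the kind of silent propagation meant here, assuming a plain data.frame and a misspelled column name (the data are made up for illustration):

```r
df <- data.frame(price = c(10, 20, 30))

# Typo: the column is "price", not "prize"; `$` returns NULL without complaint
vals <- df$prize

# NULL acts as a zero-length vector, so the arithmetic yields numeric(0)
scaled <- vals * 1.1

# The summary still returns a plausible-looking number instead of an error
total <- sum(scaled)
print(total)
```

No error or warning is raised at any step, so the typo surfaces only as a wrong total.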
We'll try and get an update to tibble out soon after useR.
Thoughts on what we should do are greatly appreciated.
Hadley
--
http://hadley.nz
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel