On 28/06/2016 10:03 AM, William Dunlap wrote:
Currently exists("someName", where=someDataFrame) reports if
"someName" is a column of the data.frame 'someDataFrame', and the
'where=' may be omitted. If we have an environment we use
exists("someName", envir=someEnvironment). It might be nice to
continue using exists() instead of introducing a new function has(),
although, since we want the same syntax to work for environments,
data.frames, tbl_dfs, data.tables, etc., we may need the new function.
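
A minimal sketch of the two exists() calls described above. The objects
some_df and some_env and the name "noSuchColumn_xyz" are made up for
illustration; inherits = FALSE is added here as an assumption so that the
environment lookup does not continue into enclosing environments.

```r
# Hypothetical example objects; the names are illustrative only.
some_df  <- data.frame(someName = 1:3, other = letters[1:3])
some_env <- list2env(list(someName = 1:3))

# Data frame: 'where' is the second formal argument of exists(),
# so it can be given by name or by position.
in_df_named      <- exists("someName", where = some_df)
in_df_positional <- exists("someName", some_df)

# Environment: pass it via 'envir'.
in_env <- exists("someName", envir = some_env, inherits = FALSE)

# A name that is in neither object.
absent_df  <- exists("noSuchColumn_xyz", where = some_df)
absent_env <- exists("noSuchColumn_xyz", envir = some_env, inherits = FALSE)
```

The data-frame form works because exists() converts 'where' with
as.environment(), which is also why it does extra work on every call.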
One issue with exists("someName", someDataFrame) is that it's quite a
bit slower. (I think it converts the dataframe to an environment.) On
the other hand, getting the names from an environment requires more work
than checking for one, so exists("someName", someEnvironment) is faster
than checking for the name in the obvious way. The slow operations
could be sped up, but is that worth the effort?
The other issue with exists() is that it has a complicated definition
and a hard-to-follow argument list (with args "where", "envir", "frame"
that all do related things); the thing I like about hasName() is that it
is very clear what it does. A criticism of it is that it is hardly any
shorter than just doing
name %in% names(x)
so is there really any point in making a function for this?
Duncan Murdoch
Bill Dunlap
TIBCO Software
wdunlap tibco.com <http://tibco.com>
On Tue, Jun 28, 2016 at 4:08 AM, Duncan Murdoch
<murdoch.dun...@gmail.com> wrote:
On 27/06/2016 10:15 PM, Lenth, Russell V wrote:
Hadley's note on partial matching has me scared the most
concerning the is.null() coding. So the need for a hasName()
(or whatever) function seems all the more compelling, and that
it be in base R. Perhaps it should be generic, with a default
method that searches in the names attribute, potentially
extensible to other classes.
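
One way the generic idea above might look, as a sketch only. The name
has_name and both methods are hypothetical and are not part of base R;
the environment method is just an example of extending the generic to
another class.

```r
# Hypothetical S3 generic; 'has_name' is an illustrative name.
has_name <- function(x, name) UseMethod("has_name")

# Default method: search the names attribute.
has_name.default <- function(x, name) name %in% names(x)

# Example extension: environments are queried with exists(), one name
# at a time, without looking into enclosing environments.
has_name.environment <- function(x, name) {
  vapply(name, exists, logical(1), envir = x, inherits = FALSE,
         USE.NAMES = FALSE)
}

lst <- list(a = 1)
env <- list2env(list(a = 1))

in_list     <- has_name(lst, "a")
not_in_list <- has_name(lst, "b")
in_env      <- has_name(env, c("a", "b"))
```

Method dispatch adds overhead to every call, which matters if the function
is meant to compete with a bare is.null() test.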
I am thinking of putting it in, but if I do the definition will be
equivalent to the one-liner down below. That's already slower
than the is.null() test; making it generic would slow it down too
much.
Duncan Murdoch
Thanks so much, several of you, for your positive and helpful
responses.
Russ
-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Monday, June 27, 2016 12:50 PM
To: Hadley Wickham <h.wick...@gmail.com>; Lenth, Russell V
<russell-le...@uiowa.edu>
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] Absent variables and tibble
On 27/06/2016 1:09 PM, Hadley Wickham wrote:
The other thing you need to be aware of if you're using the other
approach is partial matching:
df <- data.frame(xyz = 1)
is.null(df$x)
#> [1] FALSE
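
For comparison, a small sketch of the same data frame tested three ways.
The variable names are illustrative; suppressWarnings() is used because
recent versions of R warn when `$` partially matches a data-frame column.

```r
df <- data.frame(xyz = 1)

# `$` partially matches "x" to the column "xyz", so the is.null() test
# wrongly suggests that a column named "x" is present.
dollar_is_null <- suppressWarnings(is.null(df$x))

# `[[` matches exactly by default, and %in% compares whole names.
bracket_is_null <- is.null(df[["x"]])
name_present    <- "x" %in% names(df)
```

Only the `$` form is affected: dollar_is_null is FALSE, while the other two
tests correctly report that there is no column called "x".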
Duncan - I think that argues for including a has_name()
(hasName() ?)
function in base R. Is that something you'd consider?
Yes, I'd consider it. I think hasName() would be more
consistent with other has*() functions in the R sources.
I guess the implementation should be defined to be equivalent to
hasName <- function(x, name)
name %in% names(x)
though it would make sense to make a faster internal
implementation;
!is.null(df$x) is quite a bit faster than "x" %in% names(df).
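
A sketch of how the one-liner above behaves, with a rough timing loop for
the two tests. The helper is named has_name_sketch here (a hypothetical
name) to keep it distinct from any real function; timings depend on the
machine and R version, so none are quoted.

```r
# Hypothetical helper mirroring the one-liner above.
has_name_sketch <- function(x, name) name %in% names(x)

df <- data.frame(x = 1, y = 2)

found     <- has_name_sketch(df, "x")
not_found <- has_name_sketch(df, "z")

# Vectorised over 'name', because %in% is.
both <- has_name_sketch(df, c("x", "z"))

# Rough timing comparison of the two tests over many repetitions.
n <- 1e5
t_is_null <- system.time(for (i in seq_len(n)) !is.null(df$x))[["elapsed"]]
t_in      <- system.time(for (i in seq_len(n)) "x" %in% names(df))[["elapsed"]]
```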
Duncan Murdoch
Hadley
On Mon, Jun 27, 2016 at 10:05 AM, Lenth, Russell V
<russell-le...@uiowa.edu> wrote:
Thanks, Hadley. I do understand why you'd want more
careful checking.
If you're going to provide a variable-existence
function, may I suggest a short name like 'has'? I.e.,
has(x, var) returns TRUE if x has var in it.
Thanks
Russ
On Jun 27, 2016, at 9:47 AM, Hadley Wickham
<h.wick...@gmail.com> wrote:
On Mon, Jun 27, 2016 at 9:03 AM, Duncan Murdoch
<murdoch.dun...@gmail.com> wrote:
On 27/06/2016 9:22 AM, Lenth, Russell V wrote:
My package 'lsmeans' is now suddenly
broken because of a new
provision in the 'tibble' package (loaded
by 'dplyr' 0.5.0), whereby the "[[" and "$"
methods for 'tbl_df' objects - as
documented - throw an error if
a variable is not found.
The problem is that my code uses tests
like this:
if (is.null (x$var)) {...}
to see whether 'x' has a variable 'var'.
Obviously, I can work
around this using
if (!("var" %in% names(x))) {...}
but (a) I like the first version better,
in terms of the code
being understandable; and (b) isn't there
a long history whereby
we can expect a NULL result when accessing
an absent member of a
list (and hence a data.frame)? (c) the
code base for 'lsmeans'
has about 50 instances of such tests.
Anyway, I wonder if a lot of other package
developers test for
absent variables in that first way; if so,
they too are in for a
rude awakening if their users provide a
tbl_df instead of a
data.frame. And what is considered the
best practice for testing
absence of a list member? Apparently, not
either of the above;
and because of (c), I want to do these
many tedious corrections only once.
Thanks for any light you can shed.
This is why CRAN asks that people test reverse
dependencies.
Which we did do - the problem is that this is
actually caused by a
recursive reverse dependency (lsmeans -> dplyr ->
tibble), and we
didn't correctly anticipate how much pain this
would cause.
I think the most defensive thing you can do is
to write a small
function
name_missing <- function(x, name)
!(name %in% names(x))
and use name_missing(x, "var") in your tests.
(Pick your own name
to make your code understandable if you don't
like my choice.)
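
A sketch of how a test of the form if (is.null(x$var)) might be rewritten
with the helper above. The wrapper describe() and the example objects are
hypothetical and only serve to make the snippet runnable.

```r
name_missing <- function(x, name) !(name %in% names(x))

# Hypothetical function standing in for package code that used to
# test is.null(x$var).
describe <- function(x) {
  if (name_missing(x, "var")) "no 'var' column" else "has 'var' column"
}

with_var    <- describe(data.frame(var = 1:3))
without_var <- describe(data.frame(other = 1:3))
plain_list  <- describe(list(var = 1))
```

Because the helper only calls names(), it never goes through the `$` or
`[[` methods of the object being tested.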
You could suggest to the tibble maintainers
that they add a
function like this.
We're definitely going to add this.
And I think we'll make df[["var"]] return NULL
too, so at least
there's one easy way to opt out.
The motivation for this change was that returning
NULL + recycling
rules means it's very easy for errors to silently
propagate. But I
think this approach might be somewhat too
aggressive - I hadn't
considered that people use `is.null()` to check
for missing columns.
We'll try and get an update to tibble out soon
after useR.
Thoughts on what we should do are greatly appreciated.
Hadley
--
http://hadley.nz
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel