[Rd] Slightly misleading help page: tools::add_datalist

2018-08-23 Thread Colin Gillespie
Hi,

I was looking at the help page for add_datalist() and it states under
Details:

"R CMD build will call this function to add a data list to packages with
1MB or more of data."

This is correct; however, the help page omits that
add_datalist() will *only* create the datalist file for packages with 1MB or
more of data, since the function contains

if (size <= 1024^2) return()
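
To make the effect of that check concrete, here is a minimal sketch of the size test in base R. The helper name `data_size_exceeds_limit()` and its `limit` argument are hypothetical, and summing the files directly under `data/` is an assumption about what `size` holds, not a copy of the tools source.

```r
# Hypothetical helper: does the total size of the files under <pkgpath>/data
# exceed the limit? Mirrors the `size <= 1024^2` test quoted above.
data_size_exceeds_limit <- function(pkgpath, limit = 1024^2) {
  files <- Sys.glob(file.path(pkgpath, "data", "*"))
  size <- sum(file.size(files))   # total bytes across the data files
  size > limit
}

# Toy "package" with one small data file (well under 1MB).
pkg <- file.path(tempdir(), "toypkg")
dir.create(file.path(pkg, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines("x", file.path(pkg, "data", "small.txt"))
small_result <- data_size_exceeds_limit(pkg)

# Add roughly 2MB of bytes so the total crosses the threshold.
writeBin(raw(2 * 1024^2), file.path(pkg, "data", "big.bin"))
big_result <- data_size_exceeds_limit(pkg)
```

Under this reading, a package at or below the limit returns early and gets no datalist file at all.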

As an aside, what's the reason for the 1MB limit?

Thanks

Colin


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] conflicted: an alternative conflict resolution strategy

2018-08-23 Thread Hadley Wickham
Hi all,

I’d love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I’m currently preparing a revision
(), and looking for feedback.

As you are no doubt aware, R’s default approach means that the most
recently loaded package “wins” any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:

-   People don’t read messages about conflicts. Even if you are
    conscientious and do read the messages, it’s hard to notice a single
    new conflict caused by a package upgrade.

-   The warning and the problem may be quite far apart. If you load all
    your packages at the top of the script, it may potentially be 100s
    of lines before you encounter a conflict.

-   The error messages caused by conflicts are cryptic because you end
    up calling a function with utterly unexpected arguments.
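
The "most recently attached wins" behaviour can be reproduced without installing anything. The following is a minimal sketch in base R; `pkg_a` and `pkg_b` are hypothetical stand-ins for two packages that both export `select`, attached as plain lists rather than real packages.

```r
# Two stand-in "packages", each providing a function called `select`.
pkg_a <- list(select = function() "pkg_a version")
pkg_b <- list(select = function() "pkg_b version")

attach(pkg_a, name = "pkg_a", warn.conflicts = FALSE)
attach(pkg_b, name = "pkg_b")   # reports that `select` is masked from pkg_a

# The most recently attached entry sits earlier on the search path and wins.
winner <- select()

# Clean up the search path again.
detach("pkg_b")
detach("pkg_a")
```

Swapping the two `attach()` calls silently changes which `select()` is called, and that order dependence is the problem described above.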

For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:

    library(conflicted)
    library(dplyr)
    library(MASS)

    select
    #> Error: [conflicted] `select` found in 2 packages.
    #> Either pick the one you want with `::`
    #> * MASS::select
    #> * dplyr::select
    #> Or declare a preference with `conflicted_prefer()`
    #> * conflict_prefer("select", "MASS")
    #> * conflict_prefer("select", "dplyr")

conflicted works by attaching a new “conflicted” environment just after
the global environment. This environment contains an active binding for
any ambiguous bindings. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress default reporting (but are otherwise thin wrappers
around the base equivalents).
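
The mechanism can be illustrated with base R alone. This is a minimal sketch under stated assumptions, not conflicted's actual implementation: the environment name `sketch_conflicts` and the error text are made up for illustration.

```r
# Create an empty environment on the search path, directly after the global
# environment (attach(NULL, ...) is the documented base R idiom for this).
env <- attach(NULL, name = "sketch_conflicts")

# An active binding runs its function every time the name is looked up, so
# merely referring to `select` raises an error that asks for disambiguation.
makeActiveBinding(
  "select",
  function() stop("`select` found in 2 packages; use `::` to pick one",
                  call. = FALSE),
  env
)

# Looking up the name triggers the binding; capture the message it raises.
msg <- tryCatch(select, error = function(e) conditionMessage(e))

# Clean up the search path again.
detach("sketch_conflicts")
```

Because the environment sits ahead of every attached package, the lookup reaches the active binding first, and the ambiguous name is never resolved silently.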

conflicted also provides a `conflict_scout()` helper which you can use
to see what’s going on:

    conflict_scout(c("dplyr", "MASS"))
    #> 1 conflict:
    #> * `select`: dplyr, MASS

conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.

-   A number of packages provide a function that appears to conflict
    with a function in a base package, but they follow the superset
    principle (i.e. they only extend the API, as explained to me by
    Hervé Pagès).

    conflicted assumes that packages adhere to the superset principle,
    which appears to be true in most of the cases that I’ve seen. For
    example, the lubridate package provides `as.difftime()` and `date()`
    which extend the behaviour of base functions, and provides S4
    generics for the set operators.

        conflict_scout(c("lubridate", "base"))
        #> 5 conflicts:
        #> * `as.difftime`: [lubridate]
        #> * `date`       : [lubridate]
        #> * `intersect`  : [lubridate]
        #> * `setdiff`    : [lubridate]
        #> * `union`      : [lubridate]

    There are two popular functions that don’t adhere to this principle:
    `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
    special cases so they correctly generate conflicts. (I sure wish I’d
    known about the superset principle when creating dplyr!)

        conflict_scout(c("dplyr", "stats"))
        #> 2 conflicts:
        #> * `filter`: dplyr, stats
        #> * `lag`   : dplyr, stats

-   Deprecated functions should never win a conflict, so conflicted
    checks for use of `.Deprecated()`. This rule is very useful when
    moving functions from one package to another. For example, many
    devtools functions were moved to usethis, and conflicted ensures
    that you always get the non-deprecated version, regardless of package
    attach order:

        head(conflict_scout(c("devtools", "usethis")))
        #> 26 conflicts:
        #> * `use_appveyor`       : [usethis]
        #> * `use_build_ignore`   : [usethis]
        #> * `use_code_of_conduct`: [usethis]
        #> * `use_coverage`       : [usethis]
        #> * `use_cran_badge`     : [usethis]
        #> * `use_cran_comments`  : [usethis]
        #> ...
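
To give a feel for what such heuristics might look like, here is a rough sketch in base R. It is an assumption-laden illustration, not conflicted's code: the helpers `extends_args()` and `calls_deprecated()` and all four example functions are hypothetical, and comparing argument names is only a crude proxy for the superset principle.

```r
# Hypothetical helper: does `new` accept every argument name that `old` does?
# Matching argument names say nothing about matching behaviour.
extends_args <- function(new, old) {
  arg_names <- function(f) names(formals(args(f)))
  all(arg_names(old) %in% arg_names(new))
}

# Hypothetical helper: does the function body mention `.Deprecated`?
calls_deprecated <- function(f) {
  ".Deprecated" %in% all.names(body(f))
}

# Stand-in functions, not taken from any real package.
old_fun  <- function(x, units = "auto") x
superset <- function(x, units = "auto", format = NULL) x
clash    <- function(.data, ...) .data
moved    <- function(x) { .Deprecated("newpkg::moved"); x }

extends_args(superset, old_fun)  # TRUE: only adds an argument
extends_args(clash, old_fun)     # FALSE: different interface
calls_deprecated(moved)          # TRUE
calls_deprecated(superset)       # FALSE
```

A check based on argument names alone cannot detect a function that keeps the same interface but changes behaviour, which is one reason special cases may still be needed.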

Finally, as mentioned above, the user can declare preferences:

    conflict_prefer("select", "MASS")
    #> [conflicted] Will prefer MASS::select over any other package
    conflict_scout(c("dplyr", "MASS"))
    #> 1 conflict:
    #> * `select`: [MASS]

I’d love to hear what people think about the general idea, and if there
are any obviously missing pieces.

Thanks!

Hadley


-- 
http://hadley.nz



Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-23 Thread Duncan Murdoch

First, some general comments:

This sounds like a useful package.

I would guess it has very little impact on runtime efficiency except 
when attaching a new package; have you checked that?


I am not so sure about your heuristics.  Can they be disabled, so the 
user is always forced to make the choice?  Even when a function is 
intended to adhere to the superset principle, its authors don't always get it 
right, so a really careful user should always do explicit disambiguation.


And of course, if users wrote most of their long scripts as packages 
instead of as long scripts, the ambiguity issue would arise far less 
often, because namespaces in packages are intended to solve the same 
problem as your package does.


One more comment inline about a typo, possibly in an error message.

Duncan Murdoch

On 23/08/2018 2:31 PM, Hadley Wickham wrote:

Hi all,

I’d love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I’m currently preparing a revision
(), and looking for feedback.

As you are no doubt aware, R’s default approach means that the most
recently loaded package “wins” any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:

-   People don’t read messages about conflicts. Even if you are
 conscientious and do read the messages, it’s hard to notice a single
 new conflict caused by a package upgrade.

-   The warning and the problem may be quite far apart. If you load all
 your packages at the top of the script, it may potentially be 100s
 of lines before you encounter a conflict.

-   The error messages caused by conflicts are cryptic because you end
 up calling a function with utterly unexpected arguments.

For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:

 library(conflicted)
 library(dplyr)
 library(MASS)

 select
 #> Error: [conflicted] `select` found in 2 packages.
 #> Either pick the one you want with `::`
 #> * MASS::select
 #> * dplyr::select
 #> Or declare a preference with `conflicted_prefer()`
 #> * conflict_prefer("select", "MASS")
 #> * conflict_prefer("select", "dplyr")


I don't know if this is a typo in your r-devel message or a typo in the 
error message, but you say `conflicted_prefer()` in one place and 
conflict_prefer() in the other.




conflicted works by attaching a new “conflicted” environment just after
the global environment. This environment contains an active binding for
any ambiguous bindings. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress default reporting (but are otherwise thin wrappers
around the base equivalents).

conflicted also provides a `conflict_scout()` helper which you can use
to see what’s going on:

 conflict_scout(c("dplyr", "MASS"))
 #> 1 conflict:
 #> * `select`: dplyr, MASS

conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.

-   A number of packages provide a function that appears to conflict
 with a function in a base package, but they follow the superset
 principle (i.e. they only extend the API, as explained to me by
 Hervé Pagès).

 conflicted assumes that packages adhere to the superset principle,
 which appears to be true in most of the cases that I’ve seen. For
 example, the lubridate package provides `as.difftime()` and `date()`
 which extend the behaviour of base functions, and provides S4
 generics for the set operators.

 conflict_scout(c("lubridate", "base"))
 #> 5 conflicts:
 #> * `as.difftime`: [lubridate]
 #> * `date`       : [lubridate]
 #> * `intersect`  : [lubridate]
 #> * `setdiff`    : [lubridate]
 #> * `union`      : [lubridate]

 There are two popular functions that don’t adhere to this principle:
 `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
 special cases so they correctly generate conflicts. (I sure wish I’d
 known about the superset principle when creating dplyr!)

 conflict_scout(c("dplyr", "stats"))
 #> 2 conflicts:
 #> * `filter`: dplyr, stats
 #> * `lag`   : dplyr, stats

-   Deprecated functions should never win a conflict, so conflicted
 checks for use of `.Deprecated()`. This rule is very useful when
 moving functions from one package to another. For example, many
 devtools functions were moved to usethis, and co