Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Jari Oksanen
If you have to load two packages which both export the same name in their 
namespaces, namespace does not help in resolving which synonymous function to 
use. Neither does it help to have a package instead of a script as long as you 
end up loading two namespaces with name conflicts. The order of importing 
namespaces can also be difficult to control, because you may end up loading a 
namespace already when you start your R with a saved workspace. Moving a 
function to another package may be a transitional issue which disappears when 
both packages are at their final stages, but if you use the recommended
deprecation stage, the same names can live together for a long time. So this 
package is a good idea, and preferably base R should be able to handle the 
issue of choosing between exported synonymous functions.

This has bitten me several times in package development, and with growing CRAN 
it is a growing problem. Package authors often have poor control of the issue, 
as they do not know what packages users use. Now all we can do is have a FAQ
explaining that a certain error message does not come from a function in our
package, but from some other package having a synonymous function that was used 
instead.

cheers, Jari Oksanen

On 23 Aug 2018, at 23:46, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

First, some general comments:

This sounds like a useful package.

I would guess it has very little impact on runtime efficiency except when 
attaching a new package; have you checked that?

I am not so sure about your heuristics.  Can they be disabled, so the user is 
always forced to make the choice?  Even when a function is intended to adhere 
to the superset principle, they don't always get it right, so a really careful 
user should always do explicit disambiguation.

And of course, if users wrote most of their long scripts as packages instead of 
as long scripts, the ambiguity issue would arise far less often, because 
namespaces in packages are intended to solve the same problem as your package 
does.

One more comment inline about a typo, possibly in an error message.

Duncan Murdoch

On 23/08/2018 2:31 PM, Hadley Wickham wrote:
Hi all,
I’d love to get your feedback on the conflicted package, which provides an
alternative strategy for resolving ambiguous function names (i.e. when
multiple packages provide identically named functions). conflicted 0.1.0
is already on CRAN, but I’m currently preparing a revision
(), and looking for feedback.
As you are no doubt aware, R’s default approach means that the most
recently loaded package “wins” any conflicts. You do get a message about
conflicts on load, but I see a lot of newer R users experiencing problems
caused by function conflicts. I think there are three primary reasons:
-   People don’t read messages about conflicts. Even if you are
conscientious and do read the messages, it’s hard to notice a single
new conflict caused by a package upgrade.
-   The warning and the problem may be quite far apart. If you load all
your packages at the top of the script, it may potentially be 100s
of lines before you encounter a conflict.
-   The error messages caused by conflicts are cryptic because you end
up calling a function with utterly unexpected arguments.
For these reasons, conflicted takes an alternative approach, forcing the
user to explicitly disambiguate any conflicts:
library(conflicted)
library(dplyr)
library(MASS)
select
#> Error: [conflicted] `select` found in 2 packages.
#> Either pick the one you want with `::`
#> * MASS::select
#> * dplyr::select
#> Or declare a preference with `conflicted_prefer()`
#> * conflict_prefer("select", "MASS")
#> * conflict_prefer("select", "dplyr")

I don't know if this is a typo in your r-devel message or a typo in the error 
message, but you say `conflicted_prefer()` in one place and conflict_prefer() 
in the other.

conflicted works by attaching a new “conflicted” environment just after
the global environment. This environment contains an active binding for
any ambiguous bindings. The conflicted environment also contains
bindings for `library()` and `require()` that rebuild the conflicted
environment and suppress default reporting (but are otherwise thin wrappers
around the base equivalents).
conflicted also provides a `conflict_scout()` helper which you can use
to see what’s going on:
conflict_scout(c("dplyr", "MASS"))
#> 1 conflict:
#> * `select`: dplyr, MASS
conflicted applies a few heuristics to minimise false positives (at the
cost of introducing a few false negatives). The overarching goal is to
ensure that code behaves identically regardless of the order in which
packages are attached.
-   A number of packages provide a function that appears to conflict
with a function in a base package, but they follow the superset
principle (i.e. they only extend the API, as explained to me by
Hervè Pa
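For readers who want to see R's default "most recently attached wins" rule without installing dplyr or MASS, here is a minimal base-R sketch. The names `fakeA` and `fakeB` are hypothetical stand-ins for two packages that export the same function; only `attach()`, `find()`, `get()`, and `detach()` are real base-R functions.

```r
# Two hypothetical stand-ins for packages that both export `select`.
# attach() places each new environment at position 2, just after the
# global environment, so the one attached last is searched first.
attach(list(select = function(...) "select from fakeA"), name = "fakeA")
attach(list(select = function(...) "select from fakeB"), name = "fakeB")
# (the second attach() prints R's usual 'masked from fakeA' note)

print(find("select"))   # search-path entries defining `select`, in lookup order

# An unqualified call resolves to the most recently attached copy
winner <- select()
print(winner)

# Explicit disambiguation: fetch the binding from one named search-path entry
from_a <- get("select", envir = as.environment("fakeA"))()
print(from_a)

# Tidy up the search path
detach("fakeB")
detach("fakeA")
```

Swapping the order of the two `attach()` calls flips which copy `select()` reaches, which is exactly the attach-order sensitivity the thread is about.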

Re: [Rd] Translating Rd files

2018-08-24 Thread Gábor Csárdi
On Wed, Aug 22, 2018 at 12:30 AM Duncan Murdoch
 wrote:
>
> On 21/08/2018 5:53 PM, Gábor Csárdi wrote:
> > Dear All,
> >
> > are there any resources (code, guidelines, anything) for translating Rd 
> > files?
> > As far as I can tell, the help system does not have support for this.
> > Have I missed something? Is such support desired?
>
> As of last year, support for this was seen as desirable but no action
> had been taken to put it in place.  The thinking was that it was too
> much work for the translators to have to follow every update to each
> help page, so the translated pages would likely soon be out of date.

I agree that this is the biggest concern. In some cases maybe one can ensure
that the translated docs are still in sync. E.g. if the translated man page is
within the translated package. Then it is the package authors' responsibility
to keep it updated (or to remove it), and the regular package check tools can
be used as well.

This does make collaboration with "external" translators and package updates
somewhat more cumbersome, and it does not scale very well. But it still could
be a start. Most packages will not have a hundred translations any time soon.

If we allow translations in separate packages, that does make things more
complicated.

Gabor

> Duncan Murdoch
>
> >
> > I am thinking about a way to register manual pages in different
> > languages, and then `help()` would bring up the page in the preferred
> > language, defaulting to the current locale.
>
>
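The lookup Gábor proposes (show a registered translation in the user's preferred language, falling back to the current locale and then to the original page) comes down to a small resolution step. Nothing like this exists in R's help system today: `pick_help_language()` and its arguments are hypothetical, and reading the `LANGUAGE` environment variable plus the `LC_CTYPE` locale category (assuming POSIX-style names such as `de_DE.UTF-8`) is just one plausible choice of defaults.

```r
# Hypothetical helper: choose which translation of a help page to show.
# `available` holds the language codes for which a translated page was
# registered; "en" stands for the original, untranslated page.
pick_help_language <- function(available,
                               preferred = Sys.getenv("LANGUAGE"),
                               locale = Sys.getlocale("LC_CTYPE")) {
  # LANGUAGE may hold a colon-separated priority list, e.g. "fr:de"
  prefs <- strsplit(preferred, ":", fixed = TRUE)[[1]]
  # Reduce a POSIX-style locale such as "de_DE.UTF-8" to its language code
  prefs <- c(prefs, sub("[_.@].*$", "", locale))
  prefs <- prefs[nzchar(prefs)]
  hit <- prefs[prefs %in% available]
  if (length(hit) > 0) hit[[1]] else "en"
}

pick_help_language(c("de", "hu"), preferred = "fr:de", locale = "C")
```

Keeping the choice in one function would also make it easy to fall back to the original page whenever a translation is detected as out of date, which is the concern Duncan raises.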

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Joris Meys
Dear Hadley,

There have been some emails from you lately about packages on R-devel. I would
argue that the appropriate list for that is R-pkg-devel, as I've been told
myself not too long ago. People might get confused and think this is about
a change to R itself, which it obviously is not.

Kind regards
Joris

On Thu, Aug 23, 2018 at 8:32 PM Hadley Wickham  wrote:

> Hi all,
>
> I’d love to get your feedback on the conflicted package, which provides an
> alternative strategy for resolving ambiguous function names (i.e. when
> multiple packages provide identically named functions). conflicted 0.1.0
> is already on CRAN, but I’m currently preparing a revision
> (), and looking for feedback.
>
> As you are no doubt aware, R’s default approach means that the most
> recently loaded package “wins” any conflicts. You do get a message about
> conflicts on load, but I see a lot of newer R users experiencing problems
> caused by function conflicts. I think there are three primary reasons:
>
> -   People don’t read messages about conflicts. Even if you are
> conscientious and do read the messages, it’s hard to notice a single
> new conflict caused by a package upgrade.
>
> -   The warning and the problem may be quite far apart. If you load all
> your packages at the top of the script, it may potentially be 100s
> of lines before you encounter a conflict.
>
> -   The error messages caused by conflicts are cryptic because you end
> up calling a function with utterly unexpected arguments.
>
> For these reasons, conflicted takes an alternative approach, forcing the
> user to explicitly disambiguate any conflicts:
>
> library(conflicted)
> library(dplyr)
> library(MASS)
>
> select
> #> Error: [conflicted] `select` found in 2 packages.
> #> Either pick the one you want with `::`
> #> * MASS::select
> #> * dplyr::select
> #> Or declare a preference with `conflicted_prefer()`
> #> * conflict_prefer("select", "MASS")
> #> * conflict_prefer("select", "dplyr")
>
> conflicted works by attaching a new “conflicted” environment just after
> the global environment. This environment contains an active binding for
> any ambiguous bindings. The conflicted environment also contains
> bindings for `library()` and `require()` that rebuild the conflicted
> environment and suppress default reporting (but are otherwise thin wrappers
> around the base equivalents).
>
> conflicted also provides a `conflict_scout()` helper which you can use
> to see what’s going on:
>
> conflict_scout(c("dplyr", "MASS"))
> #> 1 conflict:
> #> * `select`: dplyr, MASS
>
> conflicted applies a few heuristics to minimise false positives (at the
> cost of introducing a few false negatives). The overarching goal is to
> ensure that code behaves identically regardless of the order in which
> packages are attached.
>
> -   A number of packages provide a function that appears to conflict
> with a function in a base package, but they follow the superset
> principle (i.e. they only extend the API, as explained to me by
> Hervè Pages).
>
> conflicted assumes that packages adhere to the superset principle,
> which appears to be true in most of the cases that I’ve seen. For
> example, the lubridate package provides `as.difftime()` and `date()`
> which extend the behaviour of base functions, and provides S4
> generics for the set operators.
>
> conflict_scout(c("lubridate", "base"))
> #> 5 conflicts:
> #> * `as.difftime`: [lubridate]
> #> * `date`       : [lubridate]
> #> * `intersect`  : [lubridate]
> #> * `setdiff`    : [lubridate]
> #> * `union`      : [lubridate]
>
> There are two popular functions that don’t adhere to this principle:
> `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
> special cases so they correctly generate conflicts. (I sure wish I’d
> known about the superset principle when creating dplyr!)
>
> conflict_scout(c("dplyr", "stats"))
> #> 2 conflicts:
> #> * `filter`: dplyr, stats
> #> * `lag`   : dplyr, stats
>
> -   Deprecated functions should never win a conflict, so conflicted
> checks for use of `.Deprecated()`. This rule is very useful when
> moving functions from one package to another. For example, many
> devtools functions were moved to usethis, and conflicted ensures
> that you always get the non-deprecated version, regardless of package
> attach order:
>
> head(conflict_scout(c("devtools", "usethis")))
> #> 26 conflicts:
> #> * `use_appveyor`       : [usethis]
> #> * `use_build_ignore`   : [usethis]
> #> * `use_code_of_conduct`: [usethis]
> #> * `use_coverage`       : [usethis]
> #> * `use_cran_badge`     : [usethis]
> #> * `use_cran_comments`  : [usethis]
> #> ...
>
> Finally, as mentioned above, the user
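The mechanism Hadley describes (an environment attached just after the global environment, holding an active binding for each ambiguous name) can be mimicked in a few lines of base R. This is a toy sketch, not conflicted's implementation: the environment name `conflicted_sketch`, the helper `make_conflict_binding()`, and the error text are made up; `attach()`, `makeActiveBinding()`, and `stop()` are the only real APIs used.

```r
# Attach an empty environment at position 2, i.e. just after the global env
shim <- attach(NULL, pos = 2L, name = "conflicted_sketch")

# Hypothetical helper: bind `name` in `env` to an active binding whose
# getter always errors, listing the packages that export the name
make_conflict_binding <- function(name, pkgs, env) {
  force(name)
  force(pkgs)
  makeActiveBinding(name, function() {
    stop("[sketch] `", name, "` found in ", length(pkgs), " packages: ",
         paste0(pkgs, "::", name, collapse = ", "), call. = FALSE)
  }, env)
}

make_conflict_binding("select", c("MASS", "dplyr"), shim)

# An unqualified lookup of `select` now reaches the shim before any package
res <- tryCatch(select, error = conditionMessage)
print(res)

# pkg::name lookups do not walk the search path, so they are unaffected
detach("conflicted_sketch")
```

Because the shim sits ahead of every attached package, the error fires regardless of the order in which the conflicting packages were attached, which matches the stated goal of order-independent behaviour.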

Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Hadley Wickham
On Thu, Aug 23, 2018 at 3:46 PM Duncan Murdoch  wrote:
>
> First, some general comments:
>
> This sounds like a useful package.
>
> I would guess it has very little impact on runtime efficiency except
> when attaching a new package; have you checked that?

It adds one extra element to the search path, so the impact on speed
should be equivalent to loading one additional package (i.e.
negligible).

I've also done some benchmarking to see the impact on calls to
library(). These are now a little outdated (because I've added more
heuristics so I should re-do), but previously conflicted added about
100 ms overhead to a library() call when I had ~170 packages loaded
(the most I could load without running out of DLLs).

> I am not so sure about your heuristics.  Can they be disabled, so the
> user is always forced to make the choice?  Even when a function is
> intended to adhere to the superset principle, they don't always get it
> right, so a really careful user should always do explicit disambiguation.

That is a good question - my intuition is always to start with less
user control as it makes it easier to get the core ideas right, and
it's easy to add more control later (whereas if you later take it
away, people get unhappy). Maybe it's natural to have a function that
does the opposite of conflict_prefer(), and declare that something
that doesn't appear to be a conflict actually is?

I don't think that an option to suppress the superset principle
altogether will work - my sense is that it will generate too many
false positives, to the point where you'll get frustrated and stop
using conflicted.
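The detection side (what `conflict_scout()` reports), declared preferences, and Hadley's idea of an inverse of `conflict_prefer()` can all be sketched as set operations over export lists. Everything here is hypothetical: `scout_sketch()`, its `prefer`, `assume_superset`, and `force` arguments, and the abbreviated export lists are illustrations. In particular, the superset heuristic is reduced to a user-supplied list of exempt names.

```r
# Hypothetical conflict scout over explicit export lists.
#   exports:         named list, package name -> character vector of exports
#   prefer:          named character vector, function name -> preferred package
#   assume_superset: names exempted by a (crude) superset heuristic
#   force:           names the user insists are conflicts anyway
scout_sketch <- function(exports, prefer = character(),
                         assume_superset = character(), force = character()) {
  pkgs <- rep(names(exports), lengths(exports))
  fns <- unlist(exports, use.names = FALSE)
  owners <- split(pkgs, fns)              # function name -> exporting packages
  dup <- owners[lengths(owners) > 1]      # names exported by 2+ packages
  exempt <- setdiff(assume_superset, force)
  dup <- dup[!names(dup) %in% exempt]
  vapply(names(dup), function(fn) {
    if (fn %in% names(prefer) && prefer[[fn]] %in% dup[[fn]]) {
      paste0("[", prefer[[fn]], "]")      # resolved by a declared preference
    } else {
      paste(dup[[fn]], collapse = ", ")   # still ambiguous
    }
  }, character(1))
}

# Toy, abbreviated export lists (not the packages' full exports)
exports <- list(
  base      = c("date", "union"),
  lubridate = c("date", "union", "ymd"),
  MASS      = c("select", "mvrnorm"),
  dplyr     = c("select", "filter")
)

scout_sketch(exports, assume_superset = c("date", "union"))
scout_sketch(exports, prefer = c(select = "MASS"),
             assume_superset = c("date", "union"))
scout_sketch(exports, assume_superset = c("date", "union"), force = "date")
```

The `force` argument keeps the heuristics on by default while still letting a careful user override them for individual names, which is the middle ground between Duncan's request and Hadley's worry about false positives.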

> And of course, if users wrote most of their long scripts as packages
> instead of as long scripts, the ambiguity issue would arise far less
> often, because namespaces in packages are intended to solve the same
> problem as your package does.

Agreed.

> One more comment inline about a typo, possibly in an error message.

Thanks for spotting; fixed in devel now.

Hadley


-- 
http://hadley.nz



Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Hadley Wickham
On Fri, Aug 24, 2018 at 4:28 AM Joris Meys  wrote:
>
> Dear Hadley,
>
> There have been some emails from you lately about packages on R-devel. I would
> argue that the appropriate list for that is R-pkg-devel, as I've been told 
> myself not too long ago. People might get confused and think this is about a 
> change to R itself, which it obviously is not.

The description for R-pkg-devel states:

> This list is to get help about package development in R. The goal of the list 
> is to provide a forum for learning about the package development process. We 
> hope to build a community of R package developers who can help each other 
> solve problems, and reduce some of the burden on the CRAN maintainers. If you 
> are having problems developing a package or passing R CMD check, this is the 
> place to ask!

The description for R-devel states:

> This list is intended for questions and discussion about code development in 
> R. Questions likely to prompt discussion unintelligible to non-programmers or 
> topics that are too technical for R-help's audience should go to R-devel, 
> unless they are specifically about problems in R package development where 
> the R-package-devel list is rather appropriate, see the posting guide 
> section. The main R mailing list is R-help.

My questions are not about how to develop a package, R CMD check, or
how to get it on CRAN, but instead about the semantics of the packages
I am working on. My opinion is supported by the fact that a number of
members of the R core team have responded (both on list and off) and
have not expressed concern about my choice of venue.

That said, I am happy to change venues (or simply not email at all) if
there is widespread concern that my emails are inappropriate.

Hadley

-- 
http://hadley.nz



Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Joris Meys
On Fri, Aug 24, 2018 at 2:27 PM Hadley Wickham  wrote:

>
> My questions are not about how to develop a package, R CMD check, or
> how to get it on CRAN, but instead about the semantics of the packages
> I am working on. My opinion is supported by the fact that a number of
> members of the R core team have responded (both on list and off) and
> have not expressed concern about my choice of venue.
>

If those moderating the lists are fine with it, all good.

Cheers
Joris


-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)



[Rd] plotmath degree symbol

2018-08-24 Thread Edzer Pebesma
In plotmath expressions, R's degree symbol, e.g. shown by

plot(1, main = parse(text = "1*degree*C"))

has sunk to halfway up the text line instead of touching its top. In older
R versions this looked much better.
-- 
Edzer Pebesma
Institute for Geoinformatics
Heisenbergstrasse 2, 48151 Muenster, Germany
Phone: +49 251 8333081
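To reproduce the report, a small script that renders the expression to a file makes it easy to compare output across R versions. The second panel uses a superscripted `degree`, an often-suggested workaround for placing the symbol higher; whether it matches the older appearance is an assumption that may depend on device and font. A PDF device is used only because it is available in every R build.

```r
# Render the expression from the report, plus a superscript variant, to a PDF
out <- tempfile(fileext = ".pdf")
pdf(out, width = 7, height = 3.5)
op <- par(mfrow = c(1, 2))
plot(1, main = parse(text = "1*degree*C"))   # as in the report
plot(1, main = expression(1^degree * C))     # degree raised as a superscript
par(op)
invisible(dev.off())
```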



[Rd] True length - length(unclass(x)) - without having to call unclass()?

2018-08-24 Thread Henrik Bengtsson
Is there a low-level function that returns the length of an object 'x'
- the length that for instance .subset(x) and .subset2(x) see? An
obvious candidate would be to use:

.length <- function(x) length(unclass(x))

However, I'm concerned that calling unclass(x) may trigger an
expensive copy internally in some cases.  Is that concern unfounded?

Thxs,

Henrik
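The difference between a dispatching `length()` and the underlying length is easy to see with a toy S3 class (`myclass`, `myclass2`, and `length.myclass()` are hypothetical). Whether `unclass()` copies can be checked empirically with `tracemem()`, which only works when R was built with memory profiling, hence the `capabilities("profmem")` guard. No output is claimed for that part, since it may differ between R versions.

```r
.length <- function(x) length(unclass(x))

# Toy S3 class whose length() method reports something other than the
# number of underlying elements
x <- structure(list(a = 1, b = 2, c = 3), class = "myclass")
length.myclass <- function(x) 99L

length(x)    # dispatches to length.myclass(): 99
.length(x)   # length of the underlying list: 3

# Empirical check: does .length() duplicate a large classed vector?
if (capabilities("profmem")) {
  y <- structure(numeric(1e6), class = "myclass2")
  tracemem(y)      # R prints a "tracemem[...]" line whenever `y` is copied
  n <- .length(y)
  untracemem(y)
}
```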



Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Duncan Murdoch

On 24/08/2018 3:12 AM, Jari Oksanen wrote:
> If you have to load two packages which both export the same name in
> their namespaces, namespace does not help in resolving which synonymous
> function to use. Neither does it help to have a package instead of a
> script as long as you end up loading two namespaces with name conflicts.


You can't import the same name from two packages without getting an 
error message (at least when checking --as-cran, I'm not sure about 
vanilla checks), so this is already handled.


If you really only want one of the imports, then importing individual 
functions is the solution.  Don't import everything from the package. 
This is a good idea in any case.


If you want both of the imports, then there's the undocumented (?) 
ability to rename a function on import, as well as the documented 
possibility of using :: for one of them instead of importing it.
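In a package, these suggestions would look roughly like the NAMESPACE fragment below, written for a hypothetical package that uses `dplyr::filter` and `MASS::select`. The `import(pkg, except = ...)` form is, to my knowledge, supported by current R (see Writing R Extensions). The rename-on-import feature Duncan mentions is deliberately not shown, because its syntax is not given here.

```r
# NAMESPACE of a hypothetical package that uses dplyr::filter and MASS::select

# Option 1: import only the individual functions the code actually calls
importFrom(dplyr, filter, mutate)
importFrom(MASS, select)   # no clash, since dplyr's select is not imported

# Option 2: import a whole package minus the clashing name, e.g.
# import(dplyr, except = c(select))   together with   importFrom(MASS, select)

# Option 3: import neither copy and write MASS::select(...) in the R code;
# MASS then only needs to be listed under Imports in DESCRIPTION.
```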


> The order of importing namespaces can also be difficult to control,
> because you may end up loading a namespace already when you start your R
> with a saved workspace.


That doesn't make sense in the context of a package.  Packages import 
what they ask to import. The user's workspace is irrelevant to code 
within the package if it does its imports properly.  You can reference 
functions that are not imported, but you get a message when you run 
checks to tell you not to do that.


Duncan Murdoch


Re: [Rd] conflicted: an alternative conflict resolution strategy

2018-08-24 Thread Gabe Becker
Hadley,

Overall this seems like a cool and potentially really useful idea. I do have some
thoughts/feedback, which I've put in-line below

On Thu, Aug 23, 2018 at 11:31 AM, Hadley Wickham 
wrote:

>
> 
>

> conflicted applies a few heuristics to minimise false positives (at the
> cost of introducing a few false negatives). The overarching goal is to
> ensure that code behaves identically regardless of the order in which
> packages are attached.
>
> -   A number of packages provide a function that appears to conflict
> with a function in a base package, but they follow the superset
> principle (i.e. they only extend the API, as explained to me by
> Hervè Pages).
>
> conflicted assumes that packages adhere to the superset principle,
> which appears to be true in most of the cases that I’ve seen.


It seems that you may be able to strengthen this heuristic from a blanket
assumption to something more narrowly targeted by looking for one or more
of the following to confirm likely-superset adherence:

   1. matching or purely extending formals (i.e., all the named arguments of
   base::fun match including order, and there are new arguments in pkg::fun
   only if base::fun takes ...)
   2. explicit call to base::fun in the body of pkg::fun
   3. UseMethod(funname) and at least one provided S3 method calls base::fun
   4. S4 generic creation using fun or base::fun as the seeding/default
   method body or called from at least one method
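The first two checks in this list can be approximated with base R's `formals()` and `all.names()`. This is a hypothetical sketch: `extends_formals()` and `calls_base()` are made-up names, it only handles closures (primitives have no formals), and the strict prefix rule is one reading of check 1.

```r
# Check 1: pkg_fun keeps base_fun's arguments, in order, as a prefix, and
# only adds new arguments when base_fun accepts `...`
extends_formals <- function(base_fun, pkg_fun) {
  base_args <- as.character(names(formals(base_fun)))
  pkg_args <- as.character(names(formals(pkg_fun)))
  if (length(pkg_args) < length(base_args)) return(FALSE)
  if (!identical(pkg_args[seq_along(base_args)], base_args)) return(FALSE)
  adds_args <- length(pkg_args) > length(base_args)
  !adds_args || "..." %in% base_args
}

# Check 2: the body of pkg_fun contains base::<name>.
# all.names() walks the expression depth-first, so `base::paste` appears
# as the consecutive names "::", "base", "paste".
calls_base <- function(pkg_fun, name) {
  nms <- all.names(body(pkg_fun))
  at <- which(nms == "::")
  any(nms[at + 1] == "base" & nms[at + 2] == name, na.rm = TRUE)
}

# Toy functions (hypothetical, not taken from any package)
base_like <- function(x, y, ...) NULL
good_ext <- function(x, y, ..., extra = TRUE) base::paste(x, y)
bad_ext <- function(y, x, ...) NULL

extends_formals(base_like, good_ext)  # TRUE
extends_formals(base_like, bad_ext)   # FALSE: arguments reordered
calls_base(good_ext, "paste")         # TRUE
```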



> For
> example, the lubridate package provides `as.difftime()` and `date()`
> which extend the behaviour of base functions, and provides S4
> generics for the set operators.
>
> conflict_scout(c("lubridate", "base"))
> #> 5 conflicts:
> #> * `as.difftime`: [lubridate]
> #> * `date`       : [lubridate]
> #> * `intersect`  : [lubridate]
> #> * `setdiff`    : [lubridate]
> #> * `union`      : [lubridate]
>
> There are two popular functions that don’t adhere to this principle:
> `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
> special cases so they correctly generate conflicts. (I sure wish I’d
> known about the superset principle when creating dplyr!)
>
> conflict_scout(c("dplyr", "stats"))
> #> 2 conflicts:
> #> * `filter`: dplyr, stats
> #> * `lag`   : dplyr, stats
>
> -   Deprecated functions should never win a conflict, so conflicted
> checks for use of `.Deprecated()`. This rule is very useful when
> moving functions from one package to another. For example, many
> devtools functions were moved to usethis, and conflicted ensures
> that you always get the non-deprecated version, regardless of package
> attach order:
>

I would completely believe this rule is useful for refactoring as you
describe, but that is the "same function" case. For an end-user in the
"different function same symbol" case it's not at all clear to me that the
non-deprecated function should always win.

People sometimes use deprecated functions. It's not great, and eventually
they'll need to fix that for any given case, but imagine if you deprecated
the filter verb in dplyr (I know this will never happen, but I think it's
illustrative nonetheless).

Consider a piece of code someone wrote before this hypothetical deprecation
of filter. The fact that it's now deprecated certainly doesn't mean that
they secretly wanted stats::filter all along, right? Conflicted acting as
if it does will lead to them getting the exact kind of error you're looking
to protect them from, and with even less ability to understand why because
they are already doing "the right thing" to protect themselves by using
conflicted in the first place...
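The rule under debate (conflicted "checks for use of `.Deprecated()`") could be approximated by scanning a function body for that call. This is a hypothetical sketch, not necessarily how conflicted does it: `uses_deprecated()`, `resolve_sketch()`, and the toy functions are made up. It also makes Gabe's point concrete, because the check only says that one candidate is deprecated, not which of two unrelated functions the user meant.

```r
# Hypothetical check: does a closure's body call .Deprecated()?
uses_deprecated <- function(f) {
  if (!is.function(f) || is.primitive(f)) return(FALSE)
  ".Deprecated" %in% all.names(body(f))
}

# A "deprecated never wins" rule: drop deprecated candidates and, if exactly
# one remains, treat the conflict as resolved in its favour
resolve_sketch <- function(candidates) {
  keep <- !vapply(candidates, uses_deprecated, logical(1))
  if (sum(keep) == 1) names(candidates)[keep] else NA_character_
}

old_fun <- function(x) {
  .Deprecated("new_fun")   # emits a deprecation warning when called
  x
}
new_fun <- function(x) x

uses_deprecated(old_fun)                            # TRUE
resolve_sketch(list(pkgA = old_fun, pkgB = new_fun)) # "pkgB"
```

If `pkgA` and `pkgB` held unrelated functions that happen to share a name, `resolve_sketch()` would still silently pick `pkgB`, which is the failure mode described above.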


> Finally, as mentioned above, the user can declare preferences:
>
> conflict_prefer("select", "MASS")
> #> [conflicted] Will prefer MASS::select over any other package
> conflict_scout(c("dplyr", "MASS"))
> #> 1 conflict:
> #> * `select`: [MASS]
>
>
I deeply worry about people putting this kind of thing, or even just
library(conflicted), in their .Rprofile and thus making their scripts
*substantially* less reproducible. Is that a consequence you have thought
about to this kind of functionality?

Best,
~G


> I’d love to hear what people think about the general idea, and if there
> are any obviously missing pieces.
>
> Thanks!
>
> Hadley
>
>
> --
> http://hadley.nz
>

-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
