Re: [R-pkg-devel] active bindings in package namespace

2019-03-24 Thread Hong Ooi via R-package-devel
--- Begin Message ---
Don't want to turn this into a pile-on, but I also think this isn't a very good 
idea. As I understand it, accessing the symbol "foo" will pull the latest 
version of foo from the remote site. This has consequences for reproducibility, 
because now your code could be exactly the same, and your local environment 
exactly the same, and yet running the code at different times can yield 
different results because the remote data has been updated.


-----Original Message-----
From: R-package-devel  On Behalf Of Jack 
Wasey
Sent: Sunday, 24 March 2019 9:57 AM
To: Kirill Müller ; R Development 

Subject: Re: [R-pkg-devel] active bindings in package namespace

Thanks both, this is helpful advice.

On 3/23/19 5:14 PM, Kirill Müller wrote:
> Dear Jack
> 
> 
> This doesn't answer your question, but I would advise against this design.
> 
> - Users do not expect side effects (such as network access) from accessing a 
> symbol.
> 
> - A function gives you much more flexibility to change the interface 
> later on. (Arguments for fetching the data, tokens for API access, 
> ...)
> 
> - You already encountered a few quirks that make this an "interesting" 
> problem.
> 
> A function call only needs a pair of parentheses.
> 
> 
> Best regards
> 
> Kirill
> 
> 
> On 23.03.19 16:50, Jack O. Wasey wrote:
>> Dear all,
>>
>> I am developing a package which is a front for various online data (icd.data 
>> https://github.com/jackwasey/icd.data/ ). The current CRAN version just has 
>> lazy-loaded data, but now that the package encompasses far more current and 
>> historic ICD codes from different countries, these can't be included in the 
>> CRAN package even with maximal compression.
>>
>> Other authors have solved this using functions to get the data, with or 
>> without a local cache of the retrieved data. No CRAN or other packages I 
>> have found after extensive searching use the attractive active binding 
>> feature of R.
>>
>> The goal is simple: for the user to refer to the data by its symbol, e.g., 
>> 'icd10fr2019', or 'icd.data::icd10fr2019', and it will be downloaded and 
>> parsed transparently (if the user has already granted permission, or after 
>> prompt if they haven't).
>>
>> The bindings are set using commands alongside the function definitions in 
>> R/*.R, e.g.:
>>
>> makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, environment())
>> lockBinding("icd10cm_latest", environment())
>>
>> For non-interactive use, CI and CRAN tests, no data should be downloaded, 
>> and no cache directory set up without user consent. For interactive use, I 
>> ask permission to create a local data cache before downloading data.
>>
>> This works fine... until R CMD check. The following steps seem to 'get' or 
>> 'source' everything from the package namespace, which results in triggering 
>> the active bindings, and this fails if I am unable to get consent to 
>> download data, and want to 'stop' on this error condition.
>>  - checking dependencies in R code
>>  - checking S3 generic/method consistency
>>  - checking foreign function calls
>>  - checking R code for possible problems
>>
>> Debugging CI-specific binding bugs is a nightmare because these occur in 
>> different R sessions initiated by R CMD check.
>>
>> There may be legitimate reasons to evaluate everything in the 
>> namespace, but I've no idea what they are. Incidentally, RStudio also 
>> does 'mget' on the whole package namespace and triggers bindings 
>> during autocomplete. https://github.com/rstudio/rstudio/issues/4414
>>
>> Is this something I should raise as an issue with R? Or does anyone have any 
>> idea of a sensible approach to this? Currently I have a set of workarounds, 
>> but this complicates the code, and has taken an awful lot of time. Does 
>> anyone know of any CRAN package which has active bindings in the package 
>> namespace?
>>
>> Any ideas appreciated.
>>
>> Jack Wasey
>>
>> __
>> R-package-devel@r-project.org mailing list 
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel

--- End Message ---
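
To illustrate the behaviour described above, here is a minimal R sketch (not the icd.data code). The environment `demo_env` and the names `icd_demo`, `user_consent` and `fetch_count` are hypothetical, and `user_consent` merely stands in for the permission prompt. It shows that fetching every value in an environment, as mget() does, evaluates an active binding as a side effect, and that each access re-runs the binding function.

```r
# Minimal sketch, not the icd.data implementation. `demo_env`, `icd_demo`,
# `user_consent` and `fetch_count` are hypothetical names.
demo_env <- new.env()
fetch_count <- 0
user_consent <- FALSE  # hypothetical stand-in for the permission prompt

makeActiveBinding("icd_demo", function() {
  # The binding function runs on every access of the symbol.
  if (!user_consent) stop("no consent to download data")
  fetch_count <<- fetch_count + 1  # count simulated downloads
  data.frame(code = c("A00", "A01"))
}, demo_env)
lockBinding("icd_demo", demo_env)

# Fetching every value in the environment, as mget() does, evaluates the
# binding as a side effect, so this fails without consent:
res <- tryCatch(mget(ls(demo_env), envir = demo_env),
                error = conditionMessage)
print(res)

# With consent, each access re-runs the function, i.e. a fresh "download":
user_consent <- TRUE
x1 <- demo_env$icd_demo
x2 <- demo_env$icd_demo
print(fetch_count)

# bindingIsActive() detects the binding without evaluating it:
print(bindingIsActive("icd_demo", demo_env))
```

A tool that wants to avoid triggering such bindings could, in principle, check bindingIsActive() before fetching a value.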


Re: [R-pkg-devel] active bindings in package namespace

2019-03-24 Thread Jack O. Wasey
This is a good point. I would prefer to include all the data in the 
package, but CRAN has strict limitations on package and subdirectory 
size, which the potential data would easily exceed. Whether it is an 
active binding or a get function, dynamically downloaded data will 
always suffer this problem. Also, there are potential copyright issues 
which may prevent including all the relevant data in a package, no 
matter how the package is distributed.


For this particular package of ICD data, the biggest risk is not the 
data changing, but the data not being made available in the future, or 
not being provided in a useful format.


I do allow the user to set the cache directory, which eventually 
includes all the raw and processed data, and this could be archived by 
the user for reproducibility. In addition, the test suite covers 
potential changes to the source data.
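
Kirill's suggestion of an explicit function combines naturally with a user-chosen cache directory. The R sketch below shows one way this might look. `get_icd_data()`, the `icddemo.cache_dir` option and the `fetch` argument are hypothetical names, and `fetch` stands in for whatever download-and-parse step a real package would use. Once a dataset is cached, later calls read the saved copy instead of contacting the remote site, which may help with the reproducibility concern if the cache directory is archived.

```r
# Hypothetical sketch: `get_icd_data`, the "icddemo.cache_dir" option and the
# `fetch` argument are illustrative names, not the icd.data API.
get_icd_data <- function(name, fetch,
                         cache_dir = getOption("icddemo.cache_dir")) {
  # No cache directory chosen by the user: refuse rather than create one.
  if (is.null(cache_dir)) {
    stop("no cache directory set; see options(icddemo.cache_dir = ...)")
  }
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the previously saved copy
  }
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  value <- fetch(name)  # e.g. download and parse the remote source
  saveRDS(value, path)
  value
}

# Usage with a fake fetcher that counts how often it is called:
calls <- 0
fake_fetch <- function(name) {
  calls <<- calls + 1
  data.frame(code = c("A00", "A01"), source = name)
}
options(icddemo.cache_dir = file.path(tempdir(), "icd-cache"))
a <- get_icd_data("icd10fr2019", fake_fetch)
b <- get_icd_data("icd10fr2019", fake_fetch)  # served from the cache
print(calls)
```

Because the data are fetched only inside an explicit call, listing or autocompleting the namespace has no side effects.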


On 3/24/19 11:21 AM, Hong Ooi wrote:

[R-pkg-devel] What to do when a dependency falls off CRAN

2019-03-24 Thread Neal Fultz
One of my clients has a shiny app which depends on RTextTools, which was
dropped from CRAN for lack of maintenance.

What would you all recommend in this situation? Here are a couple of options I
could think of:

1) Vendor the orphaned package - we are doing this for now. I'm not a fan
of this, because then there's a mix of GPL-2, GPL-3, Apache 2.0 and
proprietary code all in one repo, and because it might encourage other
developers to write monolithic, non-modular code. At least when we find
bugs, we can fix them.

2) Install from the CRAN archive instead of CRAN - good for not having to
carry around third party code in our repo, but I'd expect this to break
with R 3.6, as the package hasn't rolled forward? Also no good way to fix
bugs.

3) Adopt package, push fixed one to CRAN - not sure what the exact process
is for un-orphaning, or if I would want to commit to maintaining it without
knowing more about why it was dropped and how much work it is to get it
passing. E.g. if it were pathological Solaris memory errors, I might have to
pass. Are there ways to see old automated CRAN checks on a package that was
abandoned? This approach obviously would benefit the community, but this is
probably not billable work.

4) Rewrite - I could do this, but it's probably tedious, weeks of work, and
my client may not want to pay for it; they also may not be interested in
sharing it back if they did.

5) Find another package - then I have to rewrite the "application" code
instead of the "library" code - also sounds tedious, days instead of weeks,
but more likely to be billable.

This topic has come up a few times in the past, but I would like to hear
your current opinions given that CRAN is much more rigorous and automated
now.

v/r

Neal Fultz



Re: [R-pkg-devel] What to do when a dependency falls off CRAN

2019-03-24 Thread Dirk Eddelbuettel


Neal,

"It's complicated".  To a first approximation, a dependency is a risk.

As an illustration, I taught CRANberries a few years into its run to also
consider disappearing packages.  Right now, it knows about 3685 packages
which are (or were at some point) "archived". This is an imprecise count as
some are "reborn", while some are special and have multiple archive /
readmitted / archive / ... phases.  But right now, we have 3685/13957 or
26.4% which are / were archived.  Which is quite a lot.  Hence "a risk".

And just like other things in life you need to balance which risks are worth
taking and which are not.  Different people use different heuristics:
 - some trust certain packages more than others
 - some trust certain authors more than others
 - some trust certain communities more than others
There are no hard and fast rules.  Packages disappearing are a bit of a pain,
but "we all" buy into CRAN maintaining quality standards by ... actually
enforcing them.

But as it is somewhat related, I now show for some/most packages what
their count of dependencies is.  Count is another very imperfect measure, but
it provides a little bit of information at a glance. See [1] for more.
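
As a rough illustration of such a count, the R sketch below counts the direct "hard" dependencies (Depends, Imports, LinkingTo) listed in a made-up DESCRIPTION file. `hard_deps()` is a hypothetical helper, and this is not necessarily how the badges in [1] are computed; for instance, it counts base packages such as stats and utils.

```r
# Rough sketch: count direct hard dependencies from DESCRIPTION fields.
# The DESCRIPTION content and `hard_deps()` are hypothetical.
desc_lines <- c(
  "Package: mypkg",
  "Version: 0.1.0",
  "Depends: R (>= 3.5.0), methods",
  "Imports: Rcpp (>= 1.0.0), stats,",
  "    utils",  # continuation line of the Imports field
  "LinkingTo: Rcpp"
)
con <- textConnection(desc_lines)
desc <- read.dcf(con)
close(con)

hard_deps <- function(desc, fields = c("Depends", "Imports", "LinkingTo")) {
  fields <- intersect(fields, colnames(desc))
  vals <- desc[1, fields]
  vals <- vals[!is.na(vals)]
  entries <- unlist(strsplit(vals, ","))
  # Drop version constraints and surrounding whitespace/newlines.
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  # Unique package names, excluding the R version requirement itself.
  setdiff(pkgs[nzchar(pkgs)], "R")
}

deps <- hard_deps(desc)
print(deps)
print(length(deps))
```

A recursive count over the full CRAN dependency graph would usually be larger than this direct count.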

As for the package at hand: maybe importing the functionality you need would
work in the narrow sense. In the broader sense, adopting and maintaining the
package would surely be best for the community as a whole.

Dirk

[1] http://dirk.eddelbuettel.com/blog/2019/03/14#020_dependency_badges

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
