Re: [R-pkg-devel] active bindings in package namespace
Don't want to turn this into a pile-on, but I also think this isn't a very good idea. As I understand it, accessing the symbol "foo" will pull the latest version of foo from the remote site. This has consequences for reproducibility, because now your code could be exactly the same, and your local environment exactly the same, and yet running the code at different times can yield different results because the remote data has been updated.

-----Original Message-----
From: R-package-devel On Behalf Of Jack Wasey
Sent: Sunday, 24 March 2019 9:57 AM
To: Kirill Müller; R Development
Subject: Re: [R-pkg-devel] active bindings in package namespace

Thanks both, this is helpful advice.

On 3/23/19 5:14 PM, Kirill Müller wrote:
> Dear Jack
>
> This doesn't answer your question, but I would advise against this design.
>
> - Users do not expect side effects (such as network access) from accessing a
>   symbol.
>
> - A function gives you much more flexibility to change the interface
>   later on. (Arguments for fetching the data, tokens for API access,
>   ...)
>
> - You already encountered a few quirks that make this an "interesting"
>   problem.
>
> A function call only needs a pair of parentheses.
>
> Best regards
>
> Kirill
>
> On 23.03.19 16:50, Jack O. Wasey wrote:
>> Dear all,
>>
>> I am developing a package which is a front for various online data (icd.data
>> https://github.com/jackwasey/icd.data/ ). The current CRAN version just has
>> lazy-loaded data, but now the package encompasses far more current and
>> historic ICD codes from different countries, these can't be included in the
>> CRAN package even with maximal compression.
>>
>> Other authors have solved this using functions to get the data, with or
>> without a local cache of the retrieved data. No CRAN or other packages I
>> have found after extensive searching use the attractive active binding
>> feature of R.
>>
>> The goal is simple: for the user to refer to the data by its symbol, e.g.,
>> 'icd10fr2019', or 'icd.data::icd10fr2019', and it will be downloaded and
>> parsed transparently (if the user has already granted permission, or after
>> prompt if they haven't).
>>
>> The bindings are set using commands alongside the function definitions in
>> R/*.R. E.g.
>>
>> makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, environment())
>> lockBinding("icd10cm_latest", environment())
>>
>> For non-interactive use, CI and CRAN tests, no data should be downloaded,
>> and no cache directory set up without user consent. For interactive use, I
>> ask permission to create a local data cache before downloading data.
>>
>> This works fine... until R CMD check. The following steps seem to 'get' or
>> 'source' everything from the package namespace, which results in triggering
>> the active bindings, and this fails if I am unable to get consent to
>> download data, and want to 'stop' on this error condition.
>> - checking dependencies in R code
>> - checking S3 generic/method consistency
>> - checking foreign function calls
>> - checking R code for possible problems
>>
>> Debugging CI-specific binding bugs is a nightmare because these occur in
>> different R sessions initiated by R CMD check.
>>
>> There may be legitimate reasons to evaluate everything in the
>> namespace, but I've no idea what they are. Incidentally, Rstudio also
>> does 'mget' on the whole package namespace and triggers bindings
>> during autocomplete. https://github.com/rstudio/rstudio/issues/4414
>>
>> Is this something I should raise as an issue with R? Or does anyone have any
>> idea of a sensible approach to this? Currently I have a set of workarounds,
>> but this complicates the code, and has taken an awful lot of time. Does
>> anyone know of any CRAN package which has active bindings in the package
>> namespace?
>>
>> Any ideas appreciated.
>>
>> Jack Wasey
>>
>> __
>> R-package-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
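
The behaviour described in this thread (code that merely enumerates a namespace ends up running the bindings) can be reproduced in a few lines. The following is a minimal sketch, not the icd.data implementation: `fetch_remote` and the environment `ns` are hypothetical stand-ins for the download function and the package namespace, and a counter replaces the real network access.

```r
# Hypothetical stand-in for a download; it only counts how often it runs.
fetch_count <- 0
fetch_remote <- function() {
  fetch_count <<- fetch_count + 1
  data.frame(code = c("A00", "A01"))
}

# A plain environment playing the role of the package namespace (assumption).
ns <- new.env()
makeActiveBinding("icd10cm_latest", fetch_remote, ns)
lockBinding("icd10cm_latest", ns)

# Listing the names does not evaluate the binding.
nms <- ls(ns)
listed_only <- fetch_count      # still 0

# Retrieving every value, as a checker or an autocompleter might, does.
vals <- mget(nms, envir = ns)
after_mget <- fetch_count       # larger than listed_only

# Every ordinary access runs the function again.
x <- ns$icd10cm_latest
after_access <- fetch_count     # larger than after_mget
```

Any tool that calls `get()` or `mget()` on each symbol will therefore run the side effect, which is consistent with the R CMD check and RStudio observations above.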
Re: [R-pkg-devel] active bindings in package namespace
This is a good point. I would prefer to include all the data in the package, but CRAN has strict limitations on package and subdirectory size, which the potential data would easily exceed. Whether it is an active binding or a get function, dynamically downloaded data will always suffer this problem. Also, there are potential copyright issues which may prevent including all the relevant data in a package, no matter how the package is distributed.

For this particular package of ICD data, the biggest risk is not the data changing, but the data not being made available in the future, or not being provided in a useful format. I do allow the user to set the cache directory, which eventually includes all the raw and processed data, and this could be archived by the user for reproducibility. In addition, the test suite covers potential changes to the source data.

On 3/24/19 11:21 AM, Hong Ooi wrote:
> Don't want to turn this into a pile-on, but I also think this isn't a very
> good idea. As I understand it, accessing the symbol "foo" will pull the
> latest version of foo from the remote site. This has consequences for
> reproducibility, because now your code could be exactly the same, and your
> local environment exactly the same, and yet running the code at different
> times can yield different results because the remote data has been updated.
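
The "get function" with a user-settable cache directory mentioned in this reply could look roughly like the sketch below. It is only an illustration under stated assumptions: `get_icd_data`, its arguments, and `fake_fetch` are hypothetical names, and the download-and-parse step is replaced by a function that returns a small data frame.

```r
# Hypothetical getter: read from the cache if present, otherwise fetch and store.
get_icd_data <- function(name, cache_dir, fetch = NULL, refresh = FALSE) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (!refresh && file.exists(path)) {
    return(readRDS(path))
  }
  if (is.null(fetch)) {
    stop("'", name, "' is not cached and no fetch function was supplied")
  }
  dat <- fetch()
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(dat, path)
  dat
}

# Stand-in for the real download; counts how often it is called.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(code = c("A00", "A01"), stringsAsFactors = FALSE)
}

cache <- tempfile("icd-cache-")   # fresh directory, so nothing is cached yet
first  <- get_icd_data("icd10fr2019", cache, fake_fetch)
second <- get_icd_data("icd10fr2019", cache, fake_fetch)
calls   # 1: the second call was served from the cache
```

Archiving the directory passed as `cache_dir` alongside an analysis freezes the data that analysis saw, which is the reproducibility route described above; the explicit `refresh` argument makes any update a deliberate act.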
[R-pkg-devel] What to do when a dependency falls off CRAN
One of my clients has a Shiny app which depends on RTextTools, which was dropped from CRAN for lack of maintenance. What would you all recommend in this situation? Here are a few options I could think of:

1) Vendor the orphaned package - we are doing this for now. I'm not a fan of this, because then there's a mix of GPL-2, GPL-3, Apache 2.0 and proprietary code all in one repo, and because it might encourage other developers to write monolithic, non-modular code. At least when we find bugs we can fix them.

2) Install from CRAN archive instead of CRAN - good for not having to carry around third party code in our repo, but I'd expect this to break with R 3.6, as the package hasn't rolled forward? Also no good way to fix bugs.

3) Adopt package, push fixed one to CRAN - not sure what the exact process is for un-orphaning, or if I would want to commit to maintaining it without knowing more about why it was dropped and how much work it is to get it passing. E.g. if it were pathological Solaris memory errors, I might have to pass. Are there ways to see old automated CRAN checks on a package that was abandoned? This approach obviously would benefit the community, but this is probably not billable work.

4) Rewrite - I could do this, but it's probably tedious, weeks of work, and my client may not want to pay for it; they also may not be interested in sharing it back if they did.

5) Find another package - then I have to rewrite the "application" code instead of the "library" code - also sounds tedious, days instead of weeks, but more likely to be billable.

This topic has come up a few times in the past, but I would like to hear your current opinions given that CRAN is much more rigorous and automated now.

v/r
Neal Fultz
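
Option 2 relies on the fixed layout of the CRAN source archive. As a rough sketch, the helper below (a hypothetical `cran_archive_url`, not part of any package) builds the tarball URL for an archived release; the version string is a placeholder, since the last RTextTools release is not stated in this thread.

```r
# Build the URL of an archived source tarball on a CRAN mirror.
cran_archive_url <- function(pkg, version,
                             repo = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", repo, pkg, pkg, version)
}

# "x.y.z" is a placeholder: look up the real version in the archive listing.
url <- cran_archive_url("RTextTools", "x.y.z")
url

# Installation is left commented out because it needs network access and a
# real version number:
# install.packages(url, repos = NULL, type = "source")
```

Installing from a URL with `repos = NULL` does not resolve dependencies, so those have to be installed beforehand; `remotes::install_version()` is an alternative that pulls a named version from the archive.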
Re: [R-pkg-devel] What to do when a dependency falls off CRAN
Neal,

"It's complicated". To a first approximation, a dependency is a risk.

As an illustration, I taught CRANberries a few years into its run to also consider disappearing packages. Right now, it knows about 3685 packages which are (or were at some point) "archived". This is an imprecise count as some are "reborn", while some are special and have multiple archive / readmitted / archive / ... phases. But right now, we have 3685/13957 or 26.4% which are / were archived. Which is quite a lot. Hence "a risk".

And just like other things in life you need to balance which risks are worth taking and which are not. Different people use different heuristics:

- some trust certain packages more than others
- some trust certain authors more than others
- some trust certain communities more than others

There are no hard or fast rules. Packages disappearing are a bit of a pain, but "we all" buy into CRAN maintaining quality standard for ... actually enforcing them.

But as it is somewhat related, I now show for some/most packages what their count of dependencies is. Count is another very imperfect measure, but it provides a little bit of information at a glance. See [1] for more.

As for the package at hand: maybe importing the functionality you need would work in the narrow sense. In the broader sense, adopting and maintaining the package would surely be best for the community as a whole.

Dirk

[1] http://dirk.eddelbuettel.com/blog/2019/03/14#020_dependency_badges

--
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
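
For reference, the archived share quoted in this message can be checked directly from the two counts given (the variable names are just for illustration):

```r
archived <- 3685    # packages CRANberries has seen archived at some point
total    <- 13957   # denominator used in the message above
share    <- archived / total
sprintf("%.1f%%", 100 * share)   # "26.4%"
```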