Re: [R-pkg-devel] Large data package
Dear All,

Following Dirk's suggestion below, I have recently added a data package as a drat repository for my asteRisk package, placing it under Suggests in the main package.

In order to keep the code tidy and know exactly when I’m accessing the data in the data package, I access all the data in the data package as asteRiskData:::Item

This seems to be working fine, but upon CHECK I am getting the following NOTE:

  Unavailable namespace imported from by a ':::' call: ‘asteRiskData’
  See the note in ?`:::` about the use of this operator.

The mentioned Note says:

  It is typically a design mistake to use ::: in your code since the corresponding object has probably been kept internal for a good reason. Consider contacting the package maintainer if you feel the need to access the object for anything but mere inspection.

Here I have decided by design to keep the objects internal in the data package, since they are only meant to be accessed by functions of the main package.

I am wondering if anyone has had any experience with this NOTE before? Is it acceptable to leave it for submission of the updated version to CRAN?

Thanks a lot in advance

Best wishes,
Rafa

> On 28 Apr 2021, at 0:04, Dirk Eddelbuettel wrote:
>
> On 27 April 2021 at 10:26, Ayala Hernandez, Rafael wrote:
> | I am in the process of including a large update for my package asteRisk, that will require the usage of large data files (amounting in total to ~100 MB).
> |
> | Given the CRAN package size limits of 5 MB, I am wondering what is the preferred solution in these cases? I have read multiple possibilities, such as requesting to CRAN to host a data-only package that would be updated very infrequently, or hosting the data in another repository and providing functions in the main package to retrieve the required files.
>
> In case you have not seen it yet, the R Journal article Brooke and I wrote a few years ago covers exactly this use case, and walks through how to cover it in a fairly detailed way.
>
> https://journal.r-project.org/archive/2017/RJ-2017-026/index.html
>
> @article{RJ-2017-026,
>   author = {G. Brooke Anderson and Dirk Eddelbuettel},
>   title = {{Hosting Data Packages via drat: A Case Study with Hurricane Exposure Data}},
>   year = {2017},
>   journal = {{The R Journal}},
>   doi = {10.32614/RJ-2017-026},
>   url = {https://doi.org/10.32614/RJ-2017-026},
>   pages = {486--497},
>   volume = {9},
>   number = {1}
> }
>
> Hope this helps, Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [R-pkg-devel] Large data package
On 2 May 2021 at 10:12, Ayala Hernandez, Rafael wrote:
| Following Dirk's suggestion below, I have recently added a data package as a drat repository for my asteRisk package, placing it under Suggests in the main package.
| In order to keep the code tidy and know exactly when I’m accessing the data in the data package, I access all the data in the data package as asteRiskData:::Item

Why would that be 'tidy'?

Just use two colons as usual for things exported from your data package, and export everything that your code package uses from it. The ':::' idiom is not to be used across packages, i.e. don't use it in package B to access content from A. Which is what R CMD check is telling you here: "don't do this".

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
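As a rough illustration of the distinction drawn above (a sketch, not taken from the thread): `::` only reaches objects that a package lists as exports in its NAMESPACE, while `:::` also reaches internal ones. The helper name `is_exported` is made up for this example, and the `stats` package stands in for the data package so that the snippet runs anywhere; `Item` is just the placeholder object name used in the thread.

```r
# In the data package's NAMESPACE file, one would add a directive such as
#   export(Item)
# so that the main package can write asteRiskData::Item instead of
# asteRiskData:::Item. ('Item' is the placeholder name from the thread.)

# Hypothetical helper: is `name` among the exports of package `pkg`?
is_exported <- function(pkg, name) {
  name %in% getNamespaceExports(pkg)
}

# 'median' is exported by stats, so stats::median is fine.
is_exported("stats", "median")

# 'Pillai' is an internal helper of stats: stats:::Pillai reaches it,
# but stats::Pillai would fail because it is not exported.
is_exported("stats", "Pillai")
```

Note that exported objects are normally expected to be documented, so exporting the data objects would likely mean adding help pages for them in the data package.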
Re: [R-pkg-devel] Large data package
On 02/05/2021 8:44 a.m., Dirk Eddelbuettel wrote:
> On 2 May 2021 at 10:12, Ayala Hernandez, Rafael wrote:
> | Following Dirk's suggestion below, I have recently added a data package as a drat repository for my asteRisk package, placing it under Suggests in the main package.
> | In order to keep the code tidy and know exactly when I’m accessing the data in the data package, I access all the data in the data package as asteRiskData:::Item
>
> Why would that be 'tidy'?
>
> Just use two colons as usual for things exported from your data package, and export everything that your code package uses from it. The ':::' idiom is not to be used across packages, i.e. don't use it in package B to access content from A. Which is what R CMD check is telling you here: "don't do this".

I wouldn't call it "tidy", but there are some possible reasons to do this. One may apply here:

- You may not want other packages to depend on the data, because you would like to be able to change it without notice. Normally you'd do this by making it a private part of the main package, but if it's really big, that's discouraged. So the use described here may be reasonable.

I can't spot it in the docs right now, but I believe CRAN will allow the use of ::: if the package it is importing from has the same maintainer as the main package.

The problem here is that CRAN doesn't know who is the maintainer for asteRiskData. That package is not on CRAN, and they don't look on other repositories to figure it out.

So the answer to Rafael's original question is that I think CRAN would agree to this use if you have a good reason for it, but you'll need to explain that reason in your submission message, and it will need manual intervention to ignore the automatic rejection.

Following Dirk's advice is thus advisable (passing the auto checks is better than requiring manual intervention on every update), but not strictly necessary.

Duncan Murdoch
Re: [R-pkg-devel] Large data package
Dear Dirk and Duncan,

Thanks a lot for your clarifications. Both of your explanations make sense. Indeed, I would rather not have any other packages depend on the data package, at least not for the time being, in case I find good reasons to make modifications in the data package.

But it makes sense to try to pass the auto checks as much as possible. I will work towards trying to avoid :::

Best wishes,
Rafa
Re: [R-pkg-devel] Large data package
On 2 May 2021 at 15:00, Ayala Hernandez, Rafael wrote:
| Thanks a lot for your clarifications. Both of your explanations make sense. Indeed, I would rather not have any other packages depend on the data package, at least not for the time being, in case I find good reasons to make modifications in the data package.
|
| But it makes sense to try to pass the auto checks as much as possible. I will work towards trying to avoid :::

I fear you are still looking at the wrong windmill, longing for a fight.

There is _nothing_ wrong with a :: for a package you have a Suggests: on, and having a _conditional dependence_ on a large data package is where we started. You could just use :: in your package, as long as it is inside blocks of the form

    if (requireNamespace(nameOfDataPackage, quietly=TRUE)) { ... }

or if you really dislike ::, use a library(nameOfDataPackage).

The key is to use the conditional dependence via such a test in the code.

And the CRAN Repo Policy and Writing R Extensions are moving ever so slowly in this direction. It is not something that can be enforced yet given both the number of packages still doing it wrong, and the number of people who continue repeating that this is a good or acceptable practice. We can and will do better, just as we moved to using NAMESPACES for a reason.

Dirk

--
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
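The conditional-dependence pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `get_optional_data` is made up, `Item` remains the placeholder object name used earlier in the thread, and the `stats` package stands in for the data package in the usage line so that the snippet runs without asteRiskData installed. `requireNamespace()` and `getExportedValue()` are base R functions.

```r
# Hypothetical helper: fetch an exported object from an optional (Suggests)
# package, failing with a clear message when that package is not installed.
get_optional_data <- function(name, pkg = "asteRiskData") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this functionality. ",
         "Please install it first.", call. = FALSE)
  }
  # getExportedValue(pkg, name) is the programmatic form of pkg::name,
  # so only exported objects can be retrieved this way.
  getExportedValue(pkg, name)
}

# Usage with a package that is always available:
med <- get_optional_data("median", pkg = "stats")

# Inside the main package, the same guard could wrap a direct call:
#   if (requireNamespace("asteRiskData", quietly = TRUE)) {
#     x <- asteRiskData::Item
#   }
```

Keeping the check in one helper means every access to the data package goes through the same guard, so a missing data package produces one consistent message rather than a failure somewhere deep in the code.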
Re: [R-pkg-devel] Large data package
Dear Dirk,

Thanks a lot for your detailed explanation.

This is indeed the solution that I have found to work nicely, and indeed keeps the code equally tidy. In fact, I am finding that using :: to access exported objects instead of ::: to access any object in the data package even helps me organize the code better, since I have a clear record of the interaction between the two packages (in the form of exports in the NAMESPACE of the data package).

Together with conditional checks to ensure that the data package is available before running code that requires it, it seems to be the best solution to me.

Best wishes,
Rafa