Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-04-05 Thread Sean Davis
Quoting a message of February 15, 2025 at 10:29 AM (To: Simon Urbanek; Cc: R-package-devel@r-project.org): On 15 February 2025 at 19:50, Simon Urbanek wrote: | Github is not reliable enough for reproducible research (your files can

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-17 Thread Dirk Eddelbuettel
Hi Thierry, On 17 February 2025 at 09:16, Thierry Onkelinx wrote: | Zenodo does offer storage. The default quotas are 50 GB and 100 files per record | (version). See https://help.zenodo.org/docs/deposit/manage-files/#prepare So TIL! Thanks for the heads-up and correction. Dirk -- dirk.eddelbue

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-17 Thread Thierry Onkelinx
Dear Dirk, Zenodo does offer storage. The default quotas are 50 GB and 100 files per record (version). See https://help.zenodo.org/docs/deposit/manage-files/#prepare Best regards, Thierry ir. Thierry Onkelinx Statisticus / Statistician Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NA

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-15 Thread Dirk Eddelbuettel
On 15 February 2025 at 19:50, Simon Urbanek wrote: | Github is not reliable enough for reproducible research (your files can | disappear at any point - or can change without notice), I'm curious: Do you have a concrete example of a no-longer-reproducible study whose data or other support files

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-14 Thread Simon Urbanek
I would like to second the Zenodo recommendation. Github is not reliable enough for reproducible research (your files can disappear at any point - or can change without notice), that's why Zenodo was created. It assumes that your package has the list of DOIs to offer, but that should be ideally

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-14 Thread Thierry Onkelinx
Dear John, Our approach to an open and reproducible workflow is to publish the data via Zenodo. https://zenodo.org/ is maintained by CERN. - The data is freely available. - Your data is easy to cite. - Every version gets its own DOI, plus one stable DOI that always points to the most recent version.

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-14 Thread Jeff Newmiller via R-package-devel
Seconded... have the support for obtaining the desired file be completely initiated by the user, and explicitly pass the filename into the functions that use the data. It is also easier to trace which file was used in a past analysis this way... auto config seems convenient, but it is hard to re
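The suggestion to pass the file name explicitly could be sketched in R roughly as follows. The function name `read_model_input` and the attribute names are hypothetical, chosen only for illustration; recording a checksum is one possible way to make it easier to trace which file a past analysis used, not something the thread prescribes.

```r
# Hypothetical reader: the caller supplies the path, nothing is auto-configured.
read_model_input <- function(path) {
  stopifnot(is.character(path), length(path) == 1L, file.exists(path))
  data <- utils::read.csv(path)
  # Keep provenance with the data so a later report can state which file was used.
  attr(data, "source_file") <- normalizePath(path)
  attr(data, "source_md5") <- unname(tools::md5sum(path))
  data
}
```

Because the path is an ordinary argument, it shows up in the user's script, and the stored checksum can be compared later against the file on disk.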

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-14 Thread Jan van der Laan
Not an answer, but a request from someone often working behind firewalls and/or machines not connected to the internet. Please have a way to have the package search for the data at some user specified location such as a local directory. Best, Jan On 14-02-2025 15:54, John Clarke wrote:
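One way a package might honour this request is to look for the data in a directory the user names through an option or an environment variable. This is a minimal sketch; the option name `mypkg.data_dir`, the variable `MYPKG_DATA_DIR`, and the function `find_data_file` are all assumptions made up for the example.

```r
# Hypothetical lookup: "mypkg.data_dir" and "MYPKG_DATA_DIR" are illustrative names.
find_data_file <- function(filename,
                           data_dir = getOption(
                             "mypkg.data_dir",
                             Sys.getenv("MYPKG_DATA_DIR", unset = NA)
                           )) {
  if (is.null(data_dir) || is.na(data_dir) || !nzchar(data_dir)) {
    stop("No data directory configured; set options(mypkg.data_dir = ...) ",
         "or the MYPKG_DATA_DIR environment variable.", call. = FALSE)
  }
  path <- file.path(data_dir, filename)
  if (!file.exists(path)) {
    stop("File not found in user-specified directory: ", path, call. = FALSE)
  }
  # No network access happens here, so this also works on offline machines.
  path
}
```

A user behind a firewall could then copy the csv files to a local directory once and point the package at it.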

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-14 Thread John Clarke
Thanks so much Rafael, I think piggyback is exactly what I was looking for. I wonder if it is possible/best practice to include a call to it during the install.packages('MyPackage') process so that the data is available prior to running tests in the R CMD build Github Action (and also for users to

Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-14 Thread Rafael H. M. Pereira
Hi John, There are different alternatives on where to host the data (e.g. OSF, a proprietary server, Github etc). The solution I've been adopting in most of my packages is to use a combination of a proprietary server and Github. So the data is first downloaded from our own server and only if our
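The preview above is cut off, so the exact fallback condition is not visible here. Assuming the idea is "try the primary server first and use the next source only if that fails", a sketch in base R might look like this. The function name `download_with_fallback` is hypothetical, and this is not necessarily how the author's packages implement it.

```r
# Hypothetical helper: tries each URL in order and stops at the first success.
download_with_fallback <- function(urls, dest, quiet = TRUE) {
  for (url in urls) {
    ok <- tryCatch(
      utils::download.file(url, dest, mode = "wb", quiet = quiet) == 0L,
      warning = function(w) FALSE,  # treat warnings (e.g. HTTP errors) as failure
      error = function(e) FALSE
    )
    if (isTRUE(ok) && file.exists(dest)) {
      return(invisible(url))  # report which source actually served the file
    }
    unlink(dest)  # remove any partial download before trying the next source
  }
  stop("All download sources failed for ", basename(dest), call. = FALSE)
}
```

The caller would pass the sources in order of preference, for example the project's own server first and a GitHub release asset second.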

[R-pkg-devel] Retrieving versioned csv datasets for use in an R package

2025-02-14 Thread John Clarke
Hi folks, I've looked around for this particular question, but haven't found a good answer. I have a versioned dataset that includes about 6 csv files that total about 15MB for each version. The versions get updated every few years or so and are used to drive the model which was written in C++ but