, February 15, 2025 at 10:29 AM
To: Simon Urbanek
Cc: R-package-devel@r-project.org
Subject: Re: [R-pkg-devel] Retrieving versioned csv datasets for use in an R package
On 15 February 2025 at 19:50, Simon Urbanek wrote:
| GitHub is not reliable enough for reproducible research (your files can
Hi Thierry,
On 17 February 2025 at 09:16, Thierry Onkelinx wrote:
| Zenodo does offer storage. The default quotas are 50 GB and 100 files per record
| (version). See https://help.zenodo.org/docs/deposit/manage-files/#prepare
So TIL! Thanks for the heads-up and correction.
Dirk
--
dirk.eddelbue
Dear Dirk,
Zenodo does offer storage. The default quotas are 50 GB and 100 files per
record (version). See
https://help.zenodo.org/docs/deposit/manage-files/#prepare
Best regards,
Thierry
ir. Thierry Onkelinx
Statisticus / Statistician
Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NA
On 15 February 2025 at 19:50, Simon Urbanek wrote:
| GitHub is not reliable enough for reproducible research (your files can
| disappear at any point - or can change without notice),
I'm curious: Do you have a concrete example of a no-longer-reproducible study
whose data or other support files
I would like to second the Zenodo recommendation. GitHub is not reliable enough
for reproducible research (your files can disappear at any point - or can
change without notice), that's why Zenodo was created. It assumes that your
package has the list of DOIs to offer, but that should be ideally
Dear John,
Our approach to an open and reproducible workflow is to publish the data
via Zenodo. https://zenodo.org/ is maintained by CERN.
- The data is freely available.
- Your data is easy to cite.
- Every version gets its own DOI + one stable DOI that always points to the
most recent version.
Seconded... have the support for obtaining the desired file be completely
initiated by the user, and explicitly pass the filename into the functions that
use the data. It is also easier to trace which file was used in a past analysis
this way... auto config seems convenient, but it is hard to re
Not an answer, but a request from someone often working behind firewalls
and/or machines not connected to the internet. Please have a way to have
the package search for the data at some user specified location such as
a local directory.
Best,
Jan
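One way a package could honour this request is sketched below in R. It is only an illustration under stated assumptions: the function name find_data_file(), the option "mypackage.data_dir" and the environment variable MYPACKAGE_DATA_DIR are hypothetical names invented for this example, not part of any existing package. The idea is that the user points the package at a local directory, and the package never needs network access to locate the files.

```r
# Hypothetical helper: resolve a data file from a user-specified directory.
# "mypackage.data_dir" and "MYPACKAGE_DATA_DIR" are illustrative names only.
find_data_file <- function(filename, data_dir = NULL) {
  if (is.null(data_dir)) {
    # Fall back to an R option first, then to an environment variable.
    data_dir <- getOption(
      "mypackage.data_dir",
      default = Sys.getenv("MYPACKAGE_DATA_DIR", unset = NA)
    )
  }
  if (is.na(data_dir) || !nzchar(data_dir)) {
    stop("No data directory configured; pass 'data_dir' or set ",
         "options(mypackage.data_dir = ...).", call. = FALSE)
  }
  path <- file.path(data_dir, filename)
  if (!file.exists(path)) {
    stop("Data file not found: ", path, call. = FALSE)
  }
  path
}

# Usage: write a small csv to a temporary directory and read it back.
local_dir <- tempfile("data")
dir.create(local_dir)
write.csv(data.frame(x = 1:3), file.path(local_dir, "inputs_v1.csv"),
          row.names = FALSE)
inputs <- read.csv(find_data_file("inputs_v1.csv", data_dir = local_dir))
```

Passing data_dir explicitly also keeps a record in the analysis script of which copy of the data was used, which fits the suggestion elsewhere in the thread to have the user supply the file location.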
On 14-02-2025 15:54, John Clarke wrote:
Thanks so much Rafael, I think piggyback is exactly what I was looking for.
I wonder if it is possible/best practice to include a call to it during the
install.packages('MyPackage') process so that the data is available prior
to running tests in the R CMD build GitHub Action (and also for users to
Hi John,
There are several options for where to host the data (e.g. OSF, a
proprietary server, GitHub, etc.). The solution I've adopted in most of
my packages is a combination of a proprietary server and GitHub.
So the data is first downloaded from our own server and only if our
Hi folks,
I've looked around for this particular question, but haven't found a good
answer. I have a versioned dataset that includes about 6 csv files that
total about 15MB for each version. The versions get updated every few years
or so and are used to drive the model which was written in C++ but