Hello,
I have two questions about creating data packages for data that will be updated over time and that total more than 5 MB.

First question: the CRAN policy indicates that packages should in general be ≤5 MB. In a package I'm working on, I need access to data that are updated approximately quarterly, including the historical datasets (specifically, the SDTM and CDASH terminologies at https://evs.nci.nih.gov/ftp1/CDISC/SDTM/Archive/). Each individual data update is approximately 1 MB when saved as .RDS, and the current total set is about 20 MB. Since there will be future updates, I think the preferred approach is to create one data package per update, plus an umbrella package that depends on each of the individual update packages. That seems like it would minimize space requirements on CRAN, since old data will probably never need to be updated (though I will still need to access it). Is that an accurate summary of best practice for creating these as data packages?

Second question: assuming the best practice is the one described above, the typical need will be to combine the individual historical datasets for local use. An initial test indicates that the combination takes about a minute, but that once combined, the result loads much faster. I'd like to store the combined dataset locally with the umbrella package, but I believe it is considered poor form to write into a package's library location except during installation. What is the best practice for caching a large, locally-generated dataset like this? (Rough sketches of both the combining and the caching I have in mind are in the P.S. below.)

Thanks,
Bill
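P.S. To make the first question concrete, here is an untested sketch of the combining step. The package and dataset names are placeholders: I'm assuming per-update packages named like "cdiscterms2023Q1", each shipping a data frame called "terminology".

combine_terminologies <- function(pkgs) {
  one_release <- function(pkg) {
    ## load this release's dataset into a private environment
    ## instead of the global one
    env <- new.env()
    utils::data("terminology", package = pkg, envir = env)
    env$terminology
  }
  ## stack all releases into a single data frame
  do.call(rbind, lapply(pkgs, one_release))
}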
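And here is roughly the caching pattern I have in mind for the second question, assuming tools::R_user_dir() (available in R >= 4.0) is an acceptable place to write; "myumbrella" is a placeholder for the umbrella package's name:

get_combined_terminology <- function(pkgs, force = FALSE) {
  cache_dir <- tools::R_user_dir("myumbrella", which = "cache")
  cache_file <- file.path(cache_dir, "combined-terminology.rds")
  if (!force && file.exists(cache_file)) {
    ## fast path: reuse the previously combined dataset
    return(readRDS(cache_file))
  }
  ## slow path (~1 minute): combine, then cache for next time
  combined <- combine_terminologies(pkgs)
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(combined, cache_file)
  combined
}

## e.g. get_combined_terminology(c("cdiscterms2023Q1", "cdiscterms2023Q2"))

Is something along those lines reasonable, or is there a more standard mechanism?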