Kasper,

here's how I deal with a largish data set (although data and code are still in one package, precisely because of that kind of circular dependency):

The data set is stored PCA-compressed (only the first few principal components are kept) in matrices, plus some meta information (vector, list, data.frame). An internal function then reconstructs my example data:

    .make.chondro <- function () {
      new ("hyperSpec",
           spc = (tcrossprod (.chondro.scores, .chondro.loadings) +
                  rep (.chondro.center, each = nrow (.chondro.scores))),
           wavelength = .chondro.wl,
           data = .chondro.extra,
           labels = .chondro.labels)
    }

The result of that function is assigned when the data is first used:

    delayedAssign ("chondro", .make.chondro ())

With that it should be possible to have the data package Suggests: the main package, while the main package Depends: on the data package (though I have not yet found the time to separate the two).

Side note: the original raw data file (compressed ASCII) is available together with a variety of other raw data files from the project home page; interested users will find download links in the vignettes and help pages.

Best,

Claudia

On Tue, 28 Jan 2014 21:21:20 -0500, Kasper Daniel Hansen <kasperdanielhan...@gmail.com> wrote:

> This is a great comment if the primary use of the data is to make the
> data available.
>
> It is clear that a change in the internals of the class structure
> requires changing the data package, and that is a clear drawback to my
> recommendation. I have had to do this on several occasions.
>
> One issue with Herve's recommendation is when the same data structure
> is used in several examples. In that case, the conversion / parsing
> overhead multiplies by the number of examples. As an example, in
> minfiData I have data on 6 samples on a somewhat large array.
> Parsing the raw data files for 3 of the 6 files takes 16 secs (you
> get this timing, because this is what I have in
> example(read.450k.exp)). Loading all 6 arrays as an R data structure
> takes 1.1 sec.
>
> I would generally recommend that a data package either includes a
> more raw form of the data or has a script which makes the data easily
> retrievable.
>
> Best,
> Kasper
>
>
> On Tue, Jan 28, 2014 at 8:01 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>
> > Hi Daniel,
> >
> >
> > On 01/28/2014 03:49 PM, Daniel Kelley wrote:
> >
> >> I have an issue with a circular package dependence that prevents
> >> building/checking, and I seek advice on breaking the circle so the
> >> packages can pass the build-check tests that are required for CRAN
> >> submission.
> >>
> >> The package pair I'm working with is slow to build, but my tests
> >> suggest the issue may be general, and so I will explain it in
> >> general terms.
> >>
> >> Suppose there are two packages:
> >>
> >> 1. Foo, a package that defines some data types with S4 classes.
> >>
> >> 2. Foodata, a package that provides such datasets, for use by Foo.
> >>
> >> With this setup, it seems reasonable that Foo "depends" on
> >> Foodata, so the data can be used in Foo and its documentation.
> >>
> >> Since the data within Foodata are S4 classes as defined in Foo, an
> >> attempt to build-check Foodata will produce an error unless Foo is
> >> present. But Foo cannot be built unless Foodata exists, since it
> >> depends on it. Thus neither Foo nor Foodata can be built and
> >> checked.
> >>
> >
> > I've learned by experience that it's generally better (although not
> > always possible) to avoid putting serialized S4 objects in a data
> > package. They will break if you need to modify a little bit the
> > internals of the class (and chances are high that you will at some
> > point). Better to store the data in a format that is more or less
> > guaranteed to remain the same for years (SQLite, XML, hdf5, plain
> > text, serialized data frame, SAM/BAM, etc...) and try to come up
> > with a fast way to load and turn the data into an S4 object on
> > demand.
> >
> > Not always possible if the data is huge... but for the purpose of
> > using it in Foo's examples and vignette do you really need huge
> > data?
> >
> > Another advantage of this approach is that the data can then be
> > more easily shared because it can be accessed with tools other
> > than yours, e.g. tools that don't know about S4 and even non-R
> > tools.
> >
> > Cheers,
> > H.
> >
> >
> >> One solution would be to wrap the Foo documentation examples (and
> >> relevant Foo code) in require() blocks, and to make Foo "suggest"
> >> Foodata, not "depend" upon it. My question is whether this is the
> >> recommended practice, or the common practice.
> >>
> >> Thanks in advance to anyone who wishes to offer hints.
> >>
> >> PS. The problem arose from an attempt to reduce CRAN load by
> >> extracting the datasets that had been contained within a previous
> >> version of Foo.
> >>
> >> PPS. my (slow-building) packages are on github and I can supply
> >> details if needed.
> >>
> >> Dan E. Kelley
> >> Professor, Oceanography Department
> >> Dalhousie University, Canada
> >> dan.kel...@dal.ca
> >>
> >> ______________________________________________
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpa...@fhcrc.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319

--
Claudia Beleites, Chemist
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.belei...@ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399
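
P.S. In case a stripped-down version helps: here is a minimal sketch of the compress / reconstruct / delayedAssign idea from my description at the top of this mail, using base R only. It is not the hyperSpec code. The toy matrix and all .toy.* names are made up for illustration, the result is a plain matrix rather than an S4 object, and in a real package the three .toy.* objects would be stored with the package instead of being computed on the spot.

```r
## Toy stand-in for a spectra matrix (rows = spectra, columns = wavelengths).
## Hypothetical data: a nearly rank-1 signal plus a little noise.
set.seed (1)
raw <- outer (rnorm (50), sin (seq (0, pi, length.out = 20))) +
       matrix (rnorm (50 * 20, sd = 0.01), nrow = 50)

## Compress: keep only the first k principal components.
k   <- 2
pca <- prcomp (raw, center = TRUE, scale. = FALSE)
.toy.scores   <- pca$x        [, seq_len (k), drop = FALSE]  # 50 x k
.toy.loadings <- pca$rotation [, seq_len (k), drop = FALSE]  # 20 x k
.toy.center   <- pca$center                                  # length 20

## Reconstruct on demand: scores %*% t (loadings), then add the column
## means back (rep (..., each = nrow) lines the means up column by column).
.make.toy <- function () {
  tcrossprod (.toy.scores, .toy.loadings) +
    rep (.toy.center, each = nrow (.toy.scores))
}

## `toy` is only a promise here; .make.toy () runs on first use of `toy`.
delayedAssign ("toy", .make.toy ())

dim (toy)               # first access forces the reconstruction
max (abs (toy - raw))   # should be small, since the toy data are nearly rank 1
```

Because delayedAssign () creates a promise, .make.toy () is evaluated once, on first access; later uses of toy within the same session get the value that was already computed.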