Jan T Kim <jtt...@googlemail.com> writes:

> On Wed, Mar 16, 2016 at 03:18:27PM -0400, Duncan Murdoch wrote:
>> On 16/03/2016 1:40 PM, Jan Kim wrote:
>> >Barry: that's an interesting hack.
>> >
>> >I do feel compelled to make two comments, though, regarding the
>> >general issue rather than the scraping idea:
>> >
>> >(1) If your situation is that that image (.RData file) is the only
>> >copy of the data, you'll need to rescue the data from that as soon as
>> >possible anyway. Something like
>> >
>> >    load(".RData");
>> >    write.csv(mydataframe, file = "mydata.csv");
>> >
>> >should do the trick. It will be slow, but you'll need to do it just
>> >once, so you might as well enjoy your coffee while you wait. From that
>> >point on, work with the mydata.csv file for getting at the colnames
>> >(and anything else as well).
>> >
>> >(2) If there's any chance / risk that scraping data off images is not
>> >a one-off, the time to prevent that from catching on is now. If data is
>> >of any value at all, it should be handled in a sane, portable, textual
>> >format. For tabular data, csv is normally adequate or at least good
>> >enough, but .RData images are never a good idea.
>>
>> I agree with the sentiment, but not with the choice of .csv as a
>> "sane, portable, textual format". CSV has no type information
>> included, so strings that contain only digits can turn into numbers
>> (and get rounded in the process), things that look like
>> dates can get converted to different formats, etc.
>
> I entirely agree. In hindsight, I should have stated that the .RData files,
> as well as the R code to load and extract stuff from them, should be stored
> permanently and documented.
>
>> The .RData format has the disadvantage of being hard to use outside
>> R, but at least it is usable in R.
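To make the CSV point above concrete, here is a minimal sketch of the type loss on a round trip. The data frame, its column names and the temporary file are made up for illustration and are not from the original poster's data.

```r
# Hypothetical data: identifiers that consist only of digits, stored as strings.
df <- data.frame(
  id      = c("00123", "00456"),
  account = c("12345678901234567890", "98765432109876543210"),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
write.csv(df, csv_file, row.names = FALSE)

# Default read: columns that look numeric are converted, even though
# write.csv() put them in quotes.
naive <- read.csv(csv_file)
class(naive$id)        # integer, so the leading zeros are gone
class(naive$account)   # numeric (double), so the low digits are rounded

# Telling read.csv() the intended type keeps the strings intact.
typed <- read.csv(csv_file, colClasses = "character")
identical(typed$id, df$id)
```

The second read only works because the reader already knows the columns are meant to be character, which is exactly the information the CSV file itself does not carry.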
>
> yes -- that's why I thought it's a good idea to use R to pluck out the
> valuable data, so (1) they can still be accessed even if the .RData
> format changes and (2) they're in their own file, separated from the
> (potentially humongous, see my P.S.) amount of other stuff caught up
> in the image.
>
> But to reiterate, the .RData file should be secured as well if that's
> the only remaining primary / original source of the data.
>
>> I don't know what I'd recommend if I wanted a portable textual
>> format. JSON is close, but it can't handle the full
>> range of data that R can handle (e.g. no Inf). dput() on a
>> dataframe is text, but nothing but R can read it.
>
> yes, that's the problem with "JSON", it's a JavaScript but not really
> an object notation, as it doesn't store class structure metadata.
>
> So again, the best bet is to secure multiple levels: the .RData
> image to preserve the R types, the R script to be able to identify
> the relevant variable(s), and the text version to avoid depending on
> availability of R / an R version still able to read the image format.
>
> Best regards, Jan
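The "multiple levels" idea quoted above could be sketched roughly as follows. This is only an illustration under assumptions: mydataframe stands in for the object rescued from the image, and the output directory, the file names and the extra "classes" file are hypothetical choices, not something proposed in the thread.

```r
# Hypothetical stand-in for the data frame rescued from the .RData image.
mydataframe <- data.frame(
  id    = c("00123", "00456"),
  value = c(1.5, Inf),
  stringsAsFactors = FALSE
)

outdir <- tempfile("rescue_")
dir.create(outdir)
rdata_file   <- file.path(outdir, "mydata.RData")
csv_file     <- file.path(outdir, "mydata.csv")
classes_file <- file.path(outdir, "mydata_classes.csv")

# Level 1: R-native copy that preserves the R types.
save(mydataframe, file = rdata_file)

# Level 2 is this script itself: keep it next to the files it writes.

# Level 3: text copy, plus a small text file recording each column's class
# (an assumed workaround for CSV carrying no type information).
write.csv(mydataframe, csv_file, row.names = FALSE)
classes <- vapply(mydataframe, function(x) class(x)[1], character(1))
write.csv(data.frame(column = names(classes), class = unname(classes)),
          classes_file, row.names = FALSE)

# Reading the text copy back, using the recorded classes.
cls <- read.csv(classes_file, stringsAsFactors = FALSE)
restored <- read.csv(csv_file, colClasses = setNames(cls$class, cls$column))

# Reading the R-native copy back into its own environment.
env <- new.env()
load(rdata_file, envir = env)
```

Recording the classes separately only covers simple column types; factors with a specific level order, dates with time zones and list columns would need more care than this sketch gives them.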
The package 'h5' provides an R interface to HDF5 files. I have used
neither, but am aware that HDF5 is a widely used format for storing
complex data structures. Would that be useful?

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de