On Thu, Jan 07, 2010 at 09:51:05PM +0100, Joerg Jaspert wrote:
> >> more than ASCII format, what we need is the preferred form for making
> >> modifications. Binary format by itself is not a problem since there is no
> >> loss of information between both formats. I am not against including a text
> >> dump of the R object, but I would like to make clear that if this becomes a
> >> requirement for R packages to enter in Debian, then many packages from the
> >> gnu-r section are probably RC-buggy…
>
> > I would like to know your conclusion on *Rdata files. They are example data
> > files for the documentation and the regression tests. Many r-cran-* packages
> > contain them. My personal opinion is that since they can be read, written,
> > modified, and exported with R, they are a ‘preferential form’ for
> > modification.
>
> > I am currently holding my work on the r-cran-* packages I co-maintain until I
> > get your answer.
>
> How are they usually modified? The format in which that happens is what
> we need (together with the ability to do that within Debian).
Hi Joerg,

While each of them is different, I think I can say that they are usually not modified. Their value lies in staying the same for years, so that examples derived from them remain reproducible. Here are a few examples from the core R package:

    The data give the speed of cars and the distances taken to stop. Note
    that the data were recorded in the 1920s.

    This data set provides information on the fate of passengers on the
    fatal maiden voyage of the ocean liner ‘Titanic’, summarized according
    to economic status (class), sex, age and survival.

    The ‘Indometh’ data frame has 66 rows and 3 columns of data on the
    pharmacokinetics of indomethacin.

    The (approximately) quarterly approval rating for the President of the
    United States from the first quarter of 1945 to the last quarter of 1974.

Interestingly, the datasets shipped in the core R source package are not in binary format but in R code format, for instance:

    cars <- data.frame(
        speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13,
                  13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18,
                  18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24, 24, 25),
        dist = c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34,
                 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56,
                 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85))

    "presidents" <-
    structure(c(NA, 87, 82, 75, 63, 50, 43, 32, 35, 60, 54, 55,
                36, 39, NA, NA, 69, 57, 57, 51, 45, 37, 46, 39,
                36, 24, 32, 23, 25, 32, NA, 32, 59, 74, 75, 60,
                71, 61, 71, 57, 71, 68, 79, 73, 76, 71, 67, 75,
                79, 62, 63, 57, 60, 49, 48, 52, 57, 62, 61, 66,
                71, 62, 61, 57, 72, 83, 71, 78, 79, 71, 62, 74,
                76, 64, 62, 57, 80, 73, 69, 69, 71, 64, 69, 62,
                63, 46, 56, 44, 44, 52, 38, 46, 36, 49, 35, 44,
                59, 65, 65, 56, 66, 53, 61, 52, 51, 48, 54, 49,
                49, 61, NA, NA, 68, 44, 40, 27, 28, 25, 24, 24),
              .Tsp = c(1945, 1974.75, 4), class = "ts")

The example above is interesting because it contains missing values (NA).
Dealing with missing values is a delicate issue in statistics, and correcting the above table to fill in the missing values would make it lose its interest as an example of a time series with missing values. The Rdata files are examples of real data, not scientific references meant to be corrected or extended.

My opinion is therefore that the binary format offers the same freedoms as the R code format, or as a CSV table, an Excel table, an OpenOffice table, etc. What the author used to produce the R objects is of little relevance, as it is more a disposable intermediate than a source that must remain available to help people make modifications. Note that there is no evidence that all Rdata files come from R code as above. My wild guess is that many were imported from a CSV table at some point. To caricature a bit, I would say that the Rdata format is no more obscure than a .csv.gz file: both need a command to be turned into plain CSV.

I hope I have not been confusing. If you would like an external opinion, I suggest contacting our Debian expert Dirk Eddelbuettel (e...@debian.org). His work on and with R is recognised internationally.

Have a nice day, and thanks for the fast answer, I really appreciate it.

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan