On Sun, Oct 25, 2009 at 5:48 AM, Yihui Xie <xieyi...@gmail.com> wrote:
> Hi everyone,
>
> I wonder if there already exists any R packages containing all the
> data sets for the book "The Statistical Sleuth"
> (http://www.proaxis.com/~panorama/home.htm; also available at StatLib
> http://lib.stat.cmu.edu/datasets/sleuth).
>
> I'm writing an R package with a friend for one of our stat courses
> where SAS is the main tool being used. As the time is limited and half
> of the semester has gone, we want to finish the package ASAP before
> the biased (my personal feeling) impression towards R comes up. It
> will save us some time (especially the time on writing R
> documentation) if anyone has already done the work of packing up all
> the data sets. Thanks a lot!

You should be able to read the SPSS versions of the data files using
'read.spss' from the "foreign" package. I've just read in all the .sav
files from the 2nd edition data sets with no errors. Probably all you
then need to do is convert them to data frames and save them as a
.RData file which your students can "attach".

Actually it's turning out quicker for me to do this than to tell you how :)

Get the spss.exe, unzip it to create a load of .sav files, install the
'foreign' package if you don't have it already, then do this in R:

    library(foreign)
    e <- new.env()
    # the pattern is a regular expression: match only names ending in ".sav"
    for (f in list.files(pattern = "\\.sav$")) {
      name <- sub("\\.sav$", "", f)               # e.g. "case0101.sav" -> "case0101"
      data <- read.spss(f, to.data.frame = TRUE)  # return a data frame, not a list
      assign(name, data, envir = e)
    }
    # write every object in 'e' to a single file
    save(file = "statsleuth.RData", list = ls(e), envir = e)

Then to test, start a new R session and do:

    > attach("statsleuth.RData")
    > summary(ex1611)
          COUNTRY      PCTCATH          P2PRATIO        PCTINDIG
     Argentina : 1   Min.   : 1.20   Min.   : 0.9   Min.   : 13.00
     Australia : 1   1st Qu.:28.60   1st Qu.: 1.8   1st Qu.: 58.50
     Bolivia   : 1   Median :82.10   Median : 3.8   Median : 76.00
     Brazil    : 1   Mean   :63.74   Mean   : 5.1   Mean   : 70.53
     Chile     : 1   3rd Qu.:95.50   3rd Qu.: 8.3   3rd Qu.: 92.00
     Ecuador   : 1   Max.   :97.60   Max.   :11.9   Max.   :100.00
     (Other)   :15   NA's   : 2.00
    > ls("file:statsleuth.RData")
     [1] "case0101" "case0102" "case0201" "case0202" "case0301" "case0302"
     [7] "case0401" "case0402" "case0501" "case0502" "case0601" "case0602"
    [13] "case0701" "case0702" "case0801" "case0802" "case0901" "case0902"
    [etc etc etc etc]

My only worry is whether all the data sets convert to data frames
okay, and whether anything is lost in the conversion. It's possible
that SPSS stores other metadata that gets dropped along the way. I'd
suggest you check all 140 data sets first...

Barry

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
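
One possible way to run the suggested check over every data set, as a
minimal sketch rather than a definitive method. The helper
check_datasets() is hypothetical: it is not part of 'foreign' or of the
post above. It assumes the data sets sit in an environment, as they do
after the save() step, and reports for each object whether it is a data
frame, its dimensions, and how many missing values it contains.

```r
# Hypothetical helper (an assumption, not from the original post):
# summarise every object in an environment so odd conversions stand out.
check_datasets <- function(env) {
  nms <- sort(ls(env))
  info <- lapply(nms, function(nm) {
    x <- get(nm, envir = env)
    ok <- is.data.frame(x)
    data.frame(
      name  = nm,
      is_df = ok,
      rows  = if (ok) nrow(x) else NA_integer_,
      cols  = if (ok) ncol(x) else NA_integer_,
      n_na  = if (ok) sum(is.na(x)) else NA_integer_,
      stringsAsFactors = FALSE
    )
  })
  # one row per object; NULL if the environment is empty
  do.call(rbind, info)
}

# Usage, assuming statsleuth.RData exists in the working directory:
# e <- new.env()
# load("statsleuth.RData", envir = e)
# report <- check_datasets(e)
# report[!report$is_df | report$rows == 0, ]
```

Objects flagged as non-data-frames or with zero rows are the ones worth
opening by hand. The sketch does not detect dropped SPSS metadata such
as variable labels, so that still needs a separate look.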