andrewH wrote:

> Is there a data frame analog to sparse matrices? I am working with a panel
> data set that has a large number of variables that are redefined
> repeatedly or exist for only a few years (out of 48). In my current
> structure, where variables are columns and rows are years, more than 90
> percent of the cells and more than 3/4 of the total size of my file are
> NAs.
>
> I am wondering if there is an alternate file specification currently
> available that still allows numeric, character and factor data to be
> stored. Besides just using a database.

How about storing the data in a ‘long’ format, like you get when you apply
melt() (with na.rm=TRUE) from the ‘reshape2’ package to your data frame?
Parts of the data frame (the ID part) will be repeated on each row, which
may make the data take up more space, but no rows are stored for NA cells,
so for somewhat sparse data it will be a win. It also makes it very easy to
reshape and analyse the data.

Here’s an introduction (to the older ‘reshape’ package, but ‘reshape2’ is
very similar):
http://www.jstatsoft.org/v21/i12

You might also be interested in this paper on ‘tidy’ data:
http://vita.had.co.nz/papers/tidy-data.pdf

-- 
Karl Ove Hufthammer
E-mail: k...@huftis.org
Jabber: huf...@jabber.no

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
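
For illustration, here is a minimal sketch of the long-format suggestion made in the reply. The toy data frame and its names (wide, year, v1, v2, v3) are made up for this example and are not taken from the original data; only numeric variables are used to keep it simple.

```r
library(reshape2)

# Hypothetical toy panel: one row per year, one column per variable,
# with most cells missing.
wide <- data.frame(
  year = 2001:2004,
  v1   = c(1.5, NA, NA, NA),
  v2   = c(NA, NA, 3.0, 4.0),
  v3   = c(NA, 7.0, NA, NA)
)

# Long format: one row per non-missing (year, variable) pair.
# The ID column (year) is repeated, but NA cells produce no rows.
long <- melt(wide, id.vars = "year", na.rm = TRUE)
print(long)

# dcast() rebuilds the wide layout; combinations that are absent
# from the long data come back as NA.
wide_again <- dcast(long, year ~ variable, value.var = "value")
print(wide_again)
```

Here the wide table holds 12 variable cells, while the long table keeps only the 4 that are observed. Whether that saves space on real data depends on how sparse the data are and how large the repeated ID part is.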