Hi Andrew, great to hear from you  :)

You really ought to review the (100% R-specific) US Government Survey
Datasets already available at http://usgsd.blogspot.com/ and contact me
directly if you hit a problem -- I am furiously working on a few right now
(ACS, SIPP, BSAPUFs, BRFSS, MEPS), and am open to focusing on others if
there's good reason.


> 1.   Anthony, does the read.SAScii.sqlite function preserve the label names
> for factors in a data frame it imports into SQLite, when those labels are
> coded in the command file?
>

SAScii doesn't touch labels.  I haven't really worried about them, but I'd
consider your suggestions about how to incorporate them into the package.
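In the meantime, a manual workaround along these lines should do it -- just
a sketch: the file paths and the 'sex' variable below are made up, and the
codes/labels have to be copied by hand out of the PROC FORMAT block of your
SAS command file:

library(SAScii)

# read the fixed-width file using the SAS INPUT statement
x <- read.SAScii( "pums.dat" , "pums.sas" )

# then apply the value labels yourself as factor levels
x$sex <- factor( x$sex , levels = 1:2 , labels = c( "male" , "female" ) )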



> 2.   If I want to make the resulting SQLite database available to the R
> community, is there a good place for me to put it? Assume it is 10-20 gigs
> in size.  Ideally, it would be set up so that it could be queried remotely
> and extracts downloaded. Setting this up is beyond my competence today, but
> maybe not in a couple of months.



I don't recommend this -- it probably violates the IPUMS terms of use.
Besides, it's not very hard for individuals to get IPUMS data into R.  See
the ?read.SAScii() example at the bottom of page 12 of
http://cran.r-project.org/web/packages/SAScii/SAScii.pdf.

That said, I wouldn't use SAScii for IPUMS data myself.  IPUMS recently
started allowing extracts to be downloaded as csvs, which gives analysts
many more options than just read.SAScii() for smaller data sets and
read.SAScii.sqlite() for larger ones -- read.csv() and the sqldf package's
read.csv.sql(), for example.

SAScii is just a giant workaround big enough for its own R package.
read.csv() and read.csv.sql() are much more developed, debugged, and
widely used.
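
As a rough sketch of the csv route (the file name and column names below
are placeholders for whatever your IPUMS extract actually contains):

library(sqldf)

# smaller extracts: read the csv straight into memory
small.extract <- read.csv( "ipums_extract.csv" )

# larger extracts: read.csv.sql() stages the file in a temporary SQLite
# database and pulls back only the rows and columns the query asks for
big.extract <-
    read.csv.sql(
        "ipums_extract.csv" ,
        sql = "select YEAR , PERWT , AGE from file where AGE >= 65" ,
        dbname = tempfile()
    )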

If your computer has enough RAM to hold the IPUMS file (with replicate
weights -- which generally double the filesize), skip SQLite altogether.
The example CPS code on usgsd uses SQLite so that it works on computers
with as little as 4GB of RAM, but it should be easy for you to alter that
code and skip the database components entirely.  Storing each year of CPS
data as an R data file (.rda) will speed everything up.
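
Something like this, where cps.2011 is a hypothetical data frame holding
one year of already-imported CPS data:

# save it once as a compressed R data file..
save( cps.2011 , file = "cps2011.rda" )

# ..then any later session loads it back in seconds instead of re-importing
load( "cps2011.rda" )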


> (I'd like to do the same thing with the 30
> years of Consumer Expenditure Survey data I have. I don't have access to
> SAS
> any more, but I converted it all to flat flies while I still did. Currently
> the BLS only makes 2011 microdata available free. Earlier years on cd are
> $200/year. But they have told me that they have no objection to my making
> them available).
>

You might want to browse around the rest of http://usgsd.blogspot.com/
before re-doing anything -- I have already done that for 2011.  ;)

Getting the Consumer Expenditure Survey working properly in R was pretty
challenging.  But it's done now, with boatloads of detailed comments, and
nobody ever has to do it again.

http://usgsd.blogspot.com/search/label/consumer%20expenditure%20survey%20%28ce%29


BLS is slowly releasing the public use microdata on their own, so I am
waiting till they get back to 1996.  That way, everything is reproducible
-- everyone starts from the same BLS files, matches the same BLS
publications (to confirm the methodology is sound), and starts their own
analyses from the same complex sample survey design object.
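
For the CE, that shared starting point is a replicate-weighted design built
with the survey package -- roughly the sketch below, though the variable
names (finlwt21, the wtrep columns, totexppq) follow the CE conventions and
should be double-checked against the actual files and the code on usgsd:

library(survey)

# fmly is a hypothetical data frame of CE family-level records for one quarter
ce.design <-
    svrepdesign(
        repweights = "wtrep[0-9]+" ,     # the replicate weight columns
        weights = ~finlwt21 ,            # the full-sample weight
        data = fmly ,
        type = "BRR" ,
        combined.weights = TRUE
    )

# every analysis then starts from this object, for example a weighted mean
svymean( ~totexppq , ce.design , na.rm = TRUE )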

They talk about their public release schedule on
http://www.bls.gov/cex/pumdhome.htm#online.  If you can't wait for them to
release it, you'll still probably want to link to the CE code I've
written.  Creating that survey object was tough stuff.




> 3. I have not yet been able to determine whether CPS micro data from the
> period 1940-1961 exists. Does anyone know? It is not on
> http://thedataweb.rm.census.gov/ftp/cps_ftp.html, and  IPUMS and NBER
> (http://www.nber.org/data/current-population-survey-data.html)  both only
> give data back to 1962. I wrote to Census a week ago, but I have not heard
> back from them, and in the past they have not been very helpful about
> historical micro data.
>

I don't know, sorry.  But Census is generally very responsive -- ping 'em
again in another week.  :)


Good luck and keep in touch!
