Dear Anthony – On closer examination, what I am talking about is not factor levels, but something different (but analogous). All of the categorical data is stored as integer codes, so the file is entirely numeric. The SAS proc format then gives a text string for each code of each categorical variable. Like this:

value REGION_f
  11 = "New England Division"
  12 = "Middle Atlantic Division"
  21 = "East North Central Division"
  22 = "West North Central Division"
  31 = "South Atlantic Division"
  32 = "East South Central Division"
  33 = "West South Central Division"
  41 = "Mountain Division"
  42 = "Pacific Division"
  97 = "State not identified"

So it would make sense to have a lookup table of these codes linked to the variables. I’m not sure if it makes more sense to have that table live in R or in the database. For R purposes, I imagine it would make sense to convert these integer-valued variables into factors.

What I do not understand is how SAS knows where the variables begin and end. I managed to break off a little hunk of the beginning of my file and look at it in an editor, and it is numbers without any obvious delimiters. Is the delimiter a particular numeric string? I thought the SAS command file would contain the starting location for each of the fixed-length fields, but I do not see anything in the file that could be interpreted that way – just a little wraparound code and then a long list of variable names followed by triplets of a code, an equals sign, and a text string, terminating with a semicolon.

I’m sorry if I am being obtuse. When I said before that I had saved the SAS files as flat files, what I really meant was that I had an intern do it. When I was doing my own analysis, I mainly used TSP, before I switched to R about a year ago. I’ve never used SAS.

I find your data project very interesting. Very. It is not actually necessary to wait for BLS to release the older CEX files, if you can lay your hands on the CDs. I spoke to the BLS data products office about 2 years ago, and they have no problem with people republishing purchased data in any format they like, including simple duplication. In fact, they seemed to like the idea. I think the sale of data was forced on them by some kind of mandate from above.
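
Coming back to the lookup-table idea for a moment: here is a rough sketch of what I have in mind on the R side, assuming the codes arrive in R as plain integers. Only the REGION codes and labels come from the format statement above; the data frame `cex`, its five rows, and the name `region_lookup` are made up for illustration and are not real CEX records.

```r
# Lookup table for REGION, copied from the REGION_f value statement
region_lookup <- data.frame(
  code  = c(11L, 12L, 21L, 22L, 31L, 32L, 33L, 41L, 42L, 97L),
  label = c("New England Division",
            "Middle Atlantic Division",
            "East North Central Division",
            "West North Central Division",
            "South Atlantic Division",
            "East South Central Division",
            "West South Central Division",
            "Mountain Division",
            "Pacific Division",
            "State not identified"),
  stringsAsFactors = FALSE
)

# Hypothetical data: a few integer-coded observations (invented for this sketch)
cex <- data.frame(REGION = c(11L, 42L, 97L, 31L, 11L))

# Convert the integer codes to a factor, using the lookup table
# for both the set of valid codes and their text labels
cex$REGION_f <- factor(cex$REGION,
                       levels = region_lookup$code,
                       labels = region_lookup$label)

# Tabulate by label; codes absent from the data still appear with count 0
table(cex$REGION_f)
```

If the table lived in the database instead, I suppose the same two columns could simply be joined on the code, so this does not settle the question of where it should live.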

I'll be playing with your code (which is a model of readability, and a lesson to me on same, BTW) and keep you posted on my progress.

Warmly,
Andrew