On Thu, Feb 5, 2015 at 2:08 PM, Brian Trautman <btrautma...@gmail.com> wrote:
> I'm trying to read some mainframe data encoded as EBCDIC into R, and am at
> a loss. I'd like to avoid using an external program to convert the files,
> since I'm operating in a corporate environment.
>
> You can find the example files at the link below, with both ASCII and
> EBCDIC versions. Note that there are no linebreaks in the EBCDIC versions
> of the file -- instead, I'd be specifying the width of each line manually.
> R has the IBM500 encoding available in my environment, which should be the
> correct one for these files.
>
> However, when I run the following commands, R seems to fail entirely. It
> loads a single record with garbage characters, regardless of the encoding I
> specified.
>
> > layout <- read.fwf("EBCDIC_LAYOUT", widths = c(80), fileEncoding='ibm500')
> > data <- read.fwf("EBCDIC_ZIPCODE", widths = c(32), fileEncoding='ibm500')
>
> Where might I go from here?
>
> Related -- some of the files I expect to use will be fairly large (1 GB or
> so). Preferably, I'd like a solution that scales reasonably well. (I tried
> packages like LaF, but they don't have the option to select encoding.)
>
> Thank you very much!
>
> Example files --
> https://drive.google.com/open?id=0ByvX1v-WqaaASTdwV2ZYS0pBV00&authuser=0

I gave this a short try. What killed me (see below) is that your file
EBCDIC_ZIPCODE has embedded NUL characters, \0. My transcript:

> file<-file("EBCDIC_ZIPCODE",encoding="IBM500", raw=TRUE);
> data=read.fwf(file,widths=c(32));
Warning messages:
1: In readLines(file, n = thisblock) :
  line 1 appears to contain an embedded nul
2: In readLines(file, n = thisblock) :
  incomplete final line found on 'EBCDIC_ZIPCODE'
> View(data)

I don't know how to get past the embedded NUL. I'm a UNIX user, so my
thought (not applicable with your restriction of "pure R") would be to
use "tr" to convert the \0 to spaces, then use the above.

--
He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha!
<>< John McKown
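
For what it's worth, the embedded NUL might be sidestepped in pure R by never letting readLines() see the file: read it as raw bytes, turn each \0 into an EBCDIC space, cut the bytes into fixed-width records, and only then convert with iconv(). The sketch below is untested against the actual example files. The helper name read_ebcdic_records and its arguments are made up for illustration, and treating NUL as padding that can become a blank is an assumption about the data.

```r
# Hypothetical helper (not from the thread): read fixed-width EBCDIC records
# from a file that has no line breaks, returning one string per record.
read_ebcdic_records <- function(path, record_width, encoding = "IBM500") {
  # Read the whole file as raw bytes so nothing is interpreted as text yet.
  size  <- file.info(path)$size
  bytes <- readBin(path, what = "raw", n = size)

  # Assumption: NUL bytes are padding and may be treated as blanks.
  # 0x40 is the space character in EBCDIC.
  bytes[bytes == as.raw(0x00)] <- as.raw(0x40)

  # Cut the byte stream into records; a trailing partial record is dropped.
  n_rec  <- length(bytes) %/% record_width
  starts <- seq(1, by = record_width, length.out = n_rec)
  recs   <- lapply(starts, function(s) bytes[s:(s + record_width - 1)])

  # iconv() accepts a list of raw vectors and returns a character vector.
  iconv(recs, from = encoding, to = "UTF-8")
}
```

With the 32-byte width from the original post, read_ebcdic_records("EBCDIC_ZIPCODE", record_width = 32) should give one string per record, which could then be sliced with substring() or passed to read.fwf() through textConnection(). This version holds the whole file in memory, so for files around 1 GB it would probably be better to open the file with file(path, "rb") and call readBin() repeatedly, a whole number of records at a time.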