I'm trying to read some mainframe data encoded as EBCDIC into R, and am at a loss. I'd like to avoid using an external program to convert the files, since I'm operating in a corporate environment.
You can find the example files at at the link below, with both ASCII and EBCDIC versions. Note that there are no linebreaks in the EBCDIC versions of the file -- instead, I'd be specifying the width of each line manually. R has the IBM500 encoding available in my environment, which should be the correct one for these files. However, when I run the following commands, R seems to fail entirely. It loads a single record with garbage characters, regardless of the encoding I specified. layout <- read.fwf("EBCDIC_LAYOUT", widths = c(80), fileEncoding='ibm500') data <- read.fwf("EBCDIC_ZIPCODE", widths = c(32), fileEncoding='ibm500') Where might I go from here? Related -- some of the files I expect to use will be fairly large (1 GB or so). Preferably, I'd like a solution that scales reasonably well. (I tried packages like LaF, but they don't have the option to select encoding.) Thank you very much! Example files -- https://drive.google.com/open?id=0ByvX1v-WqaaASTdwV2ZYS0pBV00&authuser=0 [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.