Hi - I'm reading in a tab-delimited file that is causing issues with read.delim. For a particular set of lines, the last entry of the line is misread and treated as the first entry of a new row (which is then padded with NAs). For example:
    tmp <- read.delim( "trouble.txt", header=F )

produces a data.frame, tmp, where if I call tmp[,1], I get output like:

    [76] F45H7.4#2   C47C12.5#2  F40H7.4#2   ZK353.2     0.59
    [81] Y116A8C.34  0.23        Y116F11A.MM 0.04        F26D12.A

I initially assumed it was a formatting issue with the file. However, I've looked at the file in an octal viewer, and the lines in question seem fine. Additionally, using scan and then strsplit splits the lines correctly (code below the sig).

Since I can't attach the file to a group posting, I can't give a sample of the lines causing the issue. However, I can send a small sample to anyone who's interested.

Note, I've tried this on several architectures and versions of R and get the same behavior: v.2.5.1 on an x86_64, as well as v.2.6.0 on an x686 architecture. I also get similar behavior when I convert the file into a comma-separated file and use read.csv.

As a quick workaround I can use scan & strsplit, but thought someone might want to take a look at this problem.

Thanks,

Peter Waltman

p.s. the combination of scan & strsplit I describe above was as follows:

    my.lines <- scan( "trouble.txt", sep="\n", what='character' )
    split.lines <- strsplit( my.lines, "\t" )
    num.entries <- sapply( split.lines, length )

after which num.entries will contain the same number of entries as my.lines, all equal to 509 (the number of elements per line).
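A possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: read.delim defaults to quote="\"" and fill=TRUE, and with fill=TRUE read.table works out the number of columns from the first five lines of input unless col.names is supplied. The code below counts fields per line with quoting and comments switched off, then reads the file with fill=FALSE and explicit column names so that a mismatched line raises an error instead of wrapping onto a new row. The temporary file and its three lines are made up to stand in for trouble.txt, and the V1, V2, ... column names are only placeholders.

```r
# Hypothetical stand-in for trouble.txt: three tab-delimited lines, 4 fields each.
path <- tempfile(fileext = ".txt")
writeLines(c("F45H7.4#2\t0.59\t0.23\t0.04",
             "ZK353.2\t0.10\t0.20\t0.30",
             "F26D12.A\t0.70\t0.80\t0.90"), path)

# Count the fields on each line as read.table would see them,
# with quote and comment handling turned off.
n.fields <- count.fields(path, sep = "\t", quote = "", comment.char = "")
table(n.fields)

# Read with explicit settings: no quoting, no comment character, no filling,
# and one column name per field so the column count is fixed up front.
tmp <- read.delim(path, header = FALSE, quote = "", comment.char = "",
                  fill = FALSE,
                  col.names = paste0("V", seq_len(max(n.fields))))
dim(tmp)
```

If table(n.fields) on the real file shows more than one value, or the counts change when quote is left at its default, that would point at stray quote characters or an uneven field count rather than at the tabs themselves.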