Working on Windows, I have had to deal with CSV files that, unfortunately, contain embedded Control-Zs (ASCII character 26 in decimal), and readLines() in R on Windows (versions 2.15.2 and 3.0.0) appears to stop reading at the Control-Z, as if it marked the end of the file. There is no problem at all on Ubuntu Linux with R 3.0.0.
Am I mistaken, or is this genuine?

# Create a small file with an embedded Control-Z
h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(65, 26, 65))), '",99')
h3
# [1] "1,34,44.4,\" A\032A \",99"
writeLines(h3, 'h3.txt')

# Now attempt to read the file back in
h3a <- readLines('h3.txt')
# but on Windows with R 2.15.2 and 3.0.0 I get the message
# Warning message:
# In readLines("h3.txt") : incomplete final line found on 'h3.txt'
h3a
# [1] "1,34,44.4,\" A"
# so everything from the Control-Z onwards is dropped

####
# The following is my rough-and-ready workaround; I'm sure there is a cleaner way
fnam <- 'h3.txt'
# read the whole file as raw bytes, bypassing text-mode processing
tmp.bin <- readBin(fnam, raw(), n = file.info(fnam)$size)
tmp.char <- rawToChar(tmp.bin)
# split into lines on the Windows CRLF line ending
txt <- unlist(strsplit(tmp.char, '\r\n', fixed = TRUE))
txt
# [1] "1,34,44.4,\" A\032A \",99"

This was with 64-bit R on 64-bit Windows 7, but the same thing happens with 32-bit R 2.15.2 on 32-bit Windows 7 running inside VirtualBox.

Kind regards,
Sean O'Riordain
Trinity College Dublin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
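A possibly cleaner workaround, sketched under the assumption that the truncation comes from Windows text-mode file handling treating Control-Z (byte 0x1A) as an end-of-file marker, is to open the connection in binary mode ("rb") and let readLines() split the lines itself. The R documentation for readLines() states that LF, CRLF and CR are all accepted as line endings whatever mode the connection is opened in, so the CRLF endings written on Windows should still be handled. The helper name read_lines_binary is hypothetical and not part of R.

```r
# Hypothetical helper: read a text file line by line through a binary-mode
# connection, so an embedded Control-Z (0x1A) is not treated as end-of-file
# (assumes the truncation is caused by text-mode handling on Windows).
read_lines_binary <- function(path, ...) {
  con <- file(path, open = "rb")   # "rb" instead of the default "rt"
  on.exit(close(con))              # always close the connection
  readLines(con, ...)              # LF, CRLF and CR are all accepted as EOL
}

# Reproduce the example from the post in a temporary file
h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(65, 26, 65))), '",99')
fnam <- tempfile(fileext = ".txt")
writeLines(h3, fnam)

h3b <- read_lines_binary(fnam)
print(identical(h3b, h3))          # the line should survive intact
unlink(fnam)                       # remove the temporary file
```

The resulting character vector could then be handed to the CSV parser via read.table()'s text argument, for example read.csv(text = read_lines_binary(fnam), header = FALSE), instead of reading the file by name.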