I ended up pre-processing the files outside of R using a script along the lines of

#!/bin/bash
for f in *_table_extract_*.txt; do
    echo -n "Processing $f..."
    o="${f}.xz"
    # UTF-16LE -> UTF-8, drop the 3-byte BOM, split records on CR and strip
    # any remaining LF, drop blank lines and the "(n row(s) affected)" noise,
    # then compress.
    iconv -f "UTF-16LE" -t "UTF-8" "$f" | \
        tail -c +4 | \
        perl -l012 -015 -pe 's/\n//g' | \
        perl -ne 'print if (!m{\A \( \d+ \s row\(s\) \s affected \) \s* \z}ixms && !m{\A \s* \z}xms)' | \
        xz -7 > "$o"
    echo "done."
done

Ugly, but it worked for me. You can change the first perl expression to treat line-terminating \n differently from in-field \n characters (because records are split on CR, a terminating LF shows up at the start of the next record, while in-field LFs sit mid-record), but I just dropped them all. The tail command drops the byte-order mark (which we do not need for UTF-8) and the second perl command drops blank lines and the stupid "(n row(s) affected)" output from the SQL tool.
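
Reading the cleaned-up files back into R is then straightforward. A minimal sketch, assuming one of the output file names produced by the loop above and that header, quote and so on suit your particular extract:

# Read one of the pre-processed, xz-compressed extracts back into R.
con <- xzfile("foo_table_extract_1.txt.xz", encoding = "UTF-8")
d <- read.table(con, sep = "@", header = FALSE, quote = "",
                comment.char = "", stringsAsFactors = FALSE)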

Thanks to Prof. Brian Ripley who, essentially, pointed out that with embedded linefeed characters my file was a binary file and not really a text file. Her Majesty's government respectfully begs to disagree [1], but that's the R definition, so we'll use it on this list.

Allan

[1] Original data sets described at http://www.hm-treasury.gov.uk/psr_coins_data.htm and downloaded from http://data.gov.uk/dataset/coins (hint: you'll need p7zip to unpack them on a Linux box).


On 04/06/10 14:49, Allan Engelhardt wrote:
I have a text file that is UTF-16LE encoded with CRLF line endings and '@' as the field separator, which I want to read into R on a Linux system. That would be fine as

read.table("foo.txt", fileEncoding = "UTF-16LE", sep = "@", ...)

*except* that the data may contain the LF character which R treats as end-of-line and then barfs that there are too few elements on that line.

Any suggestions for how to process this one efficiently in R?  [...]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to