On Mon, 5 Nov 2018 08:36:13 -0500 (EST) Sebastien Bihorel <sebastien.biho...@cognigencorp.com> wrote:
> [1] "râs"

Interesting. This is what I get if I decode the bytes 72 e2 80 99 73 0a as
latin-1 instead of UTF-8. It looks like there are only three characters, but
actually there are more:

$ perl -CSD -Mcharnames=:full -MEncode=decode \
  -E'for (split //, decode latin1 => pack "H*", "72e28099730a") { say ord, " ", $_, " ", charnames::viacode(ord) }'
114 r LATIN SMALL LETTER R
226 â LATIN SMALL LETTER A WITH CIRCUMFLEX
128  PADDING CHARACTER
153  SINGLE GRAPHIC CHARACTER INTRODUCER
115 s LATIN SMALL LETTER S
10
 LINE FEED

(Decoded as UTF-8, the three bytes e2 80 99 form a single character, U+2019
RIGHT SINGLE QUOTATION MARK, so the intended text was probably "r’s".)

Does it help if you explicitly specify the file encoding by passing the
fileEncoding="UTF-8" argument to scan()?

-- 
Best regards,
Ivan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
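The latin-1 versus UTF-8 decoding of the bytes 72 e2 80 99 73 0a can also be reproduced outside Perl. The sketch below uses Python's standard library (an illustrative swap, not what was used in the message): decoding as latin-1 yields six code points, decoding as UTF-8 yields four, and re-encoding the latin-1 result back to bytes before decoding as UTF-8 repairs the garbled string. Note that Python's `unicodedata.name()` has no names for control characters, so U+0080, U+0099 and U+000A get a placeholder here instead of the aliases (PADDING CHARACTER, etc.) that Perl's `charnames::viacode` reports. The helper names `describe` and `repair` are hypothetical.

```python
import unicodedata

# The six bytes discussed above: 'r', E2 80 99, 's', newline.
RAW = bytes.fromhex("72e28099730a")


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    # unicodedata.name() raises for unnamed characters such as C0/C1
    # controls unless a default is given, so supply a placeholder.
    return [(ord(ch), unicodedata.name(ch, "<control>")) for ch in text]


def repair(mojibake):
    """Hypothetical helper: undo a wrong latin-1 decode of UTF-8 bytes."""
    # latin-1 maps every byte 0x00-0xFF to the code point with the same
    # value, so encoding back to latin-1 recovers the original bytes.
    return mojibake.encode("latin-1").decode("utf-8")


as_latin1 = RAW.decode("latin-1")  # one code point per byte
as_utf8 = RAW.decode("utf-8")      # E2 80 99 becomes a single character

if __name__ == "__main__":
    print("latin-1:", describe(as_latin1))
    print("UTF-8:  ", describe(as_utf8))
    print("repaired == UTF-8 decode:", repair(as_latin1) == as_utf8)
```

The same idea applies on the R side: the bytes are fine, and the problem goes away once the reader is told to interpret them as UTF-8 rather than latin-1.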