I think it is as Kevin described in an earlier response: the garbled
output arises because a UTF-8 encoded string is assumed to be in the native
encoding (which happens not to be UTF-8 on the platform where this is
observed) and is then converted to UTF-8 a second time.
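
For what it's worth, the double conversion can be reproduced by hand on any platform. This is only a sketch of the suspected mechanism, not a claim about what writeLines() does internally: it uses latin1 as a stand-in for a non-UTF-8 native encoding, and the variable names are made up for illustration.

```r
# UTF-8 bytes for 'é', built from raw values so the example does not depend
# on the encoding of the source file or on the session locale.
utf8_bytes <- as.raw(c(0xc3, 0xa9))
x <- rawToChar(utf8_bytes)

# Misread those bytes as latin1 (assumed stand-in for the native encoding)
# and convert them to UTF-8 a second time.
twice <- iconv(x, from = "latin1", to = "UTF-8")

print(charToRaw(x))      # the original two bytes
print(charToRaw(twice))  # four bytes: each original byte re-encoded
```

The second vector should come out as c3 83 c2 a9, which displays as "Ã©" when read as UTF-8.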
I think the documentation is consistent with th
Of course, right after writing this e-mail I tested on my Windows
machine and did not see what I expected:
> charToRaw(before)
[1] c3 a9
> charToRaw(after)
[1] e9
so obviously I'm misunderstanding something as well.
Best,
Kevin
On Sat, Feb 17, 2018 at 2:19 PM, Kevin Ushey wrote:
From my understanding, translation is implied in this line of ?file (from the
Encoding section):
The encoding of the input/output stream of a connection can be specified
by name in the same way as it would be given to iconv: see that help page
for how to find out what encoding names a
On Thu, Feb 15, 2018 at 11:19 AM, Kevin Ushey wrote:
I suspect your UTF-8 string is being stripped of its encoding before
write, and so assumed to be in the system native encoding, and then
re-encoded as UTF-8 when written to the file. You can see something
similar with:
> tmp <- 'é'
> tmp <- iconv(tmp, to = 'UTF-8')
> Encoding(tmp) <- "
I think this behavior is inconsistent with the documentation:
tmp <- 'é'
tmp <- iconv(tmp, to = 'UTF-8')
print(Encoding(tmp))
print(charToRaw(tmp))
tmpfilepath <- tempfile()
writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE)
[1] "UTF-8"
[1] c3 a9
Raw text a
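
As a point of comparison, here is a minimal sketch that sidesteps connection-level re-encoding altogether. It assumes that a binary connection opened without an 'encoding' argument, combined with useBytes = TRUE, passes the bytes through unchanged; it says nothing about what the documentation intends for the encoding = 'UTF-8' case above. The names path, con and written are just for illustration.

```r
# Build 'é' from its UTF-8 bytes and mark the string as UTF-8.
tmp <- rawToChar(as.raw(c(0xc3, 0xa9)))
Encoding(tmp) <- "UTF-8"

path <- tempfile()
con <- file(path, open = "wb")         # binary mode, no 'encoding' argument
writeLines(tmp, con, useBytes = TRUE)  # write the bytes without translation
close(con)

# Read the file back as raw bytes to see exactly what was written.
written <- readBin(path, what = "raw", n = file.size(path))
print(written)
```

If the assumption holds, the file contains c3 a9 followed by a single newline byte (0a).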