Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-19 Thread Tomas Kalibera
I think it is as Kevin described in an earlier response - the garbled output is because a UTF-8 encoded string is assumed to be native encoding (which happens not to be UTF-8 on the platform where this is observed) and converted again to UTF-8. I think the documentation is consistent with th
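A minimal sketch of the double conversion described above. Latin-1 stands in for the non-UTF-8 native encoding; that choice is an assumption made for illustration, since the thread does not name the platform's code page. The string is built from raw bytes so the result does not depend on the encoding of the script or the session.

```r
# "é" as its two UTF-8 bytes, built from raw so the result does not depend
# on the encoding of this script or of the R session.
x <- rawToChar(as.raw(c(0xc3, 0xa9)))
Encoding(x) <- "UTF-8"
utf8_bytes <- charToRaw(x)

# Stand-in for the misinterpretation: read those bytes as Latin-1 (assumed
# here for illustration) and convert them to UTF-8 a second time.
y <- iconv(x, from = "latin1", to = "UTF-8")
double_bytes <- charToRaw(y)

print(utf8_bytes)    # c3 a9
print(double_bytes)  # c3 83 c2 a9
```

Viewed as UTF-8, the four bytes display as "Ã©", which is the typical garbled form of a doubly encoded "é".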

Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-17 Thread Kevin Ushey
Of course, right after writing this e-mail I tested on my Windows machine and did not see what I expected:
> charToRaw(before)
[1] c3 a9
> charToRaw(after)
[1] e9
so obviously I'm misunderstanding something as well.
Best, Kevin
On Sat, Feb 17, 2018 at 2:19 PM, Kevin Ushey wrote:
> From my unde

Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-17 Thread Kevin Ushey
From my understanding, translation is implied in this line of ?file (from the Encoding section): The encoding of the input/output stream of a connection can be specified by name in the same way as it would be given to iconv: see that help page for how to find out what encoding names a

Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-15 Thread Ista Zahn
On Thu, Feb 15, 2018 at 11:19 AM, Kevin Ushey wrote:
> I suspect your UTF-8 string is being stripped of its encoding before
> write, and so assumed to be in the system native encoding, and then
> re-encoded as UTF-8 when written to the file. You can see something
> similar with:
>
> > tmp <- '

Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-15 Thread Kevin Ushey
I suspect your UTF-8 string is being stripped of its encoding before write, and so assumed to be in the system native encoding, and then re-encoded as UTF-8 when written to the file. You can see something similar with:
> tmp <- 'é'
> tmp <- iconv(tmp, to = 'UTF-8')
> Encoding(tmp) <- "

[Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-15 Thread Davor Josipovic
I think this behavior is inconsistent with the documentation:
tmp <- 'é'
tmp <- iconv(tmp, to = 'UTF-8')
print(Encoding(tmp))
print(charToRaw(tmp))
tmpfilepath <- tempfile()
writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE)
[1] "UTF-8"
[1] c3 a9
Raw text a
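One way to check which bytes actually reach the file is to write through a connection opened without an `encoding` argument and read the file back as raw. The sketch below assumes that setup; it is not necessarily the resolution reached in the thread, and the variable names are illustrative.

```r
# "é" as its two UTF-8 bytes, marked as UTF-8.
x <- rawToChar(as.raw(c(0xc3, 0xa9)))
Encoding(x) <- "UTF-8"

path <- tempfile()

# Open in binary mode with no encoding= argument, so the connection itself
# does no re-encoding and no line-ending translation.
con <- file(path, open = "wb")
writeLines(x, con, useBytes = TRUE)
close(con)

# Read the file back as raw bytes to see exactly what was written.
written <- readBin(path, what = "raw", n = 16L)
print(written)  # c3 a9 0a

unlink(path)
```

Comparing the raw bytes avoids any ambiguity introduced by how a terminal or editor decodes the file.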