Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomáš Bořil
For me, this would be a perfect solution. I.e., do not use the “best” fit and leave it to user’s competence: a) in some functions, utf-8 works b) in others -> error is thrown (e.g., incomplete string, NA, etc.) => user has to change the code with his/her intentional “best fit string literal substi

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomas Kalibera
On 4/10/19 6:32 PM, Jeroen Ooms wrote: On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch wrote: On 10/04/2019 10:29 a.m., Yihui Xie wrote: Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package authors to d

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomas Kalibera
On 4/10/19 6:13 PM, Tomáš Bořil wrote: An optional parameter to source() function which would translate all UTF-8 characters in string literals to their "\U" codes sounds as a great idea (and I hope it would fix 99.9% of problems I have - because that is the way I overcome these problems now

Re: [Rd] Parsing code with newlines

2019-04-10 Thread Mikhail Titov
On Wed, Apr 10, 2019 at 5:06 AM, Tomas Kalibera wrote:
>> This is my first post here. I came across the very same problem.
>> It can be reproduced within modified tests/Embedding/RParseEval.c
>
> Please check https://www.r-project.org/posting-guide.html and update
> your post if you still need t

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Duncan Murdoch
On 10/04/2019 12:32 p.m., Jeroen Ooms wrote: On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch wrote: On 10/04/2019 10:29 a.m., Yihui Xie wrote: Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package autho

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Jeroen Ooms
On Wed, Apr 10, 2019 at 5:45 PM Duncan Murdoch wrote:
>
> On 10/04/2019 10:29 a.m., Yihui Xie wrote:
> > Since it is "technically easy" to disable the best fit conversion and
> > the best fit is rarely good, how about providing an option for
> > code/package authors to disable it? I'm asking becau

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomáš Bořil
Yes, again in a script sourced by source(encoding = ...). But also by typing it directly in R console. Most of the time, I use RStudio as a front-end. For this experiment, I also verified it in Rgui. In both front-ends, it behaves completely in the same way. An optional parameter to source() func
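Tomáš describes translating UTF-8 characters in string literals to their "\U" codes as his current workaround. Below is a minimal sketch of that translation in R. `escape_non_ascii()` is a hypothetical helper, not an existing `source()` option. It escapes every non-ASCII character, so if it were applied to a whole script it would also rewrite non-ASCII identifiers, which R escapes cannot represent. A real implementation would restrict itself to string literals, as proposed.

```r
# Hypothetical helper, not part of base R and not an existing source() option:
# replace every non-ASCII character by an R Unicode escape so the text is
# pure ASCII and cannot be altered by conversion to the native encoding.
escape_non_ascii <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cps <- utf8ToInt(enc2utf8(s))                      # Unicode code points
    out <- ifelse(cps < 128L,
                  intToUtf8(cps, multiple = TRUE),     # keep ASCII as-is
                  ifelse(cps <= 0xFFFF,
                         sprintf("\\u%04X", cps),      # BMP: \uXXXX
                         sprintf("\\U%08X", cps)))     # beyond BMP: \UXXXXXXXX
    paste0(out, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}
```

For example, `writeLines(escape_non_ascii("p\u0159\u00edklad"))` prints `p\u0159\u00EDklad`. Parsing that text as a string literal gives back "příklad", because R reads `\u` (1 to 4 hex digits) and `\U` (1 to 8 hex digits) escapes itself instead of relying on the locale.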

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Duncan Murdoch
On 10/04/2019 10:29 a.m., Yihui Xie wrote: Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package authors to disable it? I'm asking because this is one of the most painful issues in packages that may need t

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Yihui Xie
Since it is "technically easy" to disable the best fit conversion and the best fit is rarely good, how about providing an option for code/package authors to disable it? I'm asking because this is one of the most painful issues in packages that may need to source() code containing UTF-8 characters t

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomas Kalibera
On 4/10/19 1:14 PM, Jeroen Ooms wrote: On Wed, Apr 10, 2019 at 12:19 PM Tomáš Bořil wrote: Minimalistic example: Let's type "ř" (LATIN SMALL LETTER R WITH CARON) in RGui console: "ř" [1] "r" Although the script is in UTF-8, the characters are replaced by "simplified" substitutes uncontrollab

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Jeroen Ooms
On Wed, Apr 10, 2019 at 12:19 PM Tomáš Bořil wrote:
>
> Minimalistic example:
> Let's type "ř" (LATIN SMALL LETTER R WITH CARON) in RGui console:
>
> "ř"
> [1] "r"
>
> Although the script is in UTF-8, the characters are replaced by
> "simplified" substitutes uncontrollably (depending on OS locale)

Re: [Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomas Kalibera
On 4/10/19 10:22 AM, Tomáš Bořil wrote:
> Hello,
>
> There is a long-lasting problem with processing UTF-8 source code in R
> on Windows OS. As Windows do not have "UTF-8" locale and R passes
> source code through OS before executing it, some characters are
> "simplified" by the OS before processin

[Rd] R 3.5.3 and 3.6.0 alpha Windows bug: UTF-8 characters in code are simplified to wrong ones

2019-04-10 Thread Tomáš Bořil
Hello, There is a long-lasting problem with processing UTF-8 source code in R on Windows OS. As Windows do not have "UTF-8" locale and R passes source code through OS before executing it, some characters are "simplified" by the OS before processing, leading to undesirable changes. Minimalistic ex
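In the thread's minimal example, a literal "ř" (U+0159) comes back as "r". One way to detect this substitution at run time is to compare a literal's code points with the ones it should contain. This is an assumption on our part, not something proposed in the thread. `check_literal()` is a hypothetical helper, and `"\u0159"` is the ASCII escape for "ř".

```r
# Hypothetical helper: TRUE if `x` consists exactly of the expected Unicode
# code points. A best-fit substitution such as "ř" -> "r" changes the code
# points, so the comparison then returns FALSE.
check_literal <- function(x, expected_codepoints) {
  identical(utf8ToInt(enc2utf8(x)), as.integer(expected_codepoints))
}

check_literal("\u0159", 0x159)  # TRUE: the escape is plain ASCII in the source
check_literal("r", 0x159)       # FALSE: what a simplified literal looks like
```

The report implies that `check_literal("ř", 0x159)`, typed directly into an affected Windows session, would behave like the second call.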

Re: [Rd] Parsing code with newlines

2019-04-10 Thread Tomas Kalibera
On 4/5/19 8:14 AM, Mikhail Titov wrote: Hello! This is my first post here. I came across the very same problem. It can be reproduced within modified tests/Embedding/RParseEval.c Please check https://www.r-project.org/posting-guide.html and update your post if you still need to get help here -