On Fri, Sep 13, 2019 at 11:53 AM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
> On 9/13/19 11:37 AM, IAGO GINÉ VÁZQUEZ wrote:
> > But if I type
> >
> > "會"
> >
> > the output is
> >
> > [1] "會"
> >
> > so it seemingly can be represented. Or am I wrong?
>
> In RGui you can print the string, because RGui is a Windows Unicode
> application (it uses UTF-16LE and bypasses the C runtime for strings).
> But that is just the GUI; R itself (and hence also packages) uses the
> current native encoding as defined by the C runtime. RGui will make sure
> R gets the string in UTF-8, but as soon as you do anything even slightly
> non-trivial, which includes formatting, the string will be converted to
> the current native encoding. Some R functions allow you to do certain
> things in UTF-8 without conversion to the native encoding; you'd have to
> read the documentation of each function very carefully. For practical
> use, though, you either need to live with the misinterpretation of some
> characters, use Windows in a locale where your characters can be
> represented (e.g. a Chinese locale when working with Chinese strings),
> or use Linux/macOS. On Linux/macOS the current native encoding can be
> UTF-8, so there is no problem. On Windows, with the current toolchain
> based on mingw, this is not possible.
>

mingw-w64 is capable of processing UTF-8 (it can process bytes, after
all). Can you explain what you mean here? Would any other compiler on
Windows not suffer from this problem?

>
> Best
> Tomas
>
> >
> > Best
> > Iago
> > ------------------------------------------------------------------------
> > *From:* Tomas Kalibera <tomas.kalib...@gmail.com>
> > *Sent:* Friday, 13 September 2019 11:24
> > *To:* IAGO GINÉ VÁZQUEZ <i.g...@pssjd.org>; r-devel@r-project.org
> > <r-devel@r-project.org>
> > *Subject:* Re: [Rd] Printing chinese characters (UTF-8) on R 3.5.2
> > -windows 10
> >
> > On 9/13/19 11:01 AM, IAGO GINÉ VÁZQUEZ wrote:
> > > I have a Chinese character in a data frame, but printing it outputs
> > > its Unicode code point instead.
> > > Concretely, the character is 會 and its code point is U+6703.
> > > Tracing the code, I arrive at the call
> > >
> > >> base::format.default("會")
> > >
> > > which prints
> > >
> > > [1] "<U+6703>"
> > >
> > > I do not know how far this behaviour extends, or whether it also
> > > occurs in the most recent versions of R.
> > >
> > > Is it expected?
> >
> > If you are running this on Windows in an encoding where the character
> > cannot be represented (e.g. a non-Chinese locale), then yes, this is
> > expected behavior.
> >
> > On Unix systems where R can run in a UTF-8 encoding (Linux, macOS), the
> > character will be formatted and displayed properly.
> >
> > Best
> > Tomas
> >
> > >
> > > Thank you!
> > >
> > > Iago
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel