Re: [Rd] How to get utf8 string using R externals

2021-06-03 Thread xiaoyan yu
Thanks! I tried my C++ program based on R externals with the same R script
and found that the results shown are the desired glyphs.
Hence this is a problem specific to R on Windows.
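
For anyone reproducing this, here is a minimal R sketch (not taken from the original program) of one way to check at the R level whether a string is still UTF-8, independently of how the console displays it. It uses only base R (`enc2utf8()`, `charToRaw()`, `utf8ToInt()`); the variable names are made up for illustration.

```r
# The two Korean characters from this thread, written as Unicode escapes so
# that the source itself stays ASCII-only.
label <- "\uBD80\uC2E4"

# Inspect the raw bytes after forcing UTF-8; this does not depend on what
# the console is able to display.
bytes <- charToRaw(enc2utf8(label))
print(bytes)                # eb b6 80 ec 8b a4

# Code points of the same string (0xBD80 and 0xC2E4).
codepoints <- utf8ToInt(enc2utf8(label))
print(sprintf("U+%04X", codepoints))
```

If the bytes come out as above but the console still shows a Unicode escape, the string itself is probably fine and only the display (or a translation to the native encoding) is at fault. On the C side, the R API also provides `Rf_translateCharUTF8()`, which may be closer to what is wanted here than `Rf_translateChar()`.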


On Wed, Jun 2, 2021 at 9:08 PM brodie gaslam wrote:

>
> > On Wednesday, June 2, 2021, 7:58:54 PM EDT, xiaoyan yu <xiaoyan...@gmail.com> wrote:
> >
> > I am using gmail. Not sure of the configuration of plain text.
> > The memory pointed to by the char * as the output of Rf_translateChar() is
> > actually the string "<U+BD80><U+C2E4>".
>
> Hi Xiaoyan,
>
> Unfortunately I'm not super familiar with R on Windows, but I think
> I can provide a simpler reproducible example.  In Rgui, if I type "\UBD80"
> at the prompt and hit enter, I see the desired glyph.  In Rterm I see the
> unicode escape.
>
> IIRC the capabilities of Rterm and Rgui are different, and UTF-8 support
> on Windows is limited.  Tomas Kalibera discusses this in some detail:
>
>
> https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html
>
> In terms of `Rf_translateChar()`, presumably the `Riconv` call is failing
> on Rterm, but not on Rgui:
>
> https://github.com/r-devel/r-svn/blob/master/src/main/sysutils.c#L924
>
> I'm guessing, but that would explain why the C-level string is in that
> format.  I don't know why the string would translate in Rgui though.  My
> guess is that it did not, since even in Rgui the following
>
> enc2native("\uBD80")
>
> produces the escaped version of the string.
>
> As others have suggested you could try the experimental UCRT Windows
> release:
>
>
> https://developer.r-project.org/Blog/public/2021/03/12/windows/utf-8-toolchain-and-cran-package-checks/index.html
>
> Install instructions (focus on Binary installer):
>
>
> https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html
>
> If I try UCRT on my system this no longer produces the escape:
>
> enc2native("\uBD80")
>
> Although all I see is a question mark.  My guess is that my code page or
> something similar is not set right.  Examining with `charToRaw` reveals
> the string remains in UTF-8 encoding.
>
> Aside: it's not clear to me that you need to translate the string if your
> intent is for it to remain UTF-8.  You just don't seem to be set-up to
> interpret UTF-8 strings currently.
>
> Best,
>
> B
>
> > On Wed, Jun 2, 2021 at 6:09 PM David Winsemius 
> > wrote:
> >
> >> First; you should configure your mail client to send plain text.
> >>
> >> Can you explain what is meant by:
> >>
> >> the characters are unicodes (<U+BD80><U+C2E4>) instead of
> >> utf8 encoding of the korean characters 부실.
> >>
> >> As far as I can tell those two unicodes _are_ the utf8 encodings of 부실.
> >>
> >> You may need to consult a couple of R help pages. I suggest:
> >>
> >> ?Quotes
> >> ?points  # has examples of changing fonts used for display on console.
> >>
> >> Sorry if I've misunderstood. I'm not on a Windows device, so posting the
> >> C++ program won't be helpful, but maybe it would for other prospective
> >> respondents.
> >>
> >> --
> >> David.
> >>
> >> On 6/2/21 1:33 PM, xiaoyan yu wrote:
> >> > I have an R script Predict.R:
> >> >  set.seed(42)
> >> >  C <- seq(1:1000)
> >> >  A <- rep(seq(1:200),5)
> >> >  E <- (seq(1:1000) * (0.8 + (0.4*runif(50, 0, 1))))
> >> >  L <- ifelse(runif(1000)>.5,1,0)
> >> >  df <- data.frame(cbind(C, A, E, L))
> >> >  load("C:/Temp/tree.RData")  # load the model for scoring
> >> >
> >> >  P <- as.character(predict(tree_model_1,df,type='class'))
> >> >
> >> > Then in a C++ program
> >> > I call eval to evaluate the script and then findVar the P variable.
> >> > After getting each class label from P using STRING_ELT and then
> >> > Rf_translateChar, the characters are unicodes (<U+BD80><U+C2E4>) instead of
> >> > utf8 encoding of the korean characters 부실.
> >> > Can I know how to get UTF-8 by using R externals?
> >> >
> >> > I also found the same script giving UTF-8 characters in RGui but
> >> > unicode escapes in Rterm.
> >> > I tried to attach a screenshot but got the message "The message's content
> >> > type was not explicitly allowed".
> >> > In RGui, I saw the output 부실, while in Rterm, <U+BD80><U+C2E4>.
> >> >
> >> > Please help.
> >> >
> >> >  [[alternative HTML version deleted]]
> >> >
> >> > __
> >> > R-devel@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: 1954 from NA

2021-06-03 Thread Greg Warnes
I would be glad to add this to one of my R packages, probably `gdata`.
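
As a rough illustration of the storage overhead Duncan estimates in the quoted message below, here is a minimal sketch, assuming a 64-bit build with 4-byte integers and 8-byte doubles and an integer attribute named `missingKind` as in his example. The helper `size()` is made up for this note, and `object.size()` only gives an approximation.

```r
# Rough check of the overhead of carrying an integer "missingKind" attribute
# alongside the data. Assumes 4-byte integers and 8-byte doubles.
n <- 1e6
kind <- integer(n)                 # one integer code per element

x_int <- integer(n)                # integer data
x_dbl <- numeric(n)                # double ("real") data

# Hypothetical helper: approximate object size in bytes, as a plain number.
size <- function(obj) as.numeric(object.size(obj))

ratio_int <- size(structure(x_int, missingKind = kind)) / size(x_int)
ratio_dbl <- size(structure(x_dbl, missingKind = kind)) / size(x_dbl)

print(round(c(integer = ratio_int, double = ratio_dbl), 2))
```

The two ratios should come out close to 2 and 1.5, matching the "double" and "increase by 50%" figures. Note that `ifelse(is.na(x), 1, 0)` returns doubles, so an implementation would presumably need `1L` and `0L` (or `as.integer()`) to actually get integer storage for the attribute.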

-G

Gregory R. Warnes, Ph.D.
g...@warnes.net
Eternity is a long time, take a friend!


> On May 26, 2021, at 1:09 PM, Adrian Dușa wrote:
> 
> Yes, that is even better.
> Best,
> Adrian
> 
> On Wed, May 26, 2021 at 7:05 PM Duncan Murdoch wrote:
> 
>> After 5 minutes more thought:
>> 
>> - code non-missing as missingKind = NA, not 0, so that missingKind could
>> be a character vector, or missingKind = 0 could be supported.
>> 
>> - print methods should return the main argument, so mine should be
>> 
>> print.MultiMissing <- function(x, ...) {
>>   vals <- as.character(x)
>>   if (!is.character(x) || inherits(x, "noquote"))
>>     print(noquote(vals))
>>   else
>>     print(vals)
>>   invisible(x)
>> }
>> 
>> This still needs a lot of improvement to be a good print method, but
>> I'll leave that to you.
>> 
>> Duncan Murdoch
>> 
>> On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
>>> On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
>>>> Dear Duncan,
>>>>
>>>> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
>>>>
>>>>     You've already been told how to solve this:  just add attributes to the
>>>>     objects. Use the standard NA to indicate that there is some kind of
>>>>     missingness, and the attribute to describe exactly what it is. Stick a
>>>>     class on those objects and define methods so that subsetting and
>>>>     arithmetic preserves the extra info you've added. If you do some
>>>>     operation that turns those NAs into NaNs, big deal:  the attribute will
>>>>     still be there, and is.na(NaN) still returns TRUE.
>>>>
>>>>
>>>> I've already tried the attributes way, it is not so easy.
>>> 
>>> If you have specific operations that are needed but that you can't get
>>> to work, post the issue here.
>>> 
>>>> In the best case scenario, it unnecessarily triples the size of the
>>>> data, but perhaps this is the only way forward.
>>> 
>>> I don't see how it could triple the size.  Surely an integer has enough
>>> values to cover all possible kinds of missingness.  So on integer or
>>> factor data you'd double the size, on real or character data you'd
>>> increase it by 50%.  (This is assuming you're on a 64 bit platform with
>>> 32 bit integers and 64 bit reals and pointers.)
>>> 
>>> Here's a tiny implementation to show what I'm talking about:
>>> 
>>> asMultiMissing <- function(x) {
>>>   if (isMultiMissing(x))
>>>     return(x)
>>>   missingKind <- ifelse(is.na(x), 1, 0)
>>>   structure(x,
>>>             missingKind = missingKind,
>>>             class = c("MultiMissing", class(x)))
>>> }
>>>
>>> isMultiMissing <- function(x)
>>>   inherits(x, "MultiMissing")
>>>
>>> missingKind <- function(x) {
>>>   if (isMultiMissing(x))
>>>     attr(x, "missingKind")
>>>   else
>>>     ifelse(is.na(x), 1, 0)
>>> }
>>>
>>> `missingKind<-` <- function(x, value) {
>>>   class(x) <- setdiff(class(x), "MultiMissing")
>>>   x[value != 0] <- NA
>>>   x <- asMultiMissing(x)
>>>   attr(x, "missingKind") <- value
>>>   x
>>> }
>>>
>>> `[.MultiMissing` <- function(x, i, ...) {
>>>   missings <- missingKind(x)
>>>   x <- NextMethod()
>>>   missings <- missings[i]
>>>   missingKind(x) <- missings
>>>   x
>>> }
>>>
>>> print.MultiMissing <- function(x, ...) {
>>>   vals <- as.character(x)
>>>   if (!is.character(x) || inherits(x, "noquote"))
>>>     print(noquote(vals))
>>>   else
>>>     print(vals)
>>> }
>>>
>>> `[<-.MultiMissing` <- function(x, i, value, ...) {
>>>   missings <- missingKind(x)
>>>   class(x) <- setdiff(class(x), "MultiMissing")
>>>   x[i] <- value
>>>   missings[i] <- missingKind(value)
>>>   missingKind(x) <- missings
>>>   x
>>> }
>>>
>>> as.character.MultiMissing <- function(x, ...) {
>>>   missings <- missingKind(x)
>>>   result <- NextMethod()
>>>   ifelse(missings != 0,
>>>          paste0("NA.", missings), result)
>>> }
>>> 
>>> This is incomplete.  It doesn't do printing very well, and it doesn't
>>> handle the case of assigning a MultiMissing value to a regular vector at
>>> all.  (I think you'd need an S4 implementation if you want to support
>>> that.)  But it does the basics:
>>> 
>>>   > x <- 1:10
>>>   > missingKind(x)[4] <- 23
>>>   > x
>>>    [1] 1     2     3     NA.23 5     6     7     8     9
>>>   [10] 10
>>>   > is.na(x)
>>>    [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
>>>   [10] FALSE
>>>   > missingKind(x)
>>>    [1]  0  0  0 23  0  0  0  0  0  0
>>>   >
>>> 
>>> Duncan Murdoch
>>> 
 
>>>>     Base R doesn't need anything else.
>>>>
>>>>     You complained that users shouldn't need to know about attributes, and
>>>>     they won't:  you, as the author of the package that does this, will
>>>>     handle all those details.  Working in your subject area you know all the
>>>>     different kinds of NAs that people car