Re: [Rd] iterated lapply

2015-02-26 Thread Martin Maechler
> Michael Weylandt 
> on Wed, 25 Feb 2015 21:43:36 -0500 writes:

>> On Feb 25, 2015, at 5:35 PM, Benjamin Tyner
>>  wrote:
>> 
>> Actually, it depends on the number of cores:

> Under current semantics, yes. Each 'stream' of function
> calls is lazily capturing the last value of `i` on that
> core.

> Under Luke's proposed semantics (IIUC), the result would
> be the same (2,4,6,8) for both parallel and serial
> execution. This is what allows for 'drop-in' parallelism.

>>> fun1 <- function(c){function(i){c*i}}
>>> fun2 <- function(f) f(2)
>>> sapply(mclapply(1:4, fun1, mc.cores=1L), fun2)
>> [1] 8 8 8 8
>>> sapply(mclapply(1:4, fun1, mc.cores=2L), fun2)
>> [1] 6 8 6 8
>>> sapply(mclapply(1:4, fun1, mc.cores=4L), fun2)
>> [1] 2 4 6 8
>> 

Thank you, Michael and Benjamin.

I strongly agree with your statements: it is highly desirable that
these mclapply() calls behave the same as lapply().

So indeed, something like Luke's proposed changes for both lapply()
and mclapply(), *and* for the other *apply() versions in the parallel
packages where needed (??), is very desirable.

In my teaching, and in our CRAN package 'simsalapar', we advocate that
useRs organize computations such that lapply() is used serially for
preliminary testing and mclapply() etc. are used for the heavyweight
computations.
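
Independent of any change to R itself, one way to get the same values from lapply() and mclapply() is to force the argument inside the closure factory. A minimal sketch follows; the name make_multiplier is made up for illustration, and force() is the base R function mentioned elsewhere in this thread.

```r
# Hypothetical closure factory. force() evaluates the promise for `c`
# immediately, so each returned function keeps its own value.
make_multiplier <- function(c) {
    force(c)
    function(i) c * i
}

# Serial run, as one would do for preliminary testing.
serial <- sapply(lapply(1:4, make_multiplier), function(f) f(2))
print(serial)
```

Where forking is available, swapping lapply() for parallel::mclapply() should then give the same values for any mc.cores, since nothing is left to be evaluated lazily.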

Best,
Martin Maechler

> >> On Feb 24, 2015, at 10:50 AM,  wrote:
> >> >
> >> > The documentation is not specific enough on the intended semantics in
> >> > this situation to consider this a bug. The original R-level
> >> > implementation of lapply was
> >> >
> >> >     lapply <- function(X, FUN, ...) {
> >> >         FUN <- match.fun(FUN)
> >> >         if (!is.list(X))
> >> >             X <- as.list(X)
> >> >         rval <- vector("list", length(X))
> >> >         for(i in seq(along = X))
> >> >             rval[i] <- list(FUN(X[[i]], ...))
> >> >         names(rval) <- names(X)   # keep `names' !
> >> >         return(rval)
> >> >     }
> >> >
> >> > and the current internal implementation is consistent with this. With
> >> > a loop like this lazy evaluation and binding assignment interact in
> >> > this way; the force() function was introduced to help with this.
> >> >
> >> > That said, the expression FUN(X[[i]], ...) could be replaced by
> >> >
> >> >     local({
> >> >         i <- i
> >> >         list(FUN(X[[i]], ...))
> >> >     })
> >> >
> >> > which would produce the more desirable result
> >> >
> >> >     > sapply(test, function(myfn) myfn(2))
> >> >     [1] 2 4 6 8
> >> >
> >>
> >> Would the same semantics be applied to parallel::mclapply and friends?
> >> 
> >> sapply(lapply(1:4, function(c){function(i){c*i}}), function(f) f(2))
> >> 
> >> # [1] 8 8 8 8
> >> 
> >> sapply(mclapply(1:4, function(c){function(i){c*i}}), function(f) f(2))
> >> 
> >> # [1] 6 8 6 8
> >> 
> >> I understand why they differ, but making mclapply easier for 'drop-in' 
> >> parallelism might be a good thing. 
> >> 
> >> Michael

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] iterated lapply

2015-02-26 Thread William Dunlap
Would introducing the new frame, with the call to local(), cause problems
when you use frame counting instead of <<- to modify variables outside the
scope of lapply's FUN?  I think the frame counts may have to change.  E.g.,
here is code from actuar::simul() that might be affected:

    x <- unlist(lapply(nodes[[i]], seq))
    lapply(nodes[(i + 1):(nlevels - 1)],
           function(v) assign("x", rep.int(x, v), envir = parent.frame(2)))
    m[, i] <- x

(I think the parent.frame(2) might have to be changed to parent.frame(8)
for that to work.  Such code looks pretty ugly to me but seems to be rare.)
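
Frame counting of this kind can usually be avoided by capturing the target environment once and passing it explicitly. A minimal sketch, not taken from actuar; the function name grow and its arguments are invented for illustration:

```r
# Hypothetical helper: repeatedly expands `x` from inside lapply()'s FUN.
grow <- function(x, reps) {
    env <- environment()   # the frame of this call to grow()
    # Assign into `env` directly, so the result does not depend on how
    # many frames lapply() puts between FUN and its caller.
    lapply(reps, function(v) assign("x", rep.int(x, v), envir = env))
    x
}

result <- grow(1:2, c(2, 3))
print(length(result))
```

Because the environment is named rather than counted, an extra frame inside lapply() would not change what this code does.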

It also seems to cause problems with some built-in functions:
newlapply <- function (X, FUN, ...)
{
    FUN <- match.fun(FUN)
    if (!is.list(X))
        X <- as.list(X)
    rval <- vector("list", length(X))
    for (i in seq(along = X)) {
        rval[i] <- list(local({
            i <- i
            FUN(X[[i]], ...)
        }))
    }
    names(rval) <- names(X)
    return(rval)
}
newlapply(1:2,log)
#Error in FUN(X[[i]], ...) : non-numeric argument to mathematical function
newlapply(1:2,function(x)log(x))
#[[1]]
#[1] 0
#
#[[2]]
#[1] 0.6931472



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Feb 24, 2015 at 7:50 AM,  wrote:

> The documentation is not specific enough on the intended semantics in
> this situation to consider this a bug. The original R-level
> implementation of lapply was
>
>     lapply <- function(X, FUN, ...) {
>         FUN <- match.fun(FUN)
>         if (!is.list(X))
>             X <- as.list(X)
>         rval <- vector("list", length(X))
>         for(i in seq(along = X))
>             rval[i] <- list(FUN(X[[i]], ...))
>         names(rval) <- names(X)   # keep `names' !
>         return(rval)
>     }
>
> and the current internal implementation is consistent with this. With
> a loop like this lazy evaluation and binding assignment interact in
> this way; the force() function was introduced to help with this.
>
> That said, the expression FUN(X[[i]], ...) could be replaced by
>
>     local({
>         i <- i
>         list(FUN(X[[i]], ...))
>     })
>
> which would produce the more desirable result
>
> > sapply(test, function(myfn) myfn(2))
> [1] 2 4 6 8
>
> The C implementation could use this approach, or could rebuild the
> expression being evaluated at each call to get almost the same semantics.
> Both would add a little overhead. Some code optimization might reduce
> the overhead in some instances (e.g. if FUN is a BUILTIN), but it's
> not clear that would be worth while.
>
> Variants of this issue arise in a couple of places so it may be worth
> looking into.
>
> Best,
>
> luke
>
>
> On Tue, 24 Feb 2015, Radford Neal wrote:
>
>  From: Daniel Kaschek 
>>
>>> ... When I evaluate this list of functions by
>>> another lapply/sapply, I get an unexpected result: all values coincide.
>>> However, when I uncomment the print(), it works as expected. Is this a
>>> bug or a feature?
>>>
>>> conditions <- 1:4
>>> test <- lapply(conditions, function(mycondition){
>>>   #print(mycondition)
>>>   myfn <- function(i) mycondition*i
>>>   return(myfn)
>>> })
>>>
>>> sapply(test, function(myfn) myfn(2))
>>>
>>
>> From: Jeroen Ooms 
>>
>>> I think it is a bug. If we use substitute to inspect the promise, it
>>> appears the index number is always equal to its last value:
>>>
>>
>> From: Duncan Temple Lang 
>>
>>> Not a bug, but does surprise people. It is lazy evaluation.
>>>
>>
>>
>> I think it is indeed a bug.  The lapply code saves a bit of time by
>> reusing the same storage for the scalar index number every iteration.
>> This amounts to modifying the R code that was used for the previous
>> function call.  There's no justification for doing this in the
>> documentation for lapply.  It is certainly not desired behaviour,
>> except in so far as it allows a slight savings in time (which is
>> minor, given the time that the function call itself will take).
>>
>>   Radford Neal
>>
>>
>>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:  319-335-3386
> Department of Statistics and        Fax:    319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
>
>
>



Re: [Rd] iterated lapply

2015-02-26 Thread luke-tierney

Actually using local() might create some issues, though probably not
many. For the C implementation of lapply I would probably create a new
environment with a frame containing the binding for i and use that in
an eval call.  That wouldn't add another call frame, but it would
change the environment which could still bite something. I would want
to run any change like this over at least CRAN, maybe also BIOC, tests
to see if there are any issues before committing.

There are a few other places where the internal C code does calls to R
functions in a less than ideal way. apply() is also currently written
as a loop along the lines of the original lapply I showed. The
parallel constructs from snow all use lapply or apply, so any changes
there would be inherited; the mc functions are a bit more complicated
and may need a more careful look.

Overall it looks like we could use a new utility at both R and C level
for calling a function with already evaluated arguments and use this
in all relevant places (maybe called funcall or .Funcall or something
like that). I'll try to look into this in the next few weeks.
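
A rough R-level sketch of what such a utility might look like, assuming it is built on do.call(). The names funcall and lapply2 are placeholders, and this is not the proposed implementation, which would live in C.

```r
# Hypothetical utility: call FUN with arguments that are already evaluated.
funcall <- function(FUN, ...) {
    args <- list(...)      # list(...) forces every argument right here
    do.call(FUN, args)     # the call is built from values, not promises
}

# The original R-level lapply, with funcall() in place of the direct call.
lapply2 <- function(X, FUN, ...) {
    FUN <- match.fun(FUN)
    if (!is.list(X))
        X <- as.list(X)
    rval <- vector("list", length(X))
    for (i in seq_along(X))
        rval[i] <- list(funcall(FUN, X[[i]], ...))
    names(rval) <- names(X)
    rval
}

test <- lapply2(1:4, function(c) function(i) c * i)
vals <- sapply(test, function(f) f(2))
print(vals)
print(lapply2(1:2, log))
```

One trade-off of this sketch: inside FUN, substitute() and match.call() see the argument values rather than the expression X[[i]], so code that inspects its call could behave differently.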

Best,

luke

[Rd] Native characterset is wrong for unicode builds for Windows

2015-02-26 Thread maill...@tlink.de


When I send some outlandish characters through enc2native (or format) in 
R 3.1.2 on Ubuntu trusty it works quite well:


> "®ØΔЊת"
[1] "®ØΔЊת"
> enc2native("®ØΔЊת")
[1] "®ØΔЊת"
> Encoding(enc2native("®ØΔЊת"))
[1] "UTF-8"

In Windows the result is different:

> "®ØΔЊת"
[1] "®ØΔЊת"
> enc2native("®ØΔЊת")
[1] "®Ø"
> Encoding(enc2native("®ØΔЊת"))
[1] "latin1"

And this is wrong. The native character set of a unicode application 
under Windows is *Unicode*. enc2native should do the same under Windows 
as it does on Ubuntu. Also the "unknown" encoding should be changed to 
mean the same as "UTF-8" exactly as it is on Linux.




Re: [Rd] Native characterset is wrong for unicode builds for Windows

2015-02-26 Thread Duncan Murdoch
On 26/02/2015 3:09 PM, maill...@tlink.de wrote:
> 
> When I send some outlandish characters through enc2native (or format) in 
> R 3.1.2 on Ubuntu trusty it works quite well:
> 
>  > "®ØΔЊת"
> [1] "®ØΔЊת"
>  > enc2native("®ØΔЊת")
> [1] "®ØΔЊת"
>  > Encoding(enc2native("®ØΔЊת"))
> [1] "UTF-8"
> 
> In Windows the result is different:
> 
>  > "®ØΔЊת"
> [1] "®ØΔЊת"
>  > enc2native("®ØΔЊת")
> [1] "®Ø"
>  > Encoding(enc2native("®ØΔЊת"))
> [1] "latin1"
> 
> And this is wrong. The native character set of a unicode application 
> under Windows is *Unicode*. enc2native should do the same under Windows 
> as it does on Ubuntu. Also the "unknown" encoding should be changed to 
> mean the same as "UTF-8" exactly as it is on Linux.

What is a "unicode application", and what makes you think R is one?  R
is being told by Windows that your native encoding is latin1.  Perhaps
Windows 8 supports UTF-8 as a native encoding (I've never used it), but
previous versions of Windows didn't.

Duncan Murdoch



Re: [Rd] Native characterset is wrong for unicode builds for Windows

2015-02-26 Thread Winston Chang
On Thu, Feb 26, 2015 at 2:09 PM, maill...@tlink.de 
wrote:

>
> When I send some outlandish characters through enc2native (or format) in R
> 3.1.2 on Ubuntu trusty it works quite well:
>
> > "®ØΔЊת"
> [1] "®ØΔЊת"
> > enc2native("®ØΔЊת")
> [1] "®ØΔЊת"
> > Encoding(enc2native("®ØΔЊת"))
> [1] "UTF-8"
>
> In Windows the result is different:
>
> > "®ØΔЊת"
> [1] "®ØΔЊת"
> > enc2native("®ØΔЊת")
> [1] "®Ø"
> > Encoding(enc2native("®ØΔЊת"))
> [1] "latin1"
>
> And this is wrong. The native character set of a unicode application under
> Windows is *Unicode*. enc2native should do the same under Windows as it
> does on Ubuntu. Also the "unknown" encoding should be changed to mean the
> same as "UTF-8" exactly as it is on Linux.
>

I think you're mixing up the term "character set" with the encoding for a
character set. Unicode is a character set. UTF-8 is one of many encodings
of Unicode.

UTF-8 may be the native character encoding in Ubuntu, but it's not the
native encoding in Windows. According to this, what counts as the native
encoding in Windows depends on the code page:
  http://stackoverflow.com/a/4649507

So you shouldn't expect enc2native to do the same thing on Linux and
Windows. If you really want UTF-8, you can use enc2utf8.
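
As a small illustration of that suggestion, the sketch below writes the five characters from the report as \u escapes (so the snippet itself stays ASCII) and passes them through enc2utf8(). The variable names are arbitrary.

```r
# The characters from the original report, written with \u escapes.
x <- "\u00AE\u00D8\u0394\u040A\u05EA"

u <- enc2utf8(x)        # convert to (or keep as) UTF-8, whatever the locale
print(Encoding(u))      # declared encoding of the result
print(utf8ToInt(u))     # Unicode code points, one per character
```

Unlike enc2native(), this does not depend on the code page or locale the session happens to run in.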

-Winston



Re: [Rd] Native characterset is wrong for unicode builds for Windows

2015-02-26 Thread maill...@tlink.de

> On 26/02/2015 3:09 PM, maill...@tlink.de wrote:
>> When I send some outlandish characters through enc2native (or format) in
>> R 3.1.2 on Ubuntu trusty it works quite well:
>>
>>   > "®ØΔЊת"
>> [1] "®ØΔЊת"
>>   > enc2native("®ØΔЊת")
>> [1] "®ØΔЊת"
>>   > Encoding(enc2native("®ØΔЊת"))
>> [1] "UTF-8"
>>
>> In Windows the result is different:
>>
>>   > "®ØΔЊת"
>> [1] "®ØΔЊת"
>>   > enc2native("®ØΔЊת")
>> [1] "®Ø"
>>   > Encoding(enc2native("®ØΔЊת"))
>> [1] "latin1"
>>
>> And this is wrong. The native character set of a unicode application
>> under Windows is *Unicode*. enc2native should do the same under Windows
>> as it does on Ubuntu. Also the "unknown" encoding should be changed to
>> mean the same as "UTF-8" exactly as it is on Linux.
>
> What is a "unicode application", and what makes you think R is one?  R
> is being told by Windows that your native encoding is latin1.  Perhaps
> Windows 8 supports UTF-8 as a native encoding (I've never used it), but
> previous versions of Windows didn't.
>
> Duncan Murdoch

A unicode application is a program that uses the unicode API of Windows 
- the functions with the ending W. For such an application the system 
code page (native encoding) is completely irrelevant. The system code 
page is just a compatibility feature that enables Windows NT/Vista/7/8 
to run applications that were developed for Windows 95, which didn't have 
unicode support. But this line of operating systems has been dead for 10 
years now. R obviously is a unicode application because it can print - or 
read from the clipboard - characters like "ΔЊת" that are not in my system 
code page, which is not possible over the legacy API.


Neither the unicode API nor the legacy API accepts UTF-8. The legacy API 
needs strings encoded according to the active code page and the unicode 
API wants UTF-16. If you have UTF-8 you need to convert it either to 
the active code page, which will lose all characters that are not 
covered by it, or to UTF-16 and use the unicode functions. But 
this is not the problem; the Windows interface functions of R are 
working quite nicely with unicode already.
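
To make the UTF-8 versus UTF-16 distinction concrete, here is a small sketch using iconv() from base R. It only shows the byte-level difference for one character; it does not call any Windows API, and the variable names are arbitrary.

```r
x <- "\u0394"   # GREEK CAPITAL LETTER DELTA, held by R as UTF-8

utf8_bytes  <- charToRaw(x)
utf16_bytes <- iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]

print(utf8_bytes)    # ce 94  (two-byte UTF-8 sequence)
print(utf16_bytes)   # 94 03  (one 16-bit unit, little-endian)
```

Converting the same string to a single-byte code page such as latin1 has no valid result for this character, which is the loss described above.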




Re: [Rd] Native characterset is wrong for unicode builds for Windows

2015-02-26 Thread maill...@tlink.de
On 26.02.2015 at 23:44, Winston Chang wrote:
> On Thu, Feb 26, 2015 at 2:09 PM, maill...@tlink.de wrote:
>
>> When I send some outlandish characters through enc2native (or
>> format) in R 3.1.2 on Ubuntu trusty it works quite well:
>>
>> > "®ØΔЊת"
>> [1] "®ØΔЊת"
>> > enc2native("®ØΔЊת")
>> [1] "®ØΔЊת"
>> > Encoding(enc2native("®ØΔЊת"))
>> [1] "UTF-8"
>>
>> In Windows the result is different:
>>
>> > "®ØΔЊת"
>> [1] "®ØΔЊת"
>> > enc2native("®ØΔЊת")
>> [1] "®Ø"
>> > Encoding(enc2native("®ØΔЊת"))
>> [1] "latin1"
>>
>> And this is wrong. The native character set of a unicode
>> application under Windows is *Unicode*. enc2native should do the
>> same under Windows as it does on Ubuntu. Also the "unknown"
>> encoding should be changed to mean the same as "UTF-8" exactly as
>> it is on Linux.
>
>
> I think you're mixing up the term "character set" with the encoding 
> for a character set. Unicode is a character set. UTF-8 is one of many 
> encodings of Unicode.
>
> UTF-8 may be the native character encoding in Ubuntu, but it's not the 
> native encoding in Windows. According to this, what counts as the 
> native encoding in Windows depends on the code page:
> http://stackoverflow.com/a/4649507
>
> So you shouldn't expect enc2native to do the same thing on Linux and 
> Windows. If you really want UTF-8, you can use enc2utf8.
>
> -Winston

Maybe I'm expecting too much, but I would rather R did not produce garbage 
like "®Ø". And while I can use enc2utf8 to 
convert from UTF-8 to UTF-8, this does not fix the many places - like 
"format" - where enc2native is used and that are broken because of this.





[Rd] The Environment variables settings in bin/R, why do they ignore environment variables of the same name?

2015-02-26 Thread Saptarshi Guha
Hello,

In installation/R/bin/R I notice

1. R_HOME_DIR is hard coded e.g.
R_HOME_DIR=/usr/local/lib64/R

2. It ignores R_HOME_DIR

echo "WARNING: ignoring environment value of R_HOME"

3. R_SHARE_DIR, R_INCLUDE_DIR and R_DOC_DIR are also hard coded.

Is there a reason why these settings do not read the values from the
environment variables of the same name (assuming they exist), defaulting
to these hard-coded values in case they don't?

Regards
Saptarshi



[Rd] static pdf vignette

2015-02-26 Thread Wang, Zhu
Dear all,

In my package I have a computationally expensive Rnw file which can't pass
R CMD check. Therefore I set eval=FALSE in the Rnw file. But I would like to
have the pdf vignette generated by the Rnw file with eval=TRUE. It seems to
me a static pdf vignette is an option.  Any suggestions on this?

Thanks,

Zhu Wang




Re: [Rd] The Environment variables settings in bin/R, why do they ignore environment variables of the same name?

2015-02-26 Thread Dirk Eddelbuettel

On 26 February 2015 at 16:20, Saptarshi Guha wrote:
| In installation/R/bin/R i notice
| 
| 1. R_HOME_DIR is hard coded e.g.
| R_HOME_DIR=/usr/local/lib64/R
| 
| 2. It ignores R_HOME_DIR
| 
| echo "WARNING: ignoring environment value of R_HOME"
| 
| 3. R_SHARE_DIR, R_INCLUDE_DIR and R_DOC_DIR are also hard coded.
| 
| Is there a reason why these  settings do not read the values from the
| environment variables of the same name (assuming they exist) and
| defaulting to these hard coded values in case they dont?

AFAICR you are supposed to deal with this via $PATH and just pick one:

Ie with

  edd@max:~$ grep ^R_HOME_DIR /usr/lib/R/bin/R /usr/local/lib/R-devel/bin/R
  /usr/lib/R/bin/R:R_HOME_DIR=/usr/lib/R
  /usr/local/lib/R-devel/bin/R:R_HOME_DIR=/usr/local/lib/R-devel/lib/R
  edd@max:~$ 

I get, respectively,

  edd@max:~$ R --version | head -1
  R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
  edd@max:~$ PATH="/usr/local/lib/R-devel/bin/:$PATH" R --version | head -1
  R Under development (unstable) (2015-02-22 r67876) -- "Unsuffered Consequences"
  edd@max:~$ 

and I have R-devel aliased to RD and R-devel in /usr/local/bin.
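
To confirm which installation a given session actually picked up, the directories can also be queried from inside R. A minimal sketch using R.home() and Sys.getenv() from base R; the variable name dirs is arbitrary.

```r
# Directories of the R installation that this session is running from.
dirs <- c(home    = R.home(),
          share   = R.home("share"),
          include = R.home("include"),
          doc     = R.home("doc"))
print(dirs)

# R_HOME as seen by the running process.
print(Sys.getenv("R_HOME"))
```

Running this under each launcher on the PATH shows which set of directories that launcher resolved.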

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [Rd] Native characterset is wrong for unicode builds for Windows

2015-02-26 Thread Duncan Murdoch
On 26/02/2015 6:34 PM, maill...@tlink.de wrote:
>> On 26/02/2015 3:09 PM, maill...@tlink.de wrote:
>>> When I send some outlandish characters through enc2native (or format) in
>>> R 3.1.2 on Ubuntu trusty it works quite well:
>>>
>>>   > "®ØΔЊת"
>>> [1] "®ØΔЊת"
>>>   > enc2native("®ØΔЊת")
>>> [1] "®ØΔЊת"
>>>   > Encoding(enc2native("®ØΔЊת"))
>>> [1] "UTF-8"
>>>
>>> In Windows the result is different:
>>>
>>>   > "®ØΔЊת"
>>> [1] "®ØΔЊת"
>>>   > enc2native("®ØΔЊת")
>>> [1] "®Ø"
>>>   > Encoding(enc2native("®ØΔЊת"))
>>> [1] "latin1"
>>>
>>> And this is wrong. The native character set of a unicode application
>>> under Windows is *Unicode*. enc2native should do the same under Windows
>>> as it does on Ubuntu. Also the "unknown" encoding should be changed to
>>> mean the same as "UTF-8" exactly as it is on Linux.
>> What is a "unicode application", and what makes you think R is one?  R
>> is being told by Windows that your native encoding is latin1.  Perhaps
>> Windows 8 supports UTF-8 as a native encoding (I've never used it), but
>> previous versions of Windows didn't.
>>
>> Duncan Murdoch
>>
> A unicode application is a program that uses the unicode API of Windows 

R uses those functions, so I guess it is a "unicode application".  But
internally it uses an 8 bit encoding (normally the native one for the
platform it is running on, which in your case is apparently latin1).

> - the functions with the ending W. For such an application the system 
> code page (native encoding) is completely irrelevant. The system code 
> page is just a compatibility feature that enables Windows NT/Vista/7/8 
> to run applications that were developed for Windows 95 which didn't have 
> unicode support. 

Windows 95 had UCS-2 support, which was pretty close to UTF-16.

> But this line of operating systems has been dead for 10 years
> now. R obviously is a unicode application because it can print - or read 
> from the clipboard - characters like "ΔЊת" that are not in my system 
> code page which is not possible over the legacy API.

So "unicode application" is something you just made up.

If you use Windows development tools, they have macros to convert
generic functions to either A or W versions.  R doesn't use those.  It
calls the W functions when it has UTF-16 characters, and A functions
when it has native characters.  I would love it if R was a UTF-8
application, because it would make life so much simpler, but Windows
doesn't support that.  So R needs to do tons of conversions.  If you
don't like that, you probably need to stick with Ubuntu.

Duncan Murdoch

> 
> Neither the unicode API nor the legacy API accepts UTF-8. The legacy API 
> needs strings encoded according to the active code page and the unicode 
> API wants UTF-16. If you have UTF-8 you need to convert it either to 
> the active code page, which will lose all characters that are not 
> covered by it, or to UTF-16 and use the unicode functions. But 
> this is not the problem; the Windows interface functions of R are 
> working quite nicely with unicode already.


> 
>



Re: [Rd] Native characterset is wrong for unicode builds for Windows

2015-02-26 Thread maill...@tlink.de

On 27.02.2015 at 03:13, Duncan Murdoch wrote:
> On 26/02/2015 6:34 PM, maill...@tlink.de wrote:
>>> On 26/02/2015 3:09 PM, maill...@tlink.de wrote:
>>>> When I send some outlandish characters through enc2native (or format) in
>>>> R 3.1.2 on Ubuntu trusty it works quite well:
>>>>
>>>>   > "®ØΔЊת"
>>>> [1] "®ØΔЊת"
>>>>   > enc2native("®ØΔЊת")
>>>> [1] "®ØΔЊת"
>>>>   > Encoding(enc2native("®ØΔЊת"))
>>>> [1] "UTF-8"
>>>>
>>>> In Windows the result is different:
>>>>
>>>>   > "®ØΔЊת"
>>>> [1] "®ØΔЊת"
>>>>   > enc2native("®ØΔЊת")
>>>> [1] "®Ø"
>>>>   > Encoding(enc2native("®ØΔЊת"))
>>>> [1] "latin1"
>>>>
>>>> And this is wrong. The native character set of a unicode application
>>>> under Windows is *Unicode*. enc2native should do the same under Windows
>>>> as it does on Ubuntu. Also the "unknown" encoding should be changed to
>>>> mean the same as "UTF-8" exactly as it is on Linux.
>>>
>>> What is a "unicode application", and what makes you think R is one?  R
>>> is being told by Windows that your native encoding is latin1.  Perhaps
>>> Windows 8 supports UTF-8 as a native encoding (I've never used it), but
>>> previous versions of Windows didn't.
>>>
>>> Duncan Murdoch
>>>
>> A unicode application is a program that uses the unicode API of Windows
>
> R uses those functions, so I guess it is a "unicode application".  But
> internally it uses an 8 bit encoding (normally the native one for the
> platform it is running on, which in your case is apparently latin1).
>
>> - the functions with the ending W. For such an application the system
>> code page (native encoding) is completely irrelevant. The system code
>> page is just a compatibility feature that enables Windows NT/Vista/7/8
>> to run applications that were developed for Windows 95 which didn't have
>> unicode support.
>
> Windows 95 had UCS-2 support, which was pretty close to UTF-16.
>
>> But this line of operating systems has been dead for 10 years
>> now. R obviously is a unicode application because it can print - or read
>> from the clipboard - characters like "ΔЊת" that are not in my system
>> code page which is not possible over the legacy API.
>
> So "unicode application" is something you just made up.
>
> If you use Windows development tools, they have macros to convert
> generic functions to either A or W versions.  R doesn't use those.  It
> calls the W functions when it has UTF-16 characters, and A functions
> when it has native characters.  I would love it if R was a UTF-8
> application, because it would make life so much simpler, but Windows
> doesn't support that.  So R needs to do tons of conversions.  If you
> don't like that, you probably need to stick with Ubuntu.
>
> Duncan Murdoch



I am not complaining about those conversions. They work just fine
already. I am complaining about enc2native breaking things in the
Windows builds. An assignment like

s <- format("®ØΔЊת")

has no interaction with Windows at all, yet "s" contains garbage like
"®Ø" after that. And if a native encoding of UTF-8 - as defined by
enc2native - works in Ubuntu, why shouldn't it work in Windows?
