Re: [Rd] Runnable R packages

2019-02-08 Thread David Lindelof
Yesterday I wrote and submitted to CRAN a package `run`, which implements
the ideas discussed in this thread. Given a package tarball
foo_0.1.0.tar.gz, users will be able to run

Rscript -e "run::run('foo_0.1.0.tar.gz')"

which will pull all the dependencies of package `foo`, look up a function
`main` in that package's namespace, and call it.
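
A minimal sketch of how such a runner could be put together, following the suggestion quoted below to build on the `remotes` package. This is not the code of the `run` package: the helper names `pkg_name_from_tarball()` and `run_main()` are hypothetical, and the sketch assumes that the tarball follows the usual <name>_<version>.tar.gz naming and that the package defines a function `main`.

```r
# Hypothetical helper: derive the package name from a source tarball
# named according to the usual <name>_<version>.tar.gz convention.
pkg_name_from_tarball <- function(tarball) {
  sub("_.*$", "", basename(tarball))
}

# Hypothetical runner: install the tarball together with its dependencies,
# then look up `main` in the package namespace and call it.
run_main <- function(tarball, ...) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("the 'remotes' package is needed to install dependencies")
  }
  remotes::install_local(tarball, dependencies = TRUE, upgrade = "never")
  pkg <- pkg_name_from_tarball(tarball)
  main <- get("main", envir = asNamespace(pkg), mode = "function")
  main(...)
}
```

With these definitions loaded, a call such as run_main('foo_0.1.0.tar.gz') would play the role of the Rscript one-liner above.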

It's an early draft but I'd appreciate any feedback (once its submission is
accepted, of course).

Thanks all for your help and advice,

David

On Sat, Feb 2, 2019 at 3:37 PM Duncan Murdoch wrote:

> On 02/02/2019 8:27 a.m., Barry Rowlingson wrote:
> > I don't think anyone denies that you *could* make an EXE to do all
> > that. The discussion is on *how easy* it should be to create a single
> > file that contains an initial "main" function plus a set of bundled
> > code (potentially as a package) and which when run will install its
> > package code (which is contained in itself, it's not in a repo),
> > install dependencies, and run the main() function.
> >
> > Now, I could build a self-executable shar file that bundled a package
> > together with a script to do all the above. But if there was a "RUN"
> > command in R, and a convention that a function called "foo::main"
> > would be run by `R CMD RUN foo_1.1.1.tar.gz` then it would be so much
> > easier to develop and test.
>
> I don't believe the "so much easier" argument that this requires a
> change to base R.  If you put that functionality into a package, then
> the only extra effort the user would require is to install that other
> package.  After that, they could run
>
> Rscript -e "yourpackage::run_main('foo_1.1.1.tar.gz')"
>
> as I suggested before.  This is no harder than running
>
> R CMD RUN foo_1.1.1.tar.gz
>
> The advantage of this from R Core's perspective is that you would be
> developing and maintaining "yourpackage", you wouldn't be passing the
> burden on to them.  The advantage from your perspective is that you
> could work with whatever packages you liked.  The "remotes" package has
> almost everything you need so that "yourpackage" could be nearly
> trivial.  You wouldn't need to duplicate it within base R.
>
> Duncan Murdoch
>
> >
> > If people think this adds value, then if they want to offer that value
> > to me as $ or £, I'd consider writing it if their total value was more
> > than my cost
> >
> > Barry
> >
> >
> > On Sat, Feb 2, 2019 at 12:54 AM Abs Spurdle  wrote:
> >>
> >> Further to my previous post,
> >> it would be possible to create an .exe file, say:
> >>
> >> my_r_application.exe
> >>
> >> That starts R, loads your R package(s), calls the R function of your
> >> choice and does whatever else you want.
> >>
> >> However, I don't think that it would add much value.
> >> But feel free to correct me if you think that I'm wrong.
> >>
> >>  [[alternative HTML version deleted]]
> >>
> >> __
> >> R-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Runnable R packages

2019-02-08 Thread Rainer M Krug
Sounds interesting. Do you have it on GitHub or similar?

Rainer


--
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, 
UCT), Dipl. Phys. (Germany)

Department of Evolutionary Biology and Environmental Studies
University of Zürich
Office Y34-J-74
Winterthurerstrasse 190
8075 Zürich
Switzerland

Office: +41 (0)44 635 47 64
Cell:   +41 (0)78 630 66 57
email:  rainer.k...@uzh.ch
rai...@krugs.de
Skype: RMkrug

PGP: 0x0F52F982






Re: [Rd] Runnable R packages

2019-02-08 Thread David Lindelof
Sure, you can find it here:

https://github.com/dlindelof/run




Re: [Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

2019-02-08 Thread Tomas Kalibera
I can reproduce this behavior on my Windows 10 system in RGui (cp1252):
when I paste the Unicode infinity symbol into the console, it is treated
as the number 8. This is caused by Windows "best fit" default behavior in
conversion of unicode characters to characters in the current native
encoding: at some point in the past, 8 has been chosen as a good fit for
infinity in Windows. In my scenario, the conversion is invoked by RGui
before returning the input to the main R loop, even before the input
gets to the parser. In principle, we could change this particular
conversion in RGui to avoid the substitution. RGui uses "\u" escapes
to pass characters that cannot be represented, which is why e.g. the
Cyrillic Zhe \u0436 worked. We could therefore tell Windows not to do the
substitution and pass "\u221e" for Infinity; the string, after
being processed by the parser, would then be represented in UTF-8 inside R
and could e.g. be printed by the RGui console. That is something that could
be considered, but it would not solve the main problem, and it might
actually cause trouble for users who are used to such substitutions
(especially when the substitutions are more intuitive, though that may be
a matter of opinion).


The main problem is that in normal use, sooner or later R will get to 
the point when it will need to do the conversion to native encoding, and 
in some context where "\u" escapes will not be possible. One cannot 
reliably work with strings in R that cannot be represented in the 
current native encoding (except when one knows precisely how to avoid 
the conversion in some specific task, but that may be brittle; so the 
best-fit substitution might in principle help here). This problem does 
not exist on Unix/macOS systems where the current native encoding is 
UTF-8 these days, so today it only exists on Windows where UTF-8 cannot 
be the current native encoding. As has been discussed before, even 
though we could rewrite in principle all calls to Windows API to use 
Unicode and have all strings in UTF-8 in R, we would still have problems 
when interfacing with packages that assume strings are in current native 
encoding (without checking), so this problem won't be easy to fix.
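
The difference between a UTF-8 string inside R and its native-encoding form can be inspected from R itself. The sketch below is an illustration only and not part of the original report. A string written with a "\u" escape is stored in UTF-8 whatever the locale, and iconv() with sub = "byte" makes it visible when a character cannot be represented in the target encoding. The target "latin1" merely stands in here for a Windows native encoding such as cp1252, and the results of the last two calls depend on the platform.

```r
x <- "\u221e"            # infinity symbol, written as an escape

Encoding(x)              # the string is marked as UTF-8
charToRaw(x)             # its three UTF-8 bytes
utf8ToInt(x)             # its Unicode code point as an integer

# Explicit conversion: non-convertible bytes are reported as <xx> escapes
# where the platform's iconv does not substitute a character of its own.
iconv(x, from = "UTF-8", to = "latin1", sub = "byte")

# enc2native() converts to the current native encoding; its result
# depends on the platform and the current locale.
enc2native(x)
```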


Best,
Tomas

On 2/7/19 3:10 PM, Daniel Possenriede wrote:

There seems to be something odd with "∞" on Windows (and not only with
read.table)
In native encoding (cp-1252 in my case), "∞" gets converted to "8"

x <-  "∞"
Encoding(x)
#> [1] "unknown"
print(x)
#> [1] "8"
charToRaw(x)
#> [1] 38

"∞" is indeed "8"

identical(x, "8")
#> [1] TRUE

Everything seems fine if  "∞" is UTF-8 encoded.

y <- "\u221E"
Encoding(y)
#> [1] "UTF-8"
print(y)
#> [1]  "∞"
charToRaw(y)
#> [1] e2 88 9e

Unless the string is converted back to native encoding.

format(y)
#> [1] "8"

This ought to be "<U+221E>", equivalently to

format("∝")
#> [1] "<U+221D>"

Session Info:

si <- sessionInfo()
si$running
#> [1] "Windows 10 x64 (build 17134)"
si$R.version$version.string
#> [1] "R version 3.5.2 (2018-12-20)"
si$locale
#> [1]
"LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"



On Thu, 7 Feb 2019 at 14:33, David Byrne <david.byrne...@gmail.com> wrote:


I can confirm that it doesn't happen on Ubuntu 18.04.1 so Peter is
most likely correct; it looks like it's Windows specific.

On Thu, 7 Feb 2019 at 12:55, peter dalgaard wrote:

This doesn't seem to be happening on MacOS, neither in Terminal nor
RStudio, (R 3.5.1, R-devel, R-patched). So probably Windows specific.

-pd


On 7 Feb 2019, at 11:17, David Byrne wrote:

Bug
Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
file containing the infinity symbol (' ∞ ') results in the infinity
symbol imported as the number 8. Other Unicode characters seem
unaffected, example, Zhe: ж

Expected Behavior:
The imported data.frame should represent the infinity symbol as the
expected 'Inf' so that normal mathematical operations can be processed

Stack Overflow Post:
I created a question on Stack Overflow where one other member was able
to reproduce the same issues I was having. This question can be found
at:


https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int

Method to Reproduce - 1:
A simple method to reproduce this issue is to use R-Studio: In the
console, type the following:

read.table(text=" ∞", encoding="UTF-8")

The result should be a data.frame with a single value of '8'

Repeating the same with ж results in the correct expected behavior

Method to Reproduce - 2:
Create a .csv file containing the infinity and Zhe characters (I have
attached the file for convenience, hopefully it is not rejected by your
email service). Launch an interactive session using

r --vanilla

Enter the following statement taking care to replace the
 with the appropriate one:

read.table("/unicode_chars.csv", sep=",", encoding="UTF-8")

This should result in a two element data.

Re: [Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

2019-02-08 Thread peter dalgaard
Fortune nomination...

> On 8 Feb 2019, at 13:07 , Tomas Kalibera  wrote:
> 
> This is caused by Windows "best fit" default behavior in conversion of 
> unicode characters to characters in the current native encoding: at some 
> point in the past, 8 has been chosen as a good fit for infinity in Windows.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] nlminb with constraints failing on some platforms

2019-02-08 Thread ProfJCNash
It may be worth noting that both Avraham and I are members of the
histoRicalg project (https://gitlab.com/nashjc/histoRicalg) that has some
modest funding from R-Consortium. The type of concern this nlminb thread
raises is why the project was proposed. That is, older codes that may
predate IEEE arithmetic and modern programming language processors often
were built with a different understanding of how algorithm expressions
would be executed.

Documenting the resolution of this issue and others like it will be
welcome and we will hope to be able to collect such results in a form that
may help resolve similar matters in future.

Best, JN


On 2019-02-06 7:15 a.m., Avraham Adler wrote:

> If it helps, the BLAS I used is compiled to use 6 threads.
>
> On Wed, Feb 6, 2019 at 3:47 AM Berend Hasselman  wrote:
>
>>> On 6 Feb 2019, at 10:58, Martin Maechler wrote:
>>>
>>> I summarize what has been reported till:
>>>
>>> Failure in these cases
>>>
>>> 1. Kasper K ("Scientific Linux", self compiled R, using Intel's MKL
>>>    for BLAS/LAPACK)
>>> 2. (By Bill Dunlap): Microsoft R Open (MRO) 3.4.2, also using
>>>    MKL with 12 cores
>>> 3. (By Brad Bell)  : R 3.5.2 Fedora 28 (x86_64) pkg, OpenBLAS(?)
>>> 4. (by MM)         : R 3.5.2 Fedora 28 (x86_64) pkg, BLAS+Lapack = OpenBLAS
>>>
>>> Success
>>> ===
>>>
>>> - (by MM): R-devel, R 3.5.2 patched on FC28, *self compiled* gcc 8.2,
>>>   using R's BLAS/Lapack
>>> - (by Ralf Stubner): R 3.5.2 from Debian Stable (gcc 6.2) + OpenBLAS
>>> - (by Berend H.)   : R 3.5.2 [from CRAN] on macOS 10.14.3 (BLAS/Lapack ??)
>>
>> R 3.5.2 from CRAN using R's BLAS/Lapack.
>>
>> Berend
>>
>>> It would be great if this could be solved...
>>>
>>> Martin
>>>
>>>> I have tried passing in the gradient and turning on the trace and it
>>>> gives nearly the exact same trace with and without the gradient.
>>> [...]


Re: [Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

2019-02-08 Thread Daniel Possenriede
Tomas,

> In my scenario, the conversion is invoked by RGui before returning the
> input to the main R loop, even before the input gets to the parser. In
> principle, we could change this particular conversion in RGui to avoid the
> substitution.

Not sure whether I am missing something here, but I used RStudio for my
examples (I should have said) and David mentioned RStudio as well, so it
does not seem to be a problem with RGui only.

Another example for the "best fit" behaviour seems to be "Σ"
("\u03A3", greek capital letter sigma, not "\u2211", n-ary summation):

print("Σ")
#> [1] "S"

Again with cp1252 on Windows 10, R 3.5.2, RStudio 1.2.1256 preview.

> even though we could rewrite in principle all calls to Windows API to use
> Unicode and have all strings in UTF-8 in R, we would still have problems
> when interfacing with packages that assume strings are in current native
> encoding (without checking), so this problem won't be easy to fix.

Since I regularly encounter the reverse problem, i.e. packages that assume
strings are in UTF-8 encoding without checking (which isn't very
surprising, assuming that most package developers develop on Unix/macOS
systems), I'd say, "rip off the band-aid sooner rather than later". Obviously
I don't know how many bugs would surface in packages if R for Windows'
native encoding were to switch to UTF-8, but these bugs would only be
transitory, I suppose. Whereas there is a steady inflow of
assume-UTF-8-encoding-bugs in new packages and functions with the current
situation.

Best,
Daniel



Re: [Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

2019-02-08 Thread Tomas Kalibera


I can reproduce with read.table(encoding="UTF-8") in RGui on Windows 10,
reading a file containing the two UTF-8 characters. The table is read
correctly into R as documented (both characters are represented in UTF-8
and marked as such), but the conversion of Infinity to 8 and of Zhe to
<U+0436> happens later during printing using print.data.frame(). For
instance, it currently does not happen during print(as.matrix()). As I
wrote in more detail in another email in this thread, R sometimes needs
to convert strings to the current native encoding; Windows converts
Infinity to 8 by default as "best fit", but fails to convert Zhe, so R
displays the <U+0436>.

It is easiest to only use input files in the current native encoding, so one
could convert them before passing them to R and make sure the conversion does
not have similar problems... or use R on a non-Windows platform.
Relying on which R functions/packages can work with non-native encodings
may be brittle, but of course any R function that is documented to work with
non-native encodings (like read.table(encoding=)) should do so. If not,
it will be fixed following a bug report.

I am not sure if that is what you had in mind, but conversion of
character (string) to double is a different matter. as.double() now, as
documented in ?as.double, returns NA for "∞" (on Linux).
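
If the goal is a numeric Inf from a column that spells infinity as "∞", the conversion can be made explicit after reading the data as character. The following is a minimal sketch only: the helper name parse_with_infinity() is hypothetical, and the symbol is written as the escape "\u221e" so that the code does not depend on the encoding of the script file.

```r
# Hypothetical helper: map the infinity symbol to the spelling "Inf"
# understood by as.numeric(), then convert.
parse_with_infinity <- function(x) {
  x <- gsub("\u221e", "Inf", trimws(x), fixed = TRUE)
  as.numeric(x)
}

parse_with_infinity(c("1.5", "\u221e", "-\u221e"))
```

Reading the column with read.table(..., colClasses = "character") keeps R from attempting its own type conversion before the helper is applied.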

Best
Tomas


On 2/7/19 11:17 AM, David Byrne wrote:
> Bug
> Using read.table(file, encoding="UTF-8") to import a UTF-8 encoded
> file containing the infinity symbol (' ∞ ') results in the infinity
> symbol imported as the number 8. Other Unicode characters seem
> unaffected, example, Zhe: ж
>
> Expected Behavior:
> The imported data.frame should represent the infinity symbol as the
> expected 'Inf' so that normal mathematical operations can be processed
>
> Stack Overflow Post:
> I created a question on Stack Overflow where one other member was able
> to reproduce the same issues I was having. This question can be found
> at:
> https://stackoverflow.com/questions/54522196/r-read-table-with-utf-8-encoded-file-reads-infinity-symbol-as-8-int
>
> Method to Reproduce - 1:
> A simple method to reproduce this issue is to use R-Studio: In the
> console, type the following:
>> read.table(text=" ∞", encoding="UTF-8")
> The result should be a data.frame with a single value of '8'
>
> Repeating the same with ж results in the correct expected behavior
>
> Method to Reproduce - 2:
> Create a .csv file containing the infinity and Zhe characters (I have
> attached the file for convenience, hopefully it is not rejected by your
> email service). Launch an interactive session using
>
>> r --vanilla
> Enter the following statement taking care to replace the
>  with the appropriate one:
>
>> read.table("/unicode_chars.csv", sep=",", encoding="UTF-8")
>
> This should result in a two element data.frame; the first being the
> incorrect value of 8 with an additional <U+FEFF> and the second the
> correct value of Zhe.
>
> Note the additional <U+FEFF> prefixed to the front of the '8'. This
> appears to be a hidden character for the purposes of letting editors
> know the encoding. The following link has some explanation; however, it
> states this is caused by Excel. The file I created was done so using
> Notepad and not Excel.
>
> https://medium.freecodecamp.org/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7
>
> System Details:
> OS:
>> Windows 10.0.17134 Build 17134
>
> R Version:
>> platform       x86_64-w64-mingw32
>> arch           x86_64
>> os             mingw32
>> system         x86_64, mingw32
>> status
>> major          3
>> minor          4.1
>> year           2017
>> month          06
>> day            30
>> svn rev        72865
>> language       R
>> version.string R version 3.4.1 (2017-06-30)
>> nickname       Single Candle


Re: [Rd] Bug Report: read.table with UTF-8 encoded file imports infinity symbol as Integer 8

2019-02-08 Thread Duncan Murdoch

On 08/02/2019 11:12 a.m., Daniel Possenriede wrote:

> Since I regularly encounter the reverse problem, i.e. packages that assume
> strings are in UTF-8 encoding without checking (which isn't very
> surprising, assuming that most package developers develop on Unix/macOS
> systems), I'd say, "rip off the band-aid sooner rather than later". Obviously
> I don't know how many bugs would surface in packages if R for Windows'
> native encoding were to switch to UTF-8, but these bugs would only be
> transitory, I suppose. Whereas there is a steady inflow of
> assume-UTF-8-encoding-bugs in new packages and functions with the current
> situation.


Just one minor comment:  it is *impossible* for R for Windows "native" 
encoding to switch to UTF-8, since Windows doesn't support that.  The 
necessary change (which I'd support, but it's a really large amount of 
work) would be for R to drop its use of native encodings internally. 
Convert everything to UTF-8 on the way in, convert to native on the way out.


This is a large amount of work because R has preferred native encodings 
basically forever, so there are tons of locations needing changes, and a 
large effort would be required to make them.  It would likely be easier 
for Windows to add UTF-8 as a native encoding.  Converting between that 
and Windows internal UTF-16 is nearly trivial, much easier than many of 
the conversions it does.  And Microsoft has revenues of $90 billion per 
year, while R Core only has a few individuals donating their time:  so 
wouldn't it make more sense to ask them to act like responsible members 
of the computing community?
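
At the level of user code, the "UTF-8 on the way in, native on the way out" discipline can be sketched with enc2utf8() and enc2native(). This only illustrates the idea and says nothing about how R's internals would have to change; the wrapper names on_the_way_in() and on_the_way_out() are hypothetical. The last line shows a UTF-8 to UTF-16 conversion done with iconv().

```r
on_the_way_in  <- function(x) enc2utf8(x)    # normalise input to UTF-8
on_the_way_out <- function(x) enc2native(x)  # convert only at the boundary

s <- on_the_way_in("\u03a3")   # Greek capital letter sigma
Encoding(s)
nchar(s, type = "chars")       # one character ...
nchar(s, type = "bytes")       # ... stored as two UTF-8 bytes

# UTF-8 to UTF-16 (little endian), returned as a raw vector
iconv(s, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
```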


Duncan Murdoch



Best,
Daniel


Am Fr., 8. Feb. 2019 um 13:07 Uhr schrieb Tomas Kalibera <
tomas.kalib...@gmail.com>:


I can reproduce this behavior on my Windows 10 system in RGui (cp1252):
when I paste the Unicode infinity symbol into the console, it is treated
as number 8. This is caused by Windows "best fit" default behavior in
conversion of unicode characters to characters in the current native
encoding: at some point in the past, 8 has been chosen as a good fit for
infinity in Windows. In my scenario, the conversion is invoked by RGui
before returning the input to the main R loop, even before the input
gets to the parser. In principle, we could change this particular
conversion in RGui to avoid the substitution. RGui uses "\u" escapes
to pass characters that cannot be represented, which is why e.g. the
Cyrillic Zhe \u0436 worked. So we could tell Windows not to do the
substitution and pass "\u221e" for infinity; the string, after being
processed by the parser, would then be represented in UTF-8 inside R and
could be e.g. printed by the RGui console. That is something that could
be considered, but it will not solve the main problem and it may
actually cause trouble to users who are used to such substitutions
(especially when the substitutions are more intuitive, but, that may be
a matter of opinion).

The main problem is that in normal use, sooner or later R will get to
the point where it needs to convert to the native encoding, in some
context where "\u" escapes are not possible. One cannot
reliably work with strings in R that cannot be represented in the
current native encoding (except when one knows precisely how to avoid
the conversion in some specific task, but that may be brittle; so the
best-fit substitution might in principle help here). This problem does
not exist on Unix/macOS systems where the current native encoding is
UTF-8 these days, so today it only exists on Windows where UTF-8 cannot
be the current native encoding. As has been discussed before, even
though we could in principle rewrite all calls to the Windows API to use
Unicode and keep all strings in UTF-8 in R, we would still have problems
when interfacing with packages that assume strings are in the current
native encoding (without checking), so this problem won't be easy to fix.
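
To illustrate why strings that cannot be represented in the target encoding are fragile, here is a small sketch that uses latin1 as a stand-in for a non-UTF-8 native encoding (an assumption made so that the example behaves similarly across platforms). What `iconv()` returns for an unconvertible character depends on the platform's iconv implementation, so no output is shown for those calls.

```r
x <- "\u221e"       # infinity sign, stored as UTF-8 inside R
Encoding(x)
cp <- utf8ToInt(x)  # Unicode code point of the character (0x221E)

# latin1 has no infinity sign, so this conversion cannot be exact.
# With many iconv implementations the result is NA.
lossy <- iconv(x, from = "UTF-8", to = "latin1")

# sub = "byte" asks iconv to replace unconvertible bytes with <xx> escapes.
escaped <- iconv(x, from = "UTF-8", to = "latin1", sub = "byte")

# A character that latin1 does contain converts without loss.
ok <- iconv("\u00e9", from = "UTF-8", to = "latin1")
```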

Best,
Tomas

On 2/7/19 3:10 PM, Daniel Possenriede wrote:

There seems to be something odd with "∞" on Windows (and not only

Re: [Rd] PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()

2019-02-08 Thread Henrik Bengtsson
Bumping this thread in the hope of catching the attention of R Core.

As I try to argue in my original post, given the existing internal
structure of connections, I don't think it's too hard to add protection
against the use of a corrupted R connection.

Writing to a corrupted connection is a mistake that currently may pass
silently while corrupting an unintended target.
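
A minimal sketch of the index recycling behind this, using temporary files (the variable names are illustrative). A 'connection' object is an integer index carrying a 'conn_id' attribute; after close(), the next connection opened typically reuses the freed index, while its 'conn_id' differs.

```r
fh <- file(tempfile(), open = "w+")
idx_a <- as.integer(fh)       # row index in the table of connections
id_a  <- attr(fh, "conn_id")  # external pointer carrying the 'id' field
close(fh)

fh2 <- file(tempfile(), open = "w+")
idx_b <- as.integer(fh2)
id_b  <- attr(fh2, "conn_id")

idx_a == idx_b         # the freed index is typically reused
identical(id_a, id_b)  # the ids are what tell the two connections apart
close(fh2)
```

Comparing 'conn_id' rather than the index alone is the check the proposed patch relies on.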

Henrik

On Tue, Oct 30, 2018, 19:51 Henrik Bengtsson wrote:

> SUMMARY:
>
> I'm proposing that R assert that 'connection' options have not changed
> since first created such that R will produce the following error:
>
> > fh <- file("a.txt", open = "w+")
> > cat("hello\n", file = fh)
> > close(fh)
>
> > fh2 <- file("b.txt", open = "w+")
> > cat("world\n", file = fh2)
>
> > cat("hello again\n", file = fh)
> Error in cat("hello again\n", file = fh) :
>   invalid connection (non-existing 'conn_id')
>
> Note that, currently in R, the latter silently writes to 'b.txt' - not
> 'a.txt' (for more details, see
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81).
>
>
> BACKGROUND:
>
> In R, connections are indexed by their (zero-based) row indices in the
> table of available connections.  For example,
>
> > fh <- file("a.txt", open = "w")
> > showConnections(all = TRUE)
>   description class      mode text   isopen   can read can write
> 0 "stdin"     "terminal" "r"  "text" "opened" "yes"    "no"
> 1 "stdout"    "terminal" "w"  "text" "opened" "no"     "yes"
> 2 "stderr"    "terminal" "w"  "text" "opened" "no"     "yes"
> 3 "a.txt"     "file"     "w"  "text" "opened" "no"     "yes"
> > con <- getConnection(3)
> > identical(con, fh)
> [1] TRUE
>
>
> ISSUE:
>
> The problem with the current design/implementation where connections
> are referred to by their index (only), is that
>
> (i) the table of connections changes over time and
> (ii) connection indices are recycled.
>
> Because a `connection` object holds the connection row index, it means
> that *the actual underlying connection that a `connection` object
> refers to may change over its lifetime*.
>
>
> SUGGESTION:
>
> Make use of the 'Rconn' struct field 'id', which is unique, to assert
> that the 'connection' object used is referring to the
> original/expected connection.  The 'id' field is available via
> attribute 'conn_id' part of a 'connection' object.
>
>
> PATCH:
>
> See attached 'connection.patch' file (or
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81#issuecomment-434210222).
> The patch introduces a new SEXP R_GetConnection2(SEXP sConn) function,
> which looks up a connection by its index *and* the 'id' field. This
> function is backward compatible with R_GetConnection(), which looks up
> a connection by its index (only). In addition, R_GetConnection2() also
> accepts 'sConn' of type integer, in which case it looks up the connection
> similarly to how the internal getConnection() function does it.
>
> Comment: The patch is just one of many alternatives.  Hopefully, it
> helps clarify what I'm suggesting.  It passes 'make check' and I've
> tested it on a few packages of mine that make heavy use of different
> types of connections.
>
> In addition to "overridden" connections, the patch protects against
> invalid 'connection' objects that have been serialized, e.g.
>
> > fh2 <- file("b.txt", open = "w+")
> > saveRDS(fh2, file = "fh2.rds")
> > fh3 <- readRDS("fh2.rds")
> > attr(fh2, "conn_id")
> 
> > attr(fh3, "conn_id")
>   #<== NIL because external pointer was lost when serialized
> > isOpen(fh2)
> [1] TRUE
> > isOpen(fh3)
> Error in isOpen(fh3) : invalid connection ('conn_id' is NULL)
>
> This is useful when, for instance, 'connection' objects are (incorrectly)
> passed to background R sessions (e.g. PSOCK cluster nodes).
>
>
> SEE ALSO:
>
> * More details of the above are scribbled down on
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81
> * R-devel post 'closeAllConnections() can really mess things up',
> 2016-10-30,
> https://stat.ethz.ch/pipermail/r-devel/2016-October/073331.html
>
> All the best,
>
> Henrik
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Runnable R packages

2019-02-08 Thread Abs Spurdle
I'm not sure whether GCC is in Rtools or not.
I will check on Monday.

However, that's not the main point.
In Rtools, there's nothing like the following:

R CMD Rpkg2exe -o my_r_application.exe my_r_package

or

R CMD Rpkg2exe -o my_r_application.exe my_r_package_0.1.0.tar.gz

Which would convert an R package into an executable file.
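
For comparison, the package-level alternative discussed earlier in this thread can be sketched in a few lines of R. This is a rough sketch only: `pkg_name_from_tarball` and `run_main_sketch` are hypothetical names, the code does not resolve dependencies, and it does not produce an executable file. It only installs a source tarball into a temporary library and calls that package's `main`.

```r
# Hypothetical helper: "foo_0.1.0.tar.gz" -> "foo"
pkg_name_from_tarball <- function(tarball) {
  sub("_.*$", "", basename(tarball))
}

# Hypothetical runner: install into a throwaway library, then call main().
run_main_sketch <- function(tarball, lib = tempfile("runlib")) {
  dir.create(lib)
  install.packages(tarball, lib = lib, repos = NULL, type = "source")
  pkg  <- pkg_name_from_tarball(tarball)
  ns   <- loadNamespace(pkg, lib.loc = lib)
  main <- get("main", envir = ns, inherits = FALSE)
  main()
}
```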


On Thu, Feb 7, 2019 at 9:38 PM Peter Meissner wrote:

> Doesn't Rtools provide everything needed to build R packages and R on
> Windows - including gcc?
>
