Re: [Rd] base::order breaking change in R-devel

2020-06-23 Thread Tomas Kalibera

This can be narrowed down to

Sys.setlocale("LC_CTYPE","C")
x2 <- "\u00e7"
x1 <- iconv(x2, from="UTF-8", to="latin1")
x1 < x2 # FALSE or NA

In R 4.0 it returns NA; in R-devel it returns FALSE (when running in a
CP1252 locale on Windows).


It is the same character, only the encoding is different, so the R-devel
return value is correct and the previous behavior was a bug. It should
not matter what the current native encoding is when doing the
comparison. Also, the collation order should only apply after characters
are converted to a common encoding (when the encoding is known), so in
this case the collation order of the locale should not have an impact,
and it seems it doesn't. I don't think R should preserve
bug-compatibility in this case; code depending on this buggy behavior
should be fixed.
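A minimal sketch of this reasoning, reusing the strings from the narrowed-down example above. Once both values are converted to a common encoding with `enc2utf8()`, they are the same string, so ordering them leaves ties in their original positions. Normalizing with `enc2utf8()` before calling `order()` is one possible way to make code encoding-independent; it is an illustration, not an officially recommended fix.

```r
# The same character, stored once as UTF-8 and once as latin1
x2 <- "\u00e7"                                   # marked "UTF-8"
x1 <- iconv(x2, from = "UTF-8", to = "latin1")   # marked "latin1"

Encoding(c(x1, x2))                      # two different encoding marks
identical(charToRaw(x1), charToRaw(x2))  # FALSE: the bytes differ
x1 == x2                                 # `==` translates both to UTF-8 first
enc2utf8(x1) == x2                       # explicit conversion to a common encoding

# After normalizing, all four elements are equal, so order() keeps the
# tied elements in their original positions: 1 2 3 4
order(enc2utf8(c(x2, x1, x1, x2)))
```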


I don't immediately see which NEWS entry this corresponds to. Please
keep in mind that NEWS does not cover all changes; for that you need to
look at the svn commits, and even then it may be hard to trace
concrete changes in behavior back to specific commits. To do that, you
need to debug the code or bisect.


Changes to _documented_ behavior should be more visible and, of course,
reflected by changes in the documentation. If they are not, it is a bug
worth reporting, and the report should reference the concrete parts of
the documentation that are violated.


Best
Tomas

On 5/23/20 12:03 PM, Jan Gorecki wrote:

Hi R developers,
There seems to be a breaking change in base::order on Windows in
R-devel. Code below yields different results on R 4.0.0 and R-devel
(2020-05-22 r78545). I haven't found any info about that change in
NEWS. Was the change intentional?

Sys.setlocale("LC_CTYPE","C")
Sys.setlocale("LC_COLLATE","C")
x1 = "fa\xE7ile"
Encoding(x1) = "latin1"
x2 = iconv(x1, "latin1", "UTF-8")
base::order(c(x2,x1,x1,x2))
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))

# R 4.0.0
base::order(c(x2,x1,x1,x2))
#[1] 1 4 2 3
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))
#[1] 2 3 1 4

# R-devel
base::order(c(x2,x1,x1,x2))
#[1] 1 2 3 4
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))
#[1] 1 4 2 3

Best Regards,
Jan Gorecki

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel




Re: [Rd] R-ints context documentation

2020-06-23 Thread Tomas Kalibera
Thanks for spotting this outdated bit in the documentation. Updated now 
in R-devel. The byte-code compiler does additional optimizations: the
contexts are omitted when not needed, and source
references/expressions are tracked in a different way. That is 
documented in the compiler documentation.


Best,
Tomas

On 5/27/20 3:07 AM, brodie gaslam via R-devel wrote:

In 1.4 Contexts[1], should the following:


Note that whilst calls to closures and builtins set a context,
those to special internal functions never do.

Be something like:


Note that whilst calls to closures always set a context,
those to builtins only set a context under profiling
or if they are of the foreign variety (e.g. `.C` and similar),
and those to special internal functions never do.

Based on the 'eval.c' source[2].

Best,

Brodie

[1]: https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Contexts
[2]: https://github.com/wch/r-source/blob/tags/R-4-0-0/src/main/eval.c#L821





Re: [Rd] [R] R 4.0.2 is released

2020-06-23 Thread Hasan Diwan
Congrats! -- H



Re: [Rd] subset data.frame at C level

2020-06-23 Thread Jim Hester
It looks to me like internally .subset2 uses `get1index()`, but this
function is declared in Defn.h, which AFAIK is not part of the exported R
API.

Looking at the code for `get1index()`, it looks like it just loops over the
(translated) names, so I guess you could just do the same [0].

[0]:
https://github.com/r-devel/r-svn/blob/1ff1d4197495a6ee1e1d88348a03ff841fd27608/src/main/subscript.c#L226-L235
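A rough R-level sketch of the lookup described above. The helper name `column_by_name()` is hypothetical: it compares UTF-8 translations of the column names one by one and extracts the first match by position with `.subset2()`. A C version would presumably follow the same loop over `STRING_ELT(getAttrib(df, R_NamesSymbol), i)`, for example comparing `translateCharUTF8()` results with `strcmp()`, keeping in mind that `VECTOR_ELT()` indices are 0-based while R indices are 1-based.

```r
# Hypothetical helper mirroring the name lookup: loop over the (translated)
# column names and return the first column whose name matches.
column_by_name <- function(df, name) {
  target <- enc2utf8(name)
  nms <- enc2utf8(names(df))          # translate every name to UTF-8
  for (i in seq_along(nms)) {
    if (identical(nms[[i]], target)) {
      return(.subset2(df, i))         # positional extraction, no name search
    }
  }
  NULL                                # no column with that name
}

column_by_name(mtcars, "carb")
```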

On Wed, Jun 17, 2020 at 6:11 AM Morgan Morgan wrote:

> Hi,
>
> Hope you are well.
>
> I was wondering if there is a function at C level that is equivalent to
> mtcars$carb or .subset2(mtcars, "carb").
>
> If I have the index of the column then the answer would be VECTOR_ELT(df,
> asInteger(idx)) but I was wondering if there is a way to do it directly
> from the name of the column without having to loop over columns names to
> find the index?
>
> Thank you
> Best regards
> Morgan
>


[Rd] Restrict package to load-only access - prevent attempts to attach it

2020-06-23 Thread Henrik Bengtsson
Hi,

I'm developing a package whose API is only meant to be used in other
packages via imports or pkg::foo().  There should be no need to attach
this package so that its API appears on the search() path. As a
maintainer, I want to avoid it causing search() path conflicts by
mistake.

This means that, for instance, other packages should declare this
package under 'Imports' or 'Suggests' but never under 'Depends'.  I
can document this and hope that's how it's going to be used.  But I'd
like to make it explicit that this API should be used via imports or
::.  One approach I've considered is:

.onAttach <- function(libname, pkgname) {
   if (nzchar(Sys.getenv("R_CMD"))) return()
   stop("Package ", sQuote(pkgname), " must not be attached")
}

This would produce an error if the package is attached.  It's
conditioned on the environment variable 'R_CMD' set by R itself
whenever 'R CMD ...' runs.  This is done to avoid errors in 'R CMD
INSTALL' and 'R CMD check' "load tests", which formally are *attach*
tests.  The above approach passes all the tests and checks I'm aware
of, on all platforms.

Before I ping the CRAN team explicitly, does anyone know whether this
is a valid approach?  Do you know if there are alternatives for
asserting that a package is never attached?  Maybe this is more of a
philosophical question, where the package "contract" is that all packages
should be attachable and, if not, then it's not a valid R package.

This is a non-critical topic but if it can be done it would be useful.

Thanks,

Henrik



Re: [Rd] Restrict package to load-only access - prevent attempts to attach it

2020-06-23 Thread Duncan Murdoch

On 23/06/2020 4:21 p.m., Henrik Bengtsson wrote:

Hi,

I'm developing a package whose API is only meant to be used in other
packages via imports or pkg::foo().  There should be no need to attach
this package so that its API appears on the search() path. As a
maintainer, I want to avoid having it appear in search() conflicts by
mistake.

This means that, for instance, other packages should declare this
package under 'Imports' or 'Suggests' but never under 'Depends'.  I
can document this and hope that's how it's going to be used.  But, I'd
like to make it explicit that this API should be used via imports or
::.  One approach I've considered is:

.onAttach <- function(libname, pkgname) {
   if (nzchar(Sys.getenv("R_CMD"))) return()
   stop("Package ", sQuote(pkgname), " must not be attached")
}

This would produce an error if the package is attached.  It's
conditioned on the environment variable 'R_CMD' set by R itself
whenever 'R CMD ...' runs.  This is done to avoid errors in 'R CMD
INSTALL' and 'R CMD check' "load tests", which formally are *attach*
tests.  The above approach passes all the tests and checks I'm aware
of and on all platforms.

Before I ping the CRAN team explicitly, does anyone know whether this
is a valid approach?  Do you know if there are alternatives for
asserting that a package is never attached.  Maybe this is more
philosophical where the package "contract" is such that all packages
should be attachable and, if not, then it's not a valid R package.

This is a non-critical topic but if it can be done it would be useful.


Speaking from the philosophical side, I think this is probably a bad 
idea.  Presumably you have some idea of how your package will be used, 
but in my experience, really interesting things happen when such 
assumptions aren't met, and people use code in different ways.


So I'd prefer that you didn't try to prevent me from using your package 
in some weird way.  It's fine if you document that it's intended to be 
used in some particular way, but why try to prevent me from using it 
differently?  Just tell me to read the docs when problems arise because 
of my misuse and I ask you for help.


Duncan Murdoch



Re: [Rd] Restrict package to load-only access - prevent attempts to attach it

2020-06-23 Thread Abby Spurdle
You could go one step down and print a note or a warning instead.

Also, you could combine different approaches:
Check for an (additional) environment variable.
If it is set, print a note; if not, generate a warning (or an error).

That would catch someone accidentally attaching your package and
discourage them from doing it,
but would still allow people to attach your package if they really want to.
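A minimal sketch of this combined approach, building on the `.onAttach()` hook from the original post. The opt-in variable `MYPKG_ALLOW_ATTACH` and the `fun()` in the message are hypothetical names chosen for illustration; R does not define them. Swapping `warning()` for `stop()` would turn an accidental attach into a hard error.

```r
# Sketch of an .onAttach() hook that warns instead of failing outright.
# MYPKG_ALLOW_ATTACH is a hypothetical opt-in variable, not defined by R.
.onAttach <- function(libname, pkgname) {
  # 'R CMD INSTALL' and 'R CMD check' set R_CMD; stay quiet there
  if (nzchar(Sys.getenv("R_CMD"))) return(invisible(NULL))

  if (nzchar(Sys.getenv("MYPKG_ALLOW_ATTACH"))) {
    # The user opted in explicitly: a startup note is enough
    packageStartupMessage("Note: ", sQuote(pkgname),
                          " is intended to be used via imports or ::")
  } else {
    # Probably an accident: warn (replace with stop() for a hard error)
    warning("Package ", sQuote(pkgname), " is not meant to be attached; ",
            "use imports or ", pkgname, "::fun() instead", call. = FALSE)
  }
  invisible(NULL)
}
```

Calling the hook by hand, e.g. `.onAttach("", "mypkg")`, is a quick way to try each branch outside a real package. Using `packageStartupMessage()` for the note means users can still silence it with `suppressPackageStartupMessages()`.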


On Wed, Jun 24, 2020 at 8:21 AM Henrik Bengtsson
 wrote:
>
> Hi,
>
> I'm developing a package whose API is only meant to be used in other
> packages via imports or pkg::foo().  There should be no need to attach
> this package so that its API appears on the search() path. As a
> maintainer, I want to avoid having it appear in search() conflicts by
> mistake.
>
> This means that, for instance, other packages should declare this
> package under 'Imports' or 'Suggests' but never under 'Depends'.  I
> can document this and hope that's how it's going to be used.  But, I'd
> like to make it explicit that this API should be used via imports or
> ::.  One approach I've considered is:
>
> .onAttach <- function(libname, pkgname) {
>    if (nzchar(Sys.getenv("R_CMD"))) return()
>    stop("Package ", sQuote(pkgname), " must not be attached")
> }
>
> This would produce an error if the package is attached.  It's
> conditioned on the environment variable 'R_CMD' set by R itself
> whenever 'R CMD ...' runs.  This is done to avoid errors in 'R CMD
> INSTALL' and 'R CMD check' "load tests", which formally are *attach*
> tests.  The above approach passes all the tests and checks I'm aware
> of and on all platforms.
>
> Before I ping the CRAN team explicitly, does anyone know whether this
> is a valid approach?  Do you know if there are alternatives for
> asserting that a package is never attached.  Maybe this is more
> philosophical where the package "contract" is such that all packages
> should be attachable and, if not, then it's not a valid R package.
>
> This is a non-critical topic but if it can be done it would be useful.
>
> Thanks,
>
> Henrik
>



Re: [Rd] Possible Bug: file.exists() Function. Due to UTF-8 Encoding differences on Windows between R 4.0.1 and R 3.6.3?

2020-06-23 Thread Yihui Xie
Hi Tomas,

Sorry for the false alarm! I did some further testing, and you were
right: there was no regression. I suspected a regression because the
user who reported the issue said his code worked in R 3.6 but not in
4.0. I should have tested it more carefully myself. When I tested it
again with the German and Chinese locales, I found that the code
worked in both versions of R in the German locale and failed in the
Chinese locale. Your explanation makes perfect sense to me. I also
read your blog post when it came out last month, and I'm really
looking forward to the end of this character encoding pain! Thank you
very much for the hard work!
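A small sketch of the check implied by the explanation quoted below: a file name only round-trips safely if it can be represented in the encoding Windows converts it to. The helper `representable_in()` is hypothetical; it relies on `iconv()` returning `NA` for input that cannot be converted (its default `sub = NA`). The `"CP936"` label is taken from the `sessionInfo()` output in the thread, and whether that encoding name is accepted depends on the platform's iconv.

```r
# Hypothetical helper: TRUE where a string can be converted from UTF-8
# to the target encoding without loss (iconv() gives NA otherwise)
representable_in <- function(x, encoding) {
  !is.na(iconv(enc2utf8(x), from = "UTF-8", to = encoding))
}

z <- "K\u00e4sch.txt"
representable_in(z, "CP1252")  # the C-locale code page in the report
representable_in(z, "CP936")   # the system code page in the report
```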

Regards,
Yihui
--
https://yihui.org

On Mon, Jun 22, 2020 at 3:37 AM Tomas Kalibera  wrote:
>
> Hi Yihui,
>
> list.files() returns file names converted to the native encoding by Windows,
> so one needs to use only characters representable in the current native
> encoding for file names. If one wants to be safe, it makes sense to be
> much stricter than that (only ASCII, and only a subset of it; a number
> of recommendations can be found online). Using more than that is asking
> for trouble.
>
> Unicode "\u00e4" is a Latin-1 character, so it is representable in CP1252.
> On my Windows system, running with CP1252 as both the C locale and the
> system code page, your example works fine: file.exists() returns TRUE,
> which is the expected behavior (tested in R-devel and R 4.0).
>
> Your example was run with CP1252 as the C locale but CP936 as the system
> code page (see the sessionInfo() output). On Windows, unfortunately, there
> are two different "current locales" at the same time. With your settings
> (CP1252 as C locale and CP936 as system code page), I get the same
> results as you: file.exists() returns FALSE. enc2native(z) works fine
> and returns a valid Latin-1 string, but that is because here "native" is
> CP1252. Windows API functions, and consequently some C library functions
> that return strings from the OS, however, convert to the encoding of
> the system code page, which is CP936 and cannot represent "ä". So the
> behavior you are reporting is currently expected for R 4.0 and
> earlier. I don't think this is a regression; it couldn't have worked
> before either (I've tested 3.6.3 and 3.4.3 on my system).
>
> These problems will go away when UTF-8 is both the current native
> encoding for the C locale and the system code page. This is possible in
> recent Windows 10, but requires UCRT and hence a new toolchain to build
> R, and requires all packages and libraries to be rebuilt from source.
> More details are on my blog, where an experimental build of R
> (installer) and an experimental toolchain are also available:
> https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html
>
> Best
> Tomas
>
>
> On 6/22/20 6:11 AM, Yihui Xie wrote:
> > Hi Tomas,
> >
> > I received a report about R 4.0.0 in the knitr package
> > (https://github.com/yihui/knitr/issues/1840), and I think it is
> > related to the issue here. I created a minimal reproducible example
> > below:
> >
> > owd = setwd(tempdir())
> > z = 'K\u00e4sch.txt'
> > file.create(z)
> > list.files()
> > file.exists(list.files())
> > setwd(owd)
> >
> > Output:
> >
> >> owd = setwd(tempdir())
> >> z = 'K\u00e4sch.txt'
> >> file.create(z)
> > [1] TRUE
> >> list.files()
> > [1] "K?sch.txt"
> >> file.exists(list.files())
> > [1] FALSE
> >> setwd(owd)
> > I wonder if it is expected that file.exists() returns FALSE here.
> >
> >> sessionInfo()
> > R version 4.0.1 (2020-06-06)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > Running under: Windows 7 x64 (build 7601) Service Pack 1
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> > system code page: 936
> >
> > FWIW, I also tested Chinese characters in the variable `z` above, and
> > file.exists() returns TRUE only after I Sys.setlocale(, "Chinese").
> >
> > Regards,
> > Yihui
> >
> > On Thu, Jun 11, 2020 at 3:11 AM Tomas Kalibera wrote:
> >>
> >> Dear Juan,
> >>
> >> I don't see what the problem is from your report. Please try to create a
> >> minimal but complete reproducible example that does not use the renv
> >> package. Perhaps you could use the R debugger (e.g. via
> >> options(error=recover)) to find out which argument file.exists() has
> >> been called with. Then you could try calling file.exists() directly
> >> with that argument to trigger the problem.
> >>
> >> It may be that the argument has been corrupted or is invalid in the
> >> current native encoding. If that is the case, the next step would be to
> >> find out who corrupted it (renv, R, something else). The error is
> >> displayed when a path name cannot be converted from the current native
> >> encoding to UTF-16LE.
> >>
> >> The experimental support for UTF-8 as native encoding on Wi