This can be narrowed down to:

Sys.setlocale("LC_CTYPE","C")
x2 <- "\u00e7"
x1 <- iconv(x2, from="UTF-8", to="latin1")
x1 < x2 # FALSE in R-devel, NA in R 4.0

In R 4.0 it returns NA; in R-devel it returns FALSE (when running in a CP1252 locale on Windows).

It is the same character, only the encoding differs, so the R-devel return value is correct and the previous behavior was a bug. The current native encoding should not matter for the comparison. Also, when the encodings are known, the collation order should only apply after the strings have been converted to a common encoding, so in this case the collation order of the locale should have no impact, and it seems it doesn't. I don't think R should preserve bug-compatibility here; code depending on this buggy behavior should be fixed.
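To illustrate the point about a common encoding, here is a minimal sketch (my own illustration, not taken from the R sources; the names x_utf8, x_latin1 and x are made up for the example). It shows that the two strings differ only in their bytes, and that code which wants results independent of the R version can convert to UTF-8 explicitly with enc2utf8() before comparing or ordering:

```r
# The same character, stored in two different encodings.
x_utf8   <- "\u00e7"
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")

# The underlying bytes differ ...
charToRaw(x_utf8)    # c3 a7
charToRaw(x_latin1)  # e7

# ... but after conversion to a common encoding they are identical.
identical(charToRaw(enc2utf8(x_latin1)), charToRaw(x_utf8))  # TRUE

# Equality of strings with different marked encodings is decided
# after translation, so this is TRUE.
x_utf8 == x_latin1

# Normalizing explicitly before ordering: all four elements become the
# same string, so ties are left in their original order.
x <- c(x_utf8, x_latin1, x_latin1, x_utf8)
order(enc2utf8(x))   # 1 2 3 4
```

This only covers strings whose encoding is marked. Once the encoding is set to "unknown", R has to assume the bytes are in the native encoding, so the result again depends on the locale.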

I don't immediately see which NEWS entry this corresponds to. Please keep in mind that NEWS does not cover all changes; for that you need to look at the svn commits. Even then it may be hard to trace a concrete change in behavior back to a commit; to do that you need to debug the code or bisect.

Changes to _documented_ behavior should be more visible and of course reflected by changes in the documentation. If they are not, it is a bug worth reporting, and the report should come with a reference to the concrete parts of the documentation that are violated.

Best
Tomas

On 5/23/20 12:03 PM, Jan Gorecki wrote:
Hi R developers,
There seems to be a breaking change in base::order on Windows in
R-devel. Code below yields different results on R 4.0.0 and R-devel
(2020-05-22 r78545). I haven't found any info about that change in
NEWS. Was the change intentional?

Sys.setlocale("LC_CTYPE","C")
Sys.setlocale("LC_COLLATE","C")
x1 = "fa\xE7ile"
Encoding(x1) = "latin1"
x2 = iconv(x1, "latin1", "UTF-8")
base::order(c(x2,x1,x1,x2))
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))

# R 4.0.0
base::order(c(x2,x1,x1,x2))
#[1] 1 4 2 3
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))
#[1] 2 3 1 4

# R-devel
base::order(c(x2,x1,x1,x2))
#[1] 1 2 3 4
Encoding(x2) = "unknown"
base::order(c(x2,x1,x1,x2))
#[1] 1 4 2 3

Best Regards,
Jan Gorecki

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
