Re: [Rd] sort yields different results on OS X (PR#14163)

Prof Brian Ripley Tue, 22 Dec 2009 05:38:24 -0800

On Tue, 22 Dec 2009, Peter Dalgaard wrote:

Prof Brian Ripley wrote:
That different OSes use the same name for a locale does not make them the same locale.
Note that R can be compiled to use ICU, which provides a well-considered collation suite. R on Mac OS X uses ICU, as does a Linux build if it is available -- so I would say that it is RHEL that is out of line here (it makes little sense to have < and > far apart in the collation sequence).
That's not it:
v <- c("1","<0","<3","2")
sort(v)
[1] "<0" "1"  "2"  "<3"

The point is rather that "special characters" are ignored during collation.


Sometimes ....
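
As a rough illustration of the "ignored during collation" reading, the ordering shown above can be mimicked by stripping the special characters before comparing. This is only a sketch under stated assumptions: sort_ignoring_specials is a hypothetical helper, not anything glibc or R provides, and it assumes R >= 3.3.0 so that method = "radix" (which orders character vectors as in the C locale) is available. Real collation rules are more involved, which is presumably what "Sometimes" hints at.

```r
# Example vector from this thread.
v <- c("1", "<0", "<3", "2")

# Hypothetical helper: build a comparison key by dropping every character
# that is not an ASCII letter or digit, then order the keys bytewise.
# method = "radix" orders character vectors as in the C locale, so the
# result does not depend on the session's LC_COLLATE setting.
sort_ignoring_specials <- function(x) {
  key <- gsub("[^0-9A-Za-z]", "", x)
  x[order(key, method = "radix")]
}

sort_ignoring_specials(v)
# [1] "<0" "1"  "2"  "<3"
```

The keys here are "1", "0", "3", "2", so the elements come out in the same order as the RHEL result quoted above.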

Apparently, this comes from /usr/share/i18n/locales/iso14651_t1_common on Fedora; I wouldn't know how faithful to the ISO standard that is.

ISO 14651 is a version of the Unicode Collation Algorithm (http://www.unicode.org/reports/tr10/) which ICU uses. So other people have implemented the same set of rules to give different results -- which is quite possible given the number of non-prescribed choices that need to be made.

We've seen too many anomalies from glibc to trust it: which is why ICU is used if available.
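
For anyone wanting to see which collation rules a given session is using, something along the following lines may help. It is a sketch, not a definitive diagnostic: it assumes a reasonably recent R in which capabilities("ICU") is reported and in which sort() accepts method = "radix" (R >= 3.3.0). The radix result is shown only as a locale-independent baseline to compare against, since it orders strings as in the C locale.

```r
# Example vector from this thread.
v <- c("1", "<0", "<3", "2")

# Name of the collation locale in use; as noted in this thread, the same
# name on two OSes does not imply the same collation rules.
collate_locale <- Sys.getlocale("LC_COLLATE")

# TRUE if this build of R can use ICU for collation
# (assumes an R version in which capabilities() reports "ICU").
has_icu <- unname(capabilities("ICU"))

# Locale-dependent result: may differ between platforms and builds.
locale_sorted <- sort(v)

# Locale-independent baseline: bytewise (C-locale) ordering, in which the
# digits (0x30-0x39) come before "<" (0x3C).
c_sorted <- sort(v, method = "radix")
c_sorted
# [1] "1"  "2"  "<0" "<3"
```

Comparing locale_sorted with c_sorted on each machine is one way to tell whether a discrepancy comes from the locale's collation rules rather than from the data.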


--
Brian D. Ripley,                  [email protected]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
