> On 23 Nov 2014, at 01:05 , Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> 
> On Sat, Nov 22, 2014 at 12:42 PM, Duncan Murdoch
> <murdoch.dun...@gmail.com> wrote:
>> On 22/11/2014, 2:59 PM, Stuart Ambler wrote:
>>> A colleague's R program behaved differently when I ran it, and we thought
>>> we traced it probably to different results from string comparisons as
>>> below, with different R versions.  However the platforms also differed.  A
>>> friend ran it on a few machines and found that the comparison behavior
>>> didn't correlate with R version, but rather with platform.
>>> 
>>> I wonder if you've seen this.  If it's not some setting I'm unaware of,
>>> maybe someone should look into it.  Sorry I haven't taken the time to read
>>> the source code myself.
>> 
>> Looks like a collation order issue.  See ?Comparison.
> 
> With the oddity that both platforms use what look like similar locales:
> 
> LC_COLLATE=en_US.UTF-8
> LC_COLLATE=en_US.utf8

It's the sort of thing that I've tried to wrap my mind around multiple times
and failed, but have a look at

http://stackoverflow.com/questions/19967555/postgres-collation-differences-osx-v-ubuntu

which seems to be essentially the same issue, just for Postgres. If you have
the stamina, also look into the Python question that it links to.

As I understand it, there are two potential reasons: Either the two platforms 
are not using the same collation table for en_US, or at least one of them is 
not fully implementing the Unicode Collation Algorithm.
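
One way to see which platform disagrees is to sort the same small set of
strings on each machine and compare the output. The following is only a
sketch in Python (not R), using the standard locale module; the helper name
collation_order and the sample strings are made up for illustration, and the
locale name is assumed to match the LC_COLLATE values quoted above. If that
locale is not installed, the sketch silently keeps whatever collation locale
is currently active.

```python
import locale


def collation_order(strings, locale_name="en_US.UTF-8"):
    """Sort strings with the platform's collation for locale_name.

    Hypothetical helper for illustration. If locale_name is not installed,
    the currently active LC_COLLATE setting is used instead.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            pass  # locale not available here; keep the current one
        # strxfrm turns each string into a key that compares the way the
        # C library's strcoll would compare the original strings.
        return sorted(strings, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore


if __name__ == "__main__":
    # Arbitrary sample mixing case and punctuation.
    sample = ["b", "A", "a", "B", "_a", "a_b", "ab"]
    print(collation_order(sample))
```

Running this on both platforms and diffing the printed lists shows whether
the C libraries order the strings differently even though the locale names
look alike.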

In general, collation is a minefield: Some languages have the same letters in 
different order (e.g. Estonian with Z between S and T); accented characters 
sort with the unaccented counterpart in some languages but as separate 
characters in others; some locales sort ABab, others AaBb, yet others aAbB; 
sometimes punctuation is ignored, sometimes not; sometimes multiple characters 
count as one, etc.
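
To make the ABab / AaBb / aAbB point concrete, here is a small Python sketch
that imitates the three orderings with hand-written sort keys. These keys are
illustrative stand-ins, not real locale collation tables, and they only cover
plain ASCII letters.

```python
letters = ["b", "A", "a", "B"]

# "ABab": plain code point order, all uppercase before all lowercase
# (this is what a C/POSIX-style comparison gives for ASCII).
code_point_order = sorted(letters)

# "AaBb": compare case-insensitively first, then put uppercase first
# within each letter.
upper_first = sorted(letters, key=lambda s: (s.lower(), s))

# "aAbB": compare case-insensitively first, then put lowercase first
# within each letter (swapcase flips the tie-break).
lower_first = sorted(letters, key=lambda s: (s.lower(), s.swapcase()))

print("".join(code_point_order))  # ABab
print("".join(upper_first))       # AaBb
print("".join(lower_first))       # aAbB
```

The same four strings come out in three different orders, so any program
whose logic depends on "a" < "B" can behave differently from one collation
to the next.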

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
