> On 23 Nov 2014, at 01:05 , Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
>
> On Sat, Nov 22, 2014 at 12:42 PM, Duncan Murdoch
> <murdoch.dun...@gmail.com> wrote:
>> On 22/11/2014, 2:59 PM, Stuart Ambler wrote:
>>> A colleague's R program behaved differently when I ran it, and we thought
>>> we traced it probably to different results from string comparisons as
>>> below, with different R versions. However the platforms also differed. A
>>> friend ran it on a few machines and found that the comparison behavior
>>> didn't correlate with R version, but rather with platform.
>>>
>>> I wonder if you've seen this. If it's not some setting I'm unaware of,
>>> maybe someone should look into it. Sorry I haven't taken the time to read
>>> the source code myself.
>>
>> Looks like a collation order issue. See ?Comparison.
>
> With the oddity that both platforms use what look like similar locales:
>
> LC_COLLATE=en_US.UTF-8
> LC_COLLATE=en_US.utf8
It's the sort of thing that I've tried to wrap my mind around multiple times and failed, but have a look at

http://stackoverflow.com/questions/19967555/postgres-collation-differences-osx-v-ubuntu

which seems to be essentially the same issue, just for Postgres. If you have the stamina, also look into the Python question that it links to.

As I understand it, there are two potential reasons: either the two platforms are not using the same collation table for en_US, or at least one of them does not fully implement the Unicode Collation Algorithm.

In general, collation is a minefield: some languages have the same letters in a different order (e.g. Estonian, with Z between S and T); accented characters sort with their unaccented counterparts in some languages but as separate characters in others; some locales sort ABab, others AaBb, yet others aAbB; sometimes punctuation is ignored, sometimes not; sometimes multiple characters count as one; etc.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
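The point that one list of strings can legitimately come out in several orders can be made concrete with a small Python sketch. The key functions below are hypothetical, simplified rules: plain code-point order, two case tie-breaking variants, accent folding via `unicodedata`, and a toy reordered alphabet loosely modeled on the Estonian example. They are not the en_US tables either platform actually ships, nor an implementation of the Unicode Collation Algorithm; they only illustrate the kinds of choices a collation table makes.

```python
import unicodedata

WORDS = ["b", "B", "a", "A"]

def codepoint_key(s):
    # Raw code-point order: every uppercase ASCII letter precedes every lowercase one ("ABab").
    return s

def upper_first_key(s):
    # Case-insensitive primary comparison; ties broken with uppercase first ("AaBb").
    return (s.casefold(), s)

def lower_first_key(s):
    # Same primary comparison, but swapcase() makes lowercase win ties ("aAbB").
    return (s.casefold(), s.swapcase())

def strip_accents(s):
    # Decompose to NFD and drop combining marks, so "é" compares like "e".
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def accent_insensitive_key(s):
    # Primary key ignores accents and case; the original string breaks remaining ties.
    return (strip_accents(s).casefold(), s)

# Hypothetical simplified alphabet with "z" moved between "s" and "t",
# loosely modeled on Estonian; it handles lowercase ASCII letters only.
REORDERED_ALPHABET = "abcdefghijklmnopqrsztuvwxy"

def reordered_alphabet_key(s):
    # Rank each letter by its position in the custom alphabet
    # (raises ValueError for characters outside it).
    return [REORDERED_ALPHABET.index(ch) for ch in s.casefold()]

if __name__ == "__main__":
    print(sorted(WORDS, key=codepoint_key))    # ['A', 'B', 'a', 'b']
    print(sorted(WORDS, key=upper_first_key))  # ['A', 'a', 'B', 'b']
    print(sorted(WORDS, key=lower_first_key))  # ['a', 'A', 'b', 'B']

    accented = ["f", "\u00e9", "e"]
    print(sorted(accented))                               # ['e', 'f', 'é']
    print(sorted(accented, key=accent_insensitive_key))   # ['e', 'é', 'f']

    words = ["tee", "zoo", "sai"]
    print(sorted(words))                                  # ['sai', 'tee', 'zoo']
    print(sorted(words, key=reordered_alphabet_key))      # ['sai', 'zoo', 'tee']
```

Each key ends with a tie-breaker, so every rule gives a total order, yet the orders disagree with one another. Two locales that are both labelled en_US but built from different tables can plausibly disagree in the same way.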