At 2018-12-18T04:42:36+1100, John Gardner wrote:
> > The biggest problem I know of is that the uppercasing transform of
> > German sharp S "ß" goes to "SS"
>
> Pretty damn sure that's nothing compared to the Turkish dotless I
> <https://en.wikipedia.org/wiki/Dotted_and_dotless_I#In_computing>.
>
> Then again, I'm sure they're used to seeing computers screw up the tittle
> by now... :-)

I'm aware of it. :) But I still regard it as a lesser problem because
at least it doesn't change the length of the string in glyphs or
codepoints.

(Bytes? In UTF-8, yup, it sure would:

U+0069 LATIN SMALL LETTER I
UTF-8: 69 UTF-16BE: 0069 Decimal: &#105; Octal: \0151
i (I)
Uppercase: 0049 [EXCEPT IN TURKISH -- GBR]
Category: Ll (Letter, Lowercase)
Unicode block: 0000..007F; Basic Latin
Bidi: L (Left-to-Right)

U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
UTF-8: c4 b0 UTF-16BE: 0130 Decimal: &#304; Octal: \0460
İ (i)
Lowercase: 0069
Category: Lu (Letter, Uppercase)
Unicode block: 0100..017F; Latin Extended-A
Bidi: L (Left-to-Right)
Decomposition: 0049 0307
)

A lot of knowledge is embedded in tolower() and toupper() these days.
Back in the '70s and '80s they were just syntactic sugar for adding and
subtracting 32.

Life is more interesting now.

Regards,
Branden
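
For anyone who wants to check the lengths discussed above, here is a minimal Python sketch. It is an illustration only, not anything from groff or libc: the helper names ascii_toupper and lengths are made up for this example, and Python's str.upper() applies Unicode's default case mappings, so no Turkish tailoring is involved.

```python
# Illustrative sketch; ascii_toupper() and lengths() are hypothetical helpers.

def ascii_toupper(ch):
    """Old-style toupper(): subtract 32 from ASCII lowercase letters only."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - 32)
    return ch  # anything outside a-z is passed through unchanged


def lengths(s):
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


if __name__ == "__main__":
    # The ASCII trick knows nothing about sharp S; it leaves it alone.
    print(ascii_toupper("i"), ascii_toupper("ß"))

    # Sharp S: uppercasing changes the code point count.
    print(lengths("ß"), lengths("ß".upper()))

    # Dotted I: one code point either way, but the UTF-8 byte count differs.
    print(lengths("i"), lengths("\u0130"))
```

The subtract-32 version is the whole of what the old ASCII-only behaviour amounted to; everything else in the sketch depends on Unicode tables that the modern functions carry around.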