Ondrej Bilka wrote:
> For encodings like BIG5 if character contains / it could quit prematurely.

Eric Blake wrote:
> BIG5 is a lousy character encoding for the very
> reason that it confuses common ASCII bytes with encoded characters,
> depending on shift state.

BIG5 does not have shift state. BIG5 is a stateless multibyte encoding,
composed of two character sets:

    first byte      second byte
    0x00..0x7F                                (ASCII)
    0xA1..0xFE      0x40..0x7E,0xA1..0xFE     (BIG5)

The '/' (0x2F) is not within the range of allowed byte values for the
second byte. Therefore strchr(s,'/') and strrchr(s,'/') work fine on
BIG5 encoded strings as well.

> character encodings dependent on your locale (except on Mac, and look at the
> problems that caused)

The current problem with filenames on Mac OS X is that the underlying
filesystem, HFS+, stores filenames in decomposed Unicode. I.e. when the
user creates a file whose name contains accents (in precomposed Unicode, as
usual), the file that gets created has a different name: the decomposed
Unicode form of the requested one. This is quite annoying because
  - the file name that one retrieves with "ls" is different from the
    specified file name,
  - it goes against the Character Model of the W3C [1], which recommends
    NFC (not NFD) normalization.

Bruno

[1] http://www.w3.org/TR/charmod-norm/
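
P.S. For anyone who wants to check the BIG5 argument above, here is a
minimal sketch in C. It only encodes the byte ranges from the table in
this message. The helper names (big5_is_lead, big5_is_trail,
big5_last_slash, big5_slash_is_safe) and the sample string are made up
for illustration and are not part of any library. The character-by-character
scanner is deliberately simplistic (it does not validate trail bytes); it
is only there to compare against plain strrchr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lead-byte range of a two-byte BIG5 character, per the table above. */
static int big5_is_lead(unsigned char c)
{
    return c >= 0xA1 && c <= 0xFE;
}

/* Trail-byte ranges of a two-byte BIG5 character, per the table above. */
static int big5_is_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7E) || (c >= 0xA1 && c <= 0xFE);
}

/* Returns 1 if '/' can never appear as the second byte of a character. */
static int big5_slash_is_safe(void)
{
    return !big5_is_trail((unsigned char) '/');
}

/* Hypothetical reference scanner: walks the string one BIG5 character
   at a time and remembers the last '/' that is a real ASCII character. */
static const char *big5_last_slash(const char *s)
{
    const char *last = NULL;

    assert(s != NULL);
    while (*s != '\0') {
        unsigned char c = (unsigned char) *s;

        if (big5_is_lead(c) && s[1] != '\0') {
            s += 2;             /* skip both bytes of a two-byte character */
            continue;
        }
        if (c == '/')
            last = s;
        s++;
    }
    return last;
}

/* Illustrative sample: "dir/", one two-byte character (0xA4 0x40), "/file". */
static const char big5_sample[] = "dir/\xA4\x40/file";
```

On the sample string, big5_last_slash(big5_sample) and
strrchr(big5_sample, '/') should point at the same byte, which is the
point of the argument: the bytewise search needs no knowledge of the
encoding as long as the searched byte lies outside both second-byte ranges.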