On Sat, Apr 30, 2005 at 06:07:16PM +0800, Steve Underwood wrote:
> Michael Giagnocavo wrote:
>
> >>Michael Giagnocavo wrote:
> >>
> >>>Hmm, you're right. That's doesn't look bad at all.
> >>>
> >>>But... what about for comparisons and other Unicode operations? Do the
> >>>libraries available support some UTF-8 version of strcmp, strchr,
> >>>strcasecmp, etc.?
> >>>
> >>Some of them are easy (strcmp, for example). Most of them are harder,
> >>because they either need to know character boundaries, or need case
> >>mappings (strcasecmp, for example). Any function that searches for a
> >>'char' in a string also won't work if the character being searched for
> >>is a multi-byte one.
> >>
> >
> >Not even strcmp works, because you have things like combinations where you
> >can represent in Unicode a character using different code points, but it's
> >still considered the same. Say, a Latin o with an accent mark. Using wide
> >char internally solves these issues, and is most likely faster, depending
> >on the data.
> >
> Too right. Look at IBM's internationalisation classes for Unicode. It
> takes megabytes of code to compare two strings.

Do you just want to tell if they're equal, or to sort them?

Telling if they're equal is basically simple: just compare the raw
bytes. One small twist: it may be required to use canonical Unicode
strings (I hope I'm using the right term here), so you first convert
them to a canonical form and then compare them. Or simpler: mandate
that all strings be in canonical form.

Sorting is a more complicated issue if you don't like the literal
order.

-- 
Tzafrir Cohen
icq#16849755
+972-50-7952406
[EMAIL PROTECTED]
http://www.xorcom.com
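
As a rough illustration of the "convert to a canonical form, then
compare" idea, here is a minimal sketch in Python. It is not taken from
Asterisk or from any library mentioned in this thread. It assumes that
"canonical form" means one of the Unicode normalization forms (NFC is
used here), and the helper name canonical_equal is made up for the
example. It uses the "Latin o with an accent mark" case from the quoted
message.

```python
import unicodedata


def canonical_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper for illustration; NFC is an assumed choice.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The same visible character, written two different ways.
precomposed = "\u00f3"   # LATIN SMALL LETTER O WITH ACUTE
decomposed = "o\u0301"   # 'o' followed by COMBINING ACUTE ACCENT

# Comparing the raw UTF-8 bytes sees two different strings.
raw_bytes_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# Normalizing first makes the two spellings compare equal.
normalized_equal = canonical_equal(precomposed, decomposed)

print(raw_bytes_equal, normalized_equal)
```

If every string is normalized once on the way in (the "mandate
canonical form" option), a plain byte comparison of the stored strings
is then enough for equality. Sorting in a language-aware order is a
separate problem that this sketch does not address.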
