Hi,

[ It seems my ISP is having trouble delivering my mail to this list,
  hence the delay :-( ]

On Sun, May 01, 2005 at 09:14:07PM +0800, Steve Underwood wrote:
> >Do you just want to tell if they're equal, or to sort them?
> >
> >Telling if they're equal is basically simple: just compare the raw bytes.
> >One small twist: it may be required to use canonical Unicode strings (I
> >hope I use the right term here), so you first convert them to a
> >canonical form and then compare them. Or simpler: mandate all strings
> >to be in canonical form.
> >
> >Sorting is a more complicated issue if you don't like the literal order.
> >
> To tell if two strings are equal (really equal, rather than just byte
> for byte the same) you must bring both copies to canonical form. Very
> little Unicode is in canonical form, so this is not a small twist. It is
> a big PITA, that must be done in every case. The process of creating
> canonical Unicode is slow, complex, and takes lots of code. I am unclear
> if insisting on canonical form is something you can really do, and if it
> is a complete answer. You need to find a linguistics expert who knows
> Unicode inside out to get a proper answer. There seems to be very little
> software that does proper comparisons at present.

I must admit I don't remember the Unicode standards very well. But I
recall there are actually three levels of canonicalization.

Anyway, if the comparison is simple, and the task of converting to
canonical form can be complicated, why not mandate a certain canonical
form?

That is, caller ID SHOULD be sent in canonical form. Servers may assume
that it is in that form for the sake of comparison.

What would such a formalization break?

> Having said this, for most data processing purposes this can be skipped,
> and a byte by byte comparison used. If we just define that all text is
> UTF-8, the only complexity which is unavoidable is ensuring strings do
> not overrun buffers, and also do not stop mid-character. As others
> have shown, this is trivial.
> UTF-8 can be scanned forwards and backwards
> in a context-free manner.

-- 
Tzafrir Cohen
icq#16849755
+972-50-7952406
[EMAIL PROTECTED]
http://www.xorcom.com
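
P.S. To make the "mandate a canonical form" idea a bit more concrete, here is a rough sketch in Python, not a proposal for actual Asterisk code. It assumes the canonical form chosen would be NFC, one of the normalization forms the Unicode standard defines (NFC and NFD, plus the compatibility forms NFKC and NFKD). The function names are made up for illustration. The sender normalizes once; the receiver could then either trust that and compare raw bytes, or normalize both sides itself.

```python
import unicodedata

# Assumption: NFC is the mandated canonical form. The thread does not
# pick one; NFC is used here only as a plausible choice.
CANONICAL_FORM = "NFC"


def to_canonical(text: str) -> str:
    """Return text converted to the assumed canonical form."""
    return unicodedata.normalize(CANONICAL_FORM, text)


def is_canonical(text: str) -> bool:
    """True if text is unchanged by normalization to the canonical form."""
    return to_canonical(text) == text


def same_caller_id(a: str, b: str) -> bool:
    """Compare two caller IDs after bringing both to canonical form."""
    return to_canonical(a) == to_canonical(b)


def same_caller_id_bytes(a: bytes, b: bytes) -> bool:
    """Byte-for-byte comparison, valid only if both sides are already
    known to be UTF-8 in the canonical form (the 'SHOULD' above)."""
    return a == b


if __name__ == "__main__":
    composed = "Jos\u00e9"      # 'e with acute' as one code point
    decomposed = "Jose\u0301"   # 'e' followed by a combining acute accent
    # The raw UTF-8 bytes differ even though the text looks identical.
    print(composed.encode("utf-8") == decomposed.encode("utf-8"))
    # After normalization the two compare equal.
    print(same_caller_id(composed, decomposed))
```

With the SHOULD in place, a server that wants to be strict could call something like is_canonical() on input and reject or normalize anything that fails, while a lax one just compares bytes.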

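On the quoted point about not stopping mid-character when fitting a string into a buffer: the backwards scan works because every UTF-8 continuation byte has the bit pattern 10xxxxxx, so a character boundary can be found without any context. A minimal sketch in Python, assuming the input is already valid UTF-8 (truncate_utf8 is a hypothetical helper name, not an existing API):

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 data to at most max_bytes bytes without
    splitting a multi-byte character.

    Assumes data is valid UTF-8; no validation is done here.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    if len(data) <= max_bytes:
        return data

    cut = max_bytes
    # data[cut] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so step back until data[cut] starts a character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


if __name__ == "__main__":
    raw = "na\u00efve".encode("utf-8")   # the i-diaeresis takes two bytes
    print(truncate_utf8(raw, 3))         # cut moves back before the 2-byte char
    print(truncate_utf8(raw, 4))         # the 2-byte char fits completely
```

Note this only protects code points. A base letter followed by a combining mark can still be separated by the cut, which is one more argument for agreeing on a composed canonical form.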