--- Karl Ove Hufthammer <[EMAIL PROTECTED]> wrote:
> Martin Sevior <[EMAIL PROTECTED]> wrote in
> news:[EMAIL PROTECTED].edu.au:
>
> > If we use UTF-32 as our internal format in the
> > piecetable will we still need two or more 32-bit numbers to
> > represent combining characters?
>
> Yes.
>
> > I really don't like the idea of variable length strings per
> > glyph
>
> Also note that a string can contain more, less *or* the same number
> of glyphs as the number of characters.

This is why we need to represent the text with something
more like a linked list of objects, where the top-level
object represents an "on-screen character". Each on-screen
character can be made up of one or more "codepoints", and
each codepoint can in turn be made up of one or more bytes.
It may actually be more complicated than this; I'm not sure
at this point.

We might want to look at how IBM's ICU represents "strings"
(http://oss.software.ibm.com/icu/), and even at how Pango
represents its strings internally, since it must already
handle these types of problems.

Andrew Dunbar.

> --
> Karl Ove Hufthammer

=====
http://linguaphile.sourceforge.net
http://www.abisource.com
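
To make the layering above concrete (on-screen character, then codepoints, then bytes), here is a minimal Python sketch. It is only an illustration, not a proposal for the piecetable: the names `CodePoint`, `ScreenChar` and `split_screen_chars` are made up for this example, it uses a plain list rather than a linked list, it assumes UTF-8 as the byte encoding, and it assumes that attaching every combining mark to the preceding base character is good enough. Real segmentation is more involved than that.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


@dataclass
class CodePoint:
    """One Unicode codepoint (hypothetical helper type)."""
    value: int

    def utf8(self) -> bytes:
        # A single codepoint may occupy one or more bytes once encoded.
        return chr(self.value).encode("utf-8")


@dataclass
class ScreenChar:
    """One "on-screen character": a base codepoint plus any combining marks."""
    codepoints: List[CodePoint]

    def text(self) -> str:
        return "".join(chr(cp.value) for cp in self.codepoints)


def split_screen_chars(text: str) -> List[ScreenChar]:
    """Group codepoints so each combining mark joins the preceding character.

    Simplifying assumption: a codepoint with a non-zero canonical combining
    class belongs to the previous on-screen character; anything else starts
    a new one.
    """
    result: List[ScreenChar] = []
    for ch in text:
        cp = CodePoint(ord(ch))
        if result and unicodedata.combining(ch) != 0:
            result[-1].codepoints.append(cp)
        else:
            result.append(ScreenChar([cp]))
    return result


if __name__ == "__main__":
    # "e" followed by U+0301 COMBINING ACUTE ACCENT, then a plain "a".
    for sc in split_screen_chars("e\u0301a"):
        sizes = [len(cp.utf8()) for cp in sc.codepoints]
        print(len(sc.codepoints), "codepoint(s), bytes per codepoint:", sizes)
```

With the input `"e\u0301a"` this gives two on-screen characters from three codepoints, and the combining accent alone takes two bytes in UTF-8, so all three levels have different lengths for the same text.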
