> > UTF-8 is great for communicating between the piecetable and the
> > widgets. I think we should definitely do this. What I don't want is
> > for us to store our text as UTF-8 in the piecetable. We have a *LOT*
> > of code that expects that every position in the piecetable
> > corresponds to an extra letter of text.
>
> How is this going to work for languages that need combining
> characters? Isn't it going to need to be changed anyway? Isn't now the
> time to do this re-design?

I don't understand this. Doesn't every glyph have a unique Unicode code
point? If so, we still have a one-to-one mapping of glyph to text
location.

> > What I think we should do is store our Unicode as UT_uint32 in the
> > piecetable, which can then be randomly accessed the same way we do
> > things now.
>
> To randomly access what the user sees as a character, or to randomly
> access what is internally one code point?

OK, I don't understand. Are you saying that two code points in a row map
to a different glyph? If so, why not just insert the code point for this
glyph?

> These are not the same. But I don't know the piecetable either, so
> maybe it is the right thing to do.
>
> As long as we are thinking about it.

Certainly the structure of the code makes lots of assumptions of one
PT_DocPosition, one glyph. If Unicode was at all sane this should not be
a problem. Are you telling me that Unicode is not sane and that certain
glyphs can only be generated if two 32-bit numbers are presented
consecutively?

Cheers

Martin
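
For reference, the distinction being asked about can be made concrete with a small sketch. It is not AbiWord code: the function names are hypothetical, and it deliberately treats only the Combining Diacritical Marks block (U+0300..U+036F) as combining, which is a simplification of what Unicode actually defines. It stores text one code point per 32-bit slot, as proposed above, and counts positions versus user-visible characters.

```cpp
#include <cstddef>
#include <string>

// Simplifying assumption: only U+0300..U+036F counts as a combining mark.
// Real Unicode defines many more combining characters than this block.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Storage positions when every code point occupies one 32-bit slot.
std::size_t count_code_points(const std::u32string& text) {
    return text.size();
}

// User-visible characters: a base code point plus any combining marks
// that follow it are counted once. (A stray leading mark with no base
// is not counted at all in this sketch.)
std::size_t count_visible_characters(const std::u32string& text) {
    std::size_t visible = 0;
    for (char32_t cp : text) {
        if (!is_combining_mark(cp)) {
            ++visible;
        }
    }
    return visible;
}
```

With this, U+00E9 (precomposed "e with acute") occupies one position, while "e" followed by U+0301 (combining acute) occupies two positions but is drawn as a single character. Precomposed code points exist for some such pairs, but not for every base/mark combination, so substituting a single code point is not always possible.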
