On Tue, Apr 16, 2013 at 06:19:47PM -0700, Bill Spitzak wrote:
> Jan Arne Petersen wrote:
>
> >I completely agree that editing UTF-8 text as UTF-8 is fine.
> >
> >I am just wondering if we should have offsets in "Unicode code points"
> >(with the addition that for invalid byte sequences each byte counts as
> >one code point) or offsets in bytes.
>
> The reason for the offset in bytes is that it is unambiguous about
> what position it means. Though I think errors should count as one
> code point, this avoids the need to define it at all, because the
> client does not have to agree with the input method about how to
> count them.
>
> >And when we use offsets in bytes how should the toolkit and input method
> >handle offsets in bytes which do not match code points.
> >
> >For example we have a surrounding text of "€–" (cursor is at offset 0):
> >0xe2 0x82 0xac 0xe2 0x80 0x93
> >
> >What should the toolkit do with such requests like the following?
> >* delete_surrounding_text(index: 1, length: 3)
>
> I would delete the bytes indicated and show the resulting string,
> with error boxes for the now-bad bytes.
>
> >* cursor_position(index: 2)
>
> I would place the cursor at the position of the glyph produced by
> that byte, which could include some bytes on either side of it. Note
> that for combining characters this is a problem that needs to be
> solved even for valid UTF-8 (i.e. what does it mean if you point
> between the letter and the accent?).
>
> >or
> >* preedit_style(index: 2, length: 3, style: underline) with above text
> >as preedit.
>
> I would remember the positions of styling as bytes. However the
> renderer can render as though they are moved left to the first break
> between glyphs (i.e. it will preedit-highlight the character the first
> byte is in, and if the preedit region ends in the middle of a glyph
> then that glyph will not be preedited). Again this problem needs to
> be solved for combining characters anyway so this is not any more
> difficult.
>
> In all cases the client can potentially detect that the input method
> is screwing up, and perhaps report this as a warning message.
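
To make the requests above concrete, here is a minimal Python sketch of
what byte offsets do to the "€–" example. The helpers delete_bytes and
snap_left are hypothetical names used only for illustration; they are not
part of the protocol or of any toolkit, and snapping left to a lead byte
is just one way a toolkit might treat an offset that lands inside a code
point.

```python
# Surrounding text from the example: EURO SIGN followed by EN DASH.
text = "€–".encode("utf-8")
assert text == b"\xe2\x82\xac\xe2\x80\x93"


def delete_bytes(buf: bytes, index: int, length: int) -> bytes:
    """Hypothetical helper: delete `length` bytes starting at byte `index`."""
    return buf[:index] + buf[index + length:]


def snap_left(buf: bytes, index: int) -> int:
    """Hypothetical helper: move a byte offset left to a code point start."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while 0 < index < len(buf) and (buf[index] & 0xC0) == 0x80:
        index -= 1
    return index


# delete_surrounding_text(index: 1, length: 3), applied to the raw bytes.
after_delete = delete_bytes(text, 1, 3)

# A shorter deletion that leaves a stray lead byte behind.
broken = delete_bytes(text, 1, 2)
# errors="replace" substitutes U+FFFD for bytes that do not decode,
# which stands in here for the "error box" a toolkit might draw.
broken_shown = broken.decode("utf-8", errors="replace")

# cursor_position(index: 2) falls inside the first character.
cursor = snap_left(text, 2)

# preedit_style(index: 2, length: 3) covers bytes 2..5; snap both ends left.
style_start = snap_left(text, 2)
style_end = snap_left(text, 2 + 3)
```

With these inputs the cursor and the style start both snap to byte 0, and
the style end snaps to byte 3, so only the first character would be
highlighted. Note that deleting bytes 1..3 of this particular string
happens to leave 0xe2 0x80 0x93, which is still a valid sequence; the
shorter deletion is the one that leaves an undecodable byte.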

I think consensus is that we leave the offsets as bytes. I agree with
that, considering that: 1) it shouldn't happen, 2) when it does, the
toolkit will have to deal with it.

Kristian
