Jan Arne Petersen wrote:

I completely agree that editing UTF-8 text as UTF-8 is fine.

I am just wondering if we should have offsets in "Unicode code points"
(with the addition that for invalid byte sequences each byte counts as
one code point) or offsets in bytes.

The reason for using byte offsets is that a byte offset is unambiguous about which position it means. Though I think errors should count as one code point, byte offsets avoid the need to define that at all, because the client and the input method do not have to agree on how to count them.
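To make the counting question concrete, here is a minimal Python sketch of the "each invalid byte counts as one code point" rule. The function names are hypothetical and not part of any protocol. It relies on Python's `surrogateescape` error handler, which maps every undecodable byte to exactly one placeholder code point; a different client could just as well pick a different rule, which is the disagreement byte offsets avoid.

```python
def codepoint_count(data: bytes) -> int:
    # Each valid sequence is one code point; each invalid byte is one too,
    # because surrogateescape maps every bad byte to a single lone surrogate.
    return len(data.decode("utf-8", errors="surrogateescape"))


def codepoint_to_byte_offset(data: bytes, cp_offset: int) -> int:
    # Convert a code point offset (under the rule above) to a byte offset.
    text = data.decode("utf-8", errors="surrogateescape")
    prefix = text[:cp_offset]
    return len(prefix.encode("utf-8", errors="surrogateescape"))


valid = b"\xe2\x82\xac\xe2\x80\x93"   # the two characters from the example
damaged = b"\xe2\xe2\x80\x93"         # a stray lead byte, then a valid dash

print(codepoint_count(valid), codepoint_to_byte_offset(valid, 1))
print(codepoint_count(damaged), codepoint_to_byte_offset(damaged, 1))
```

With byte offsets none of this conversion is needed: the offset is the index into the buffer, whatever the bytes are.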

And if we use offsets in bytes, how should the toolkit and input method
handle byte offsets that do not fall on code point boundaries?

For example we have a surrounding text of "€–" (cursor is at offset 0):
0xe2 0x82 0xac 0xe2 0x80 0x93

What should the toolkit do with requests like the following?
* delete_surrounding_text(index: 1, length: 3)

I would delete the bytes indicated and show the resulting string, with error boxes for the now-bad bytes.
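A rough Python sketch of that behaviour, under some assumptions: `index` is taken as a byte offset from the start of the text (the cursor is at offset 0), `delete_bytes` and `render` are hypothetical helper names, and U+FFFD stands in for the error box a renderer would draw.

```python
def delete_bytes(data: bytes, index: int, length: int) -> bytes:
    # Remove exactly the requested bytes, with no snapping to code points.
    return data[:index] + data[index + length:]


def render(data: bytes) -> str:
    # Decode for display; undecodable bytes become U+FFFD ("error boxes").
    return data.decode("utf-8", errors="replace")


text = b"\xe2\x82\xac\xe2\x80\x93"   # 0xe2 0x82 0xac 0xe2 0x80 0x93

# The request from the example: index 1, length 3.
after = delete_bytes(text, 1, 3)
print(render(after))

# A slightly different, made-up request that does leave a bad byte behind.
broken = delete_bytes(text, 1, 2)
print(render(broken))
```

In the exact request from the example, the three bytes left over (0xe2 0x80 0x93) happen to form a valid sequence again, so no error box appears. The second, made-up request (index 1, length 2) leaves a stray 0xe2 that decodes to U+FFFD.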

* cursor_position(index: 2)

I would place the cursor at the position of the glyph produced by that byte, which could include some bytes on either side of it. Note that for combining characters this problem needs to be solved even for valid UTF-8 (i.e. what does it mean if you point between the letter and the accent?).
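One possible way to find the character a byte belongs to is sketched below. It is only a sketch: `char_span` is a hypothetical name, the text is assumed to be valid UTF-8, and it works per code point, so it does not address the combining-character question, which would need grapheme cluster boundaries instead.

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


def char_span(data: bytes, index: int) -> tuple[int, int]:
    # Return the (start, end) byte range of the code point containing
    # the byte at `index`. Assumes valid UTF-8 and 0 <= index < len(data).
    start = index
    while start > 0 and is_continuation(data[start]):
        start -= 1
    end = index + 1
    while end < len(data) and is_continuation(data[end]):
        end += 1
    return start, end


text = b"\xe2\x82\xac\xe2\x80\x93"
# cursor_position(index: 2) points into the middle of the first character.
print(char_span(text, 2))
```

The toolkit could then draw the cursor at either edge of that span while still remembering the byte offset it was given.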

or
* preedit_style(index: 2, length: 3, style: underline) with above text
as preedit.

I would remember the positions of styling as bytes. However, the renderer can render as though they are moved left to the first break between glyphs (i.e. it will preedit-highlight the character the first byte is in, and if the preedit region ends in the middle of a glyph then that glyph will not be highlighted). Again, this problem needs to be solved for combining characters anyway, so this is not any more difficult.
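The "move left to the first break" rule can be sketched in Python as follows. The names `snap_left` and `rendered_range` are hypothetical, the text is assumed to be valid UTF-8, and breaks are taken between code points rather than between glyphs, so combining characters would need extra handling.

```python
def snap_left(data: bytes, offset: int) -> int:
    # Move a byte offset left until it sits on a code point boundary.
    # Offsets at or past the end of the data are clamped to the end.
    offset = max(0, min(offset, len(data)))
    while 0 < offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset -= 1
    return offset


def rendered_range(data: bytes, index: int, length: int) -> tuple[int, int]:
    # The stored style stays (index, length) in bytes; only the range used
    # for drawing is snapped, with both ends moved left.
    start = snap_left(data, index)
    end = snap_left(data, index + length)
    return start, end


preedit = b"\xe2\x82\xac\xe2\x80\x93"
# preedit_style(index: 2, length: 3, style: underline)
start, end = rendered_range(preedit, 2, 3)
print(start, end, preedit[start:end].decode("utf-8"))
```

For the request above this underlines only the first character: the start moves from byte 2 back to 0, and the end moves from byte 5 back to 3, so the second character, which the region ends inside, is left unstyled.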

In all cases the client can potentially detect that the input method is screwing up, and perhaps report this as a warning message.