On 05/02/2013 09:56 PM, Kristian Høgsberg wrote:
> On Tue, Apr 16, 2013 at 06:19:47PM -0700, Bill Spitzak wrote:
>> Jan Arne Petersen wrote:
>>
>>> I completely agree that editing UTF-8 text as UTF-8 is fine.
>>>
>>> I am just wondering if we should have offsets in "Unicode code points"
>>> (with the addition that for invalid byte sequences each byte counts as
>>> one code point) or offsets in bytes.
>>
>> The reason for the offset in bytes is that it is unambiguous about
>> what position it means. Though I think errors should count as one
>> code point, this avoids the need to define it at all, because the
>> client does not have to agree with the input method about how to
>> count them.
>>
>>> And when we use offsets in bytes how should the toolkit and input method
>>> handle offsets in bytes which do not match code points?
>>>
>>> For example we have a surrounding text of "€–" (cursor is at offset 0):
>>> 0xe2 0x82 0xac 0xe2 0x80 0x93
>>>
>>> What should the toolkit do with such requests like the following?
>>> * delete_surrounding_text(index: 1, length: 3)
>>
>> I would delete the bytes indicated and show the resulting string,
>> with error boxes for the now-bad bytes.
>>
>>> * cursor_position(index: 2)
>>
>> I would place the cursor at the position of the glyph produced by
>> that byte, which could include some bytes on either side of it. Note
>> that for combining characters this is a problem that needs to be
>> solved even for valid UTF-8 (ie what does it mean if you point
>> between the letter and the accent?).
>>
>>> or
>>> * preedit_style(index: 2, length: 3, style: underline) with above text
>>> as preedit.
>>
>> I would remember the positions of styling as bytes. However the
>> renderer can render as though they are moved left to the first break
>> between glyphs (ie it will preedit-highlight the character the first
>> byte is in, and if the preedit region ends in the middle of a glyph
>> then that glyph will not be preedited). Again this problem needs to
>> be solved for combining characters anyway so this is not any more
>> difficult.
>>
>> In all cases the client can potentially detect that the input method
>> is screwing up, and perhaps report this as a warning message.
>
> I think consensus is that we leave the offsets as bytes. I agree with
> that, considering that: 1) it shouldn't happen, 2) when it does, the
> toolkit will have to deal with it.

Yes, that is how it is handled in the new version of this series:
http://lists.freedesktop.org/archives/wayland-devel/2013-April/008670.html

-- 
Jan Arne Petersen
Openismus GmbH
http://www.openismus.com
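
For illustration only (this is not code from the patch series): a minimal
Python sketch of the byte-offset handling described in the quoted
discussion, applied to the "€–" example. The helper names (delete_bytes,
render, snap_left, snap_range) are hypothetical, and the sketch assumes
one code point per glyph, so it does not address the combining-character
case mentioned above.

```python
# Surrounding text from the example: 0xe2 0x82 0xac 0xe2 0x80 0x93
TEXT = "€–".encode("utf-8")


def delete_bytes(text: bytes, index: int, length: int) -> bytes:
    """Delete `length` bytes starting at byte offset `index`, as requested."""
    return text[:index] + text[index + length:]


def render(text: bytes) -> str:
    """Decode for display; invalid byte sequences show up as U+FFFD
    (standing in for the "error boxes" a toolkit might draw)."""
    return text.decode("utf-8", errors="replace")


def snap_left(text: bytes, index: int) -> int:
    """Move a byte offset left to the start of the code point it falls in.

    UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    """
    while 0 < index < len(text) and (text[index] & 0xC0) == 0x80:
        index -= 1
    return index


def snap_range(text: bytes, index: int, length: int) -> tuple[int, int]:
    """Snap both ends of a byte range left, so a code point that is only
    partially covered at the end is left out of the range."""
    return snap_left(text, index), snap_left(text, index + length)


if __name__ == "__main__":
    # delete_surrounding_text(index: 1, length: 3)
    print(render(delete_bytes(TEXT, 1, 3)))
    # A two-byte deletion leaves a stray lead byte behind.
    print(render(delete_bytes(TEXT, 1, 2)))
    # cursor_position(index: 2)
    print(snap_left(TEXT, 2))
    # preedit_style(index: 2, length: 3, ...)
    print(snap_range(TEXT, 2, 3))
```

With this particular text, deleting bytes 1..3 happens to leave
0xe2 0x80 0x93, which is again a valid "–". Deleting only bytes 1..2
leaves a stray 0xe2 that cannot be decoded. For the preedit request, the
snapped range is bytes 0..3, so only "€" would be highlighted.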
