FYI: a Unicode codepoint is not the same thing as a character's visual representation. Moreover, a single character may be rendered as a sequence of glyphs, or vice versa: a sequence of characters may be rendered as a single glyph. QString (like every other Unicode string class in the world) represents a sequence of Unicode codepoints (in one UTF or another), not characters or glyphs. Always remember that!
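To make the distinction concrete, here is a minimal sketch in Python (not Qt code). The helper names are made up for this illustration, and the "visible characters" count is only a crude approximation that skips combining marks; real grapheme segmentation follows the rules in Unicode Standard Annex #29.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s: str) -> int:
    """Number of Unicode codepoints (a Python str is a codepoint sequence)."""
    return len(s)


def approx_visible_chars(s: str) -> int:
    """Crude approximation: count codepoints that are not marks (category M*).

    This is NOT real grapheme cluster segmentation; it only handles the
    simple "base letter + combining mark" case.
    """
    return sum(1 for ch in s if not unicodedata.category(ch).startswith("M"))


samples = {
    "precomposed n-tilde": "\u00f1",    # one codepoint
    "n + combining tilde": "n\u0303",   # two codepoints, drawn as one character
    "musical G clef": "\U0001D11E",     # one codepoint, surrogate pair in UTF-16
}

for name, text in samples.items():
    print(name, utf16_units(text), code_points(text), approx_visible_chars(text))
```

The two spellings of 'ñ' look identical on screen and compare equal after NFC normalization, yet they differ in length by any codepoint-based or code-unit-based measure.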
Regards,
Konstantin

2015-02-11 20:49 GMT+04:00 Matthew Woehlke <[email protected]>:
> On 2015-02-11 11:29, Thiago Macieira wrote:
> > On Wednesday 11 February 2015 11:22:59 Julien Blanc wrote:
> >> On 11/02/2015 10:32, Bo Thorsen wrote:
> >>> 2) length() returns the number of chars I see on the screen, not a
> >>> random implementation detail of the chosen encoding.
> >>
> >> How’s that supposed to work with combining characters, which are part of
> >> unicode ?
> >
> > That's true. And add that there are some zero-width characters too and some
> > characters that are double-width.
>
> I'm not going to claim this is the *best* answer, but at least one that
> seems logical... length() should be the number of times one must hit
> backspace starting from the end of the text to erase the entire text.
> IOW, the number of logical glyphs. Double-width characters are one
> logical glyph. Combining characters are not independently logical glyphs
> (e.g. 'ñ' is one glyph, regardless of how it is encoded).
>
> Conversely, I'm sure there are times when you need to know the number of
> codepoints (e.g. allocating memory to make a copy). Possibly length()
> and size() should return different results. (Which is a mess, but...)
>
> --
> Matthew
>
> _______________________________________________
> Development mailing list
> [email protected]
> http://lists.qt-project.org/mailman/listinfo/development
>
