I am not sure that would be a good idea, because a glyph can still be composed of more than one code point, and that is language dependent. Sometimes you want characters, sometimes code points and sometimes glyphs, etc. Would it not be better to use a simple container with functions on top that operate on a view, so that we could use them with any container? That way we would avoid any allocations for converting characters from one container to another. But anyway, I think there are so many uses for strings that one class to tackle all these problems is not enough.
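A minimal sketch of the view-based approach, assuming C++17 and UTF-16 storage. The helper names (nextCodePoint, codePointCount, codePointAt) are hypothetical and not Qt API. The functions only read through a std::u16string_view, so they work on any contiguous buffer of char16_t without allocating or copying. Grapheme clusters (glyphs built from several code points) are deliberately not handled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helpers (not Qt API): read a contiguous UTF-16 buffer as a
// sequence of code points without copying it.

// Decodes the code point that starts at code-unit offset `pos` and advances
// `pos` past it. An unpaired surrogate is returned unchanged as one unit.
// Precondition: pos < units.size().
inline char32_t nextCodePoint(std::u16string_view units, std::size_t &pos)
{
    const char16_t lead = units[pos++];
    if (lead >= 0xD800 && lead <= 0xDBFF && pos < units.size()) {
        const char16_t trail = units[pos];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            ++pos;
            return 0x10000 + ((char32_t(lead) - 0xD800) << 10)
                           + (char32_t(trail) - 0xDC00);
        }
    }
    return lead;
}

// Counts code points; this can be smaller than units.size().
inline std::size_t codePointCount(std::u16string_view units)
{
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < units.size(); ++count)
        nextCodePoint(units, pos);
    return count;
}

// Returns the n-th code point by walking from the start (linear time).
// Precondition: n < codePointCount(units).
inline char32_t codePointAt(std::u16string_view units, std::size_t n)
{
    std::size_t pos = 0;
    for (std::size_t i = 0; i < n; ++i)
        nextCodePoint(units, pos);
    return nextCodePoint(units, pos);
}

// Small demonstration: U+1F600 occupies two UTF-16 code units.
inline void demo()
{
    const std::u16string_view text = u"a\U0001F600b";
    assert(text.size() == 4);                  // code units
    assert(codePointCount(text) == 3);         // code points
    assert(text[1] == 0xD83D);                 // raw index hits a lead surrogate
    assert(codePointAt(text, 1) == 0x1F600);   // code-point index decodes the pair
}
```

The trade-off is that indexing by code point becomes a linear walk instead of a constant-time lookup, which is the cost of keeping the storage type simple.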
________________________________
From: Development <development-boun...@qt-project.org> on behalf of Edward Welbourne <edward.welbou...@qt.io>
Sent: Wednesday, January 23, 2019 2:53:00 PM
To: Arnaud Clère; Thiago Macieira
Cc: development@qt-project.org
Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString

All of this discussion ignores a major elephant: QString's indexing is by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode for a couple of decades now.

We *should* have a string type (I don't care what you call it) that acts on strings indexed by Unicode characters, not in terms of a representation. Whether that string type internally uses UTF-16 or UTF-8 should be invisible to its user. Ideally it would be capable of carrying its data internally in either form (so as to avoid needless conversion when both producer and consumer use the same form) and of converting between the two (e.g. so as to append efficiently) as needed.

Meanwhile, buffers of data (whether 8-bit, 16-bit or of other sizes) are types we do need in diverse places - but they should be described differently from the string type (call it a "text" type, if hysterical reasons oblige us to use "string" for its encoding). They can be interpreted as strings, hence can serve as backing-store for a string, provided they respect the relevant rules of a relevant encoding.

If blob[index] always returns a Unicode *character*, then blob is a string; if it can sometimes return one half of a UTF-16 surrogate pair (as is the case with QString today) or one byte of a multi-byte UTF-8 chunk, then blob is not really a string, it's just the storage for an encoding of a string.

What are our chances of getting this right in Qt 6? It's the 21st century - way past time we did this,

Eddy.
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development