All of this discussion ignores the elephant in the room: QString is indexed by 16-bit UTF-16 code units, not by Unicode characters. We've had Unicode for a couple of decades now.
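
To make the distinction concrete, here is a minimal sketch in standard C++. It uses std::u16string as a stand-in for QString's UTF-16 storage (an assumption made so the example needs no Qt). The helpers codePointCount and codePointAt are hypothetical names for illustration, not Qt API, and both assume well-formed UTF-16.

```cpp
#include <cstddef>
#include <string>

// Number of Unicode code points in well-formed UTF-16 text.
// A low (trailing) surrogate only completes a pair, so it is not counted.
std::size_t codePointCount(const std::u16string &text)
{
    std::size_t count = 0;
    for (char16_t unit : text) {
        const bool isLowSurrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!isLowSurrogate)
            ++count;
    }
    return count;
}

// Code point at position `index`, counted in code points, not code units.
// Assumes well-formed UTF-16 and index < codePointCount(text).
char32_t codePointAt(const std::u16string &text, std::size_t index)
{
    std::size_t pos = 0; // position in code units
    for (;;) {
        const char16_t unit = text[pos];
        const bool isHighSurrogate = unit >= 0xD800 && unit <= 0xDBFF;
        if (index == 0) {
            if (!isHighSurrogate)
                return unit;
            // Combine the surrogate pair into a single code point.
            const char16_t low = text[pos + 1];
            return 0x10000 + ((char32_t(unit) - 0xD800) << 10)
                           + (char32_t(low) - 0xDC00);
        }
        pos += isHighSurrogate ? 2 : 1;
        --index;
    }
}
```

For the text u"a\U0001D11Eb" (U+1D11E lies outside the Basic Multilingual Plane), size() is 4 and text[1] is 0xD834, one half of a surrogate pair, whereas codePointCount gives 3 and codePointAt(text, 1) gives 0x1D11E. Note that indexing by code point this way is a linear scan, which is part of what a real text type would have to design around.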
We *should* have a string type (I don't care what you call it) that acts on strings indexed by Unicode characters, not in terms of a representation. Whether that string type internally uses UTF-16 or UTF-8 should be invisible to its user. Ideally it would be capable of carrying its data internally in either form (so as to avoid needless conversion when both producer and consumer use the same form) and of converting between the two (e.g. so as to append efficiently) as needed.

Meanwhile, buffers of data (whether 8-bit, 16-bit or of other sizes) are types we do need in diverse places - but they should be described differently from the string type (call it a "text" type, if hysterical reasons oblige us to use "string" for its encoding). They can be interpreted as strings, hence can serve as backing-store for a string, provided they respect the relevant rules of a relevant encoding.

If blob[index] always returns a Unicode *character*, then blob is a string; if it can sometimes return one half of a UTF-16 surrogate pair (as is the case with QString today) or one byte of a multi-byte UTF-8 sequence, then blob is not really a string, it's just the storage for an encoding of a string.

What are our chances of getting this right in Qt 6? It's the 21st century - way past time we did this,

Eddy.