> By all means, let's make sure the internals are efficient for the more
> common languages and scripts; but it's way past time to start doing
> Unicode properly, so that all cultures are well-served by default, when
> the software folk are using is built on Qt,

I don't think anyone knows what "properly" is. But the more I think about it, 
the more I like the idea I described: a list of sequences of various 
character widths. I think it is a good balance between space and efficiency. To 
recap:
A class that stores a list of lists of same-width characters. In the simplest 
case the outer list holds a single list containing only 8-bit characters, which 
would perform identically to QByteArray. Non-ASCII text that needs 16-bit 
storage would be stored as QStrings are now. In the more complicated scenarios, 
the class breaks the text into 8-bit segments and 16-bit segments and makes 
them appear contiguous (for example, emoji in ASCII text). Of course there 
could be functions to collapse everything to the largest width in use 
(maximize()) or to split it apart to minimize() space (for very long 8-bit 
strings with occasional wider characters), and there could even be a bestFit() 
heuristic. And as always, you could get it serialized as UTF-8 or UTF-16. All 
of the above extends to 32-bit segments as well. I think this handles the 
average case very well (all characters of the same width) and has a reasonable 
cost for occasional exotic characters. 
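
To make the recap above concrete, here is a rough sketch in plain C++17 
(standard library only, no Qt types). Everything in it is hypothetical: 
SegmentedString, widthOf(), segmentCount(), storageBytes() and toUtf32() are 
names made up for illustration, and only maximize() and minimize() come from 
the description above. It also assumes that an 8-bit segment holds code points 
up to U+00FF, and it does no validation (surrogates are not rejected), so take 
it as an illustration of the layout rather than a proposal for the real class.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical sketch; none of these names exist in Qt.
class SegmentedString {
public:
    // One run of code points that all fit the same storage width.
    using Segment = std::variant<std::string, std::u16string, std::u32string>;

    // Bytes needed per code point (assumption: 8-bit means <= U+00FF).
    static std::size_t widthOf(char32_t cp) {
        return cp <= 0xFF ? 1 : (cp <= 0xFFFF ? 2 : 4);
    }

    // Append a code point, starting a new segment when the width changes.
    void append(char32_t cp) {
        if (segments_.empty() || segmentWidth(segments_.back()) != widthOf(cp))
            segments_.push_back(makeSegment(widthOf(cp)));
        push(segments_.back(), cp);
    }

    // Flatten all segments into one sequence of code points.
    std::u32string toUtf32() const {
        std::u32string out;
        for (const Segment &seg : segments_)
            std::visit([&out](const auto &s) {
                for (auto c : s) out.push_back(toCodePoint(c));
            }, seg);
        return out;
    }

    // Collapse everything into a single segment of the widest width in use.
    void maximize() {
        const std::u32string all = toUtf32();
        if (all.empty()) return;
        std::size_t widest = 1;
        for (char32_t cp : all) widest = std::max(widest, widthOf(cp));
        segments_.assign(1, makeSegment(widest));
        for (char32_t cp : all) push(segments_.front(), cp);
    }

    // Re-split into the narrowest possible runs to save space.
    void minimize() {
        const std::u32string all = toUtf32();
        segments_.clear();
        for (char32_t cp : all) append(cp);
    }

    std::size_t segmentCount() const { return segments_.size(); }

    // Payload bytes only; per-segment bookkeeping is not counted.
    std::size_t storageBytes() const {
        std::size_t bytes = 0;
        for (const Segment &seg : segments_)
            bytes += std::visit(
                [](const auto &s) { return s.size() * sizeof(s[0]); }, seg);
        return bytes;
    }

private:
    // char may be signed, so go through unsigned char to avoid sign extension.
    static char32_t toCodePoint(char c) { return static_cast<unsigned char>(c); }
    static char32_t toCodePoint(char16_t c) { return c; }
    static char32_t toCodePoint(char32_t c) { return c; }

    static Segment makeSegment(std::size_t width) {
        if (width == 1) return std::string();
        if (width == 2) return std::u16string();
        return std::u32string();
    }

    static std::size_t segmentWidth(const Segment &seg) {
        return std::visit([](const auto &s) {
            return sizeof(typename std::decay_t<decltype(s)>::value_type);
        }, seg);
    }

    static void push(Segment &seg, char32_t cp) {
        std::visit([cp](auto &s) {
            using Char = typename std::decay_t<decltype(s)>::value_type;
            s.push_back(static_cast<Char>(cp));
        }, seg);
    }

    std::vector<Segment> segments_;
};
```

Appending the code points of "hi", a space, U+1F600, a space and "ok" gives 
three segments in this sketch (8-bit, 32-bit, 8-bit), because U+1F600 lies 
outside the 16-bit range. After maximize() there is one 32-bit segment, and 
minimize() brings back the three narrow runs. Indexing would have to walk the 
segment list (or keep an offset table), which is where the cost for the 
occasional exotic character shows up.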
_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
