On Wednesday 14 October 2015 21:51:23 Bubke Marco wrote:
> On October 14, 2015 23:10:26 Thiago Macieira <thiago.macie...@intel.com> wrote:
> > Do it on your own. You just said that ICU has the function you want, so
> > use it.
>
> So Qt is always shipping with ICU?

It can be disabled on Windows. On OS X there's no point, since it's part of the system. On Linux, if you disable it, some other features will be reduced, so don't disable it.

> > Qt does not have to provide a comparator that operates on something other
> > than its native string type.
>
> Isn't Qt a framework to help developers? Sorry, your argumentation does not
> sound very empirical.

Yes, it is. But Qt's goal is not to support every single use case and corner case out there. Qt should make 90% easy and 9% possible. That means there's 1% of the realm of possibilities that Qt does not address. If your use case falls into this group, use the fact that Qt is native code and just call other libraries. That's one of the two main advantages of native code: there's no sandbox to escape from.

Qt already supports locale-aware comparison. We even have a class for it, so it can be done efficiently: QCollator, and it supports our native string type (QString). Providing extra support for a character encoding that is not the one QString uses falls in that 1%. Just use ICU.

> >> Maybe Windows and Mac OS will bring support to the standard library so we
> >> don't need it, but in the meantime it would be very helpful.
> >>
> >> A UTF-8-based QTextDocument would maybe be nice too.
> >
> > What for? It needs to keep a lot of extra structures, so the cost of
> > conversion and extra memory is minimal. And besides, QTextDocument really
> > needs a seekable string, not UTF-8.
>
> Is UTF-16 seekable? You still have surrogates, and you can merge code
> points.

Seekable enough. It's much easier to deal with than UTF-8. A surrogate pair, as its name says, appears *only* in pairs, so you always know whether you're on the first or the second code unit. Moreover, all living languages are encoded in the Basic Multilingual Plane, so no surrogate pairs are required for any of them. Handling of surrogate pairs can be moved to non-critical codepaths.
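To illustrate why a surrogate pair never leaves you guessing: the value of a single UTF-16 code unit says whether it is the first half (0xD800..0xDBFF) or the second half (0xDC00..0xDFFF) of a pair, so re-aligning an arbitrary index needs at most one step back. This is a minimal sketch in standard C++ on std::u16string; the function names are hypothetical, not Qt API (QChar offers comparable static helpers such as QChar::isHighSurrogate() and QChar::isLowSurrogate()).

```cpp
#include <cstddef>
#include <string>

// First half of a surrogate pair (high surrogate).
constexpr bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
// Second half of a surrogate pair (low surrogate).
constexpr bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: move an arbitrary code-unit index to the start of
// the code point containing it. Only one neighbouring unit is inspected;
// no scan from the beginning of the string is required.
std::size_t alignToCodePoint(const std::u16string &s, std::size_t i)
{
    if (i > 0 && i < s.size() && isLowSurrogate(s[i]) && isHighSurrogate(s[i - 1]))
        return i - 1;
    return i;
}

// Hypothetical helper: decode the code point starting at an aligned index.
char32_t codePointAt(const std::u16string &s, std::size_t i)
{
    const char16_t u = s[i];
    if (isHighSurrogate(u) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
        return 0x10000 + ((char32_t(u) - 0xD800) << 10) + (char32_t(s[i + 1]) - 0xDC00);
    return u; // BMP character (an unpaired surrogate is returned unchanged)
}
```

For u"a\U0001F600b" (four code units: 'a', 0xD83D, 0xDE00, 'b'), index 2 lands on the low surrogate and alignToCodePoint() moves it back to 1, where codePointAt() decodes U+1F600. For BMP-only text, the branch on surrogates is never taken.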
As for combining code points, that's something different and usually one or more layers removed from the seeking, alongside zero-width and full-width code points. QTextDocument also handles fonts with variable-width glyphs, so you can never simply convert a byte index to a pixel position (not to mention those pesky line breaks...).

> Let's describe an example. I send the QTextDocument content to a library
> which expects UTF-8 content and gives me back positions. This gets
> interesting if you use non-ASCII characters. Actually, the new clang code
> model works that way.

That example shows how UTF-16 is better. See above on the seekability of UTF-16 vs. UTF-8. The solution for this is to fix the library to accept UTF-16. When we were doing Qt 5.0, we needed PCRE to support UTF-16. Their developers were very welcoming and wrote the version that supports UTF-16, so Qt does not need to reallocate.

> > Even if we provide UTF-8 support classes, those will not propagate to the
> > GUI. Forget it.
>
> What about compressing UTF-16 like Python does for UTF-32? If you are
> only using ASCII, you set a flag and you can remove all those useless zeros.
> It would have implications for data(), but maybe we should not provide
> access to the internal representation. If you use UTF-32 as a base, you
> don't need surrogates anymore.

That's what Lars called a "hybrid solution" and vetoed. I second that. Way too much code would break if we did that, because we allow people access to the data pointer in QString and to iterate directly (std::{,w,u16}string don't allow that, which makes parsing them actually a lot more cumbersome).

As for UTF-32/UCS-4, it occupies twice as much space as it needs for all text written with living languages.
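To make the clang code model example concrete: when a UTF-8-only library hands back byte positions, a UTF-16 application has to translate each one into a code-unit index before it can use it. This is a minimal sketch in standard C++, assuming valid UTF-8 and offsets that fall on code-point boundaries; the function name is hypothetical and not part of Qt or clang.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: translate a byte offset into a UTF-8 buffer into the
// equivalent UTF-16 code-unit index. Assumes valid UTF-8 and an offset on a
// code-point boundary.
std::size_t utf8OffsetToUtf16Offset(const std::string &utf8, std::size_t byteOffset)
{
    std::size_t units = 0;
    for (std::size_t i = 0; i < byteOffset && i < utf8.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        if ((b & 0xC0) == 0x80)
            continue;              // continuation byte: counted via its lead byte
        units += (b >= 0xF0) ? 2   // 4-byte sequence: outside the BMP, surrogate pair
                             : 1;  // 1- to 3-byte sequence: one UTF-16 code unit
    }
    return units;
}
```

For the bytes of "a", "é", U+1F600 and "z", byte offset 7 (the "z") maps to UTF-16 index 4. Each lookup is a linear scan from the start of the buffer, which is the kind of per-position overhead that a library accepting UTF-16 directly, as PCRE was made to, avoids.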

--
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development