On Tuesday, 25 December 2012 17.25.07, Jan Kundrát wrote:
> I think that when testing whether a string can be split at a particular
> index, my code shall check whether the next symbol is a "combining
> character". However, I know nothing about various non-latin scripts, and I
> wasn't able to tell which method of a QChar shall I use in this context. It
> looks like QChar::isHighSurrogate() and QChar::isLowSurrogate() will be
> part of the solution, but they apparently only work for non-BMP characters.
>
> In short, I know nothing about Unicode details, but want to split the string
> at offsets where it is "safe". How do I tell where to split?

Are you sure you need to keep the combining characters together in the same
RFC 2047 chunk?

If you do, you can use QChar::category [1] and check for the category type
QChar::Mark_SpacingCombining. If you run into a surrogate type, then get the
two surrogates, calculate the UCS4 value (see QChar::surrogateToUcs4 [2]) and
try again.

My currently-experimental QStringIterator class [3] would return UCS 4 values
when iterating over a string.

[1] http://qt-project.org/doc/qt-5.0/qtcore/qchar.html#category-2
[2] http://qt-project.org/doc/qt-5.0/qtcore/qchar.html#surrogateToUcs4
[3] https://codereview.qt-project.org/669

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
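
The check described above could be sketched roughly as follows. This is only an illustration in plain C++ over UTF-16 code units, not code from Qt or from the original poster: canSplitAt() and isCombiningMark() are hypothetical names, and isCombiningMark() is a deliberately incomplete placeholder that only knows a few combining ranges. In a Qt program one would presumably call QChar::category() on the UCS-4 value instead and test for the mark categories (the reply mentions Mark_SpacingCombining; treating Mark_NonSpacing and Mark_Enclosing the same way is an assumption made here), and use QChar::isHighSurrogate(), QChar::isLowSurrogate() and QChar::surrogateToUcs4() in place of the local helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Local stand-ins for QChar::isHighSurrogate() / QChar::isLowSurrogate().
static bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Combines a surrogate pair into one code point, like QChar::surrogateToUcs4().
static std::uint32_t surrogateToUcs4(char16_t high, char16_t low)
{
    return 0x10000u + ((std::uint32_t(high) - 0xD800u) << 10)
                    + (std::uint32_t(low) - 0xDC00u);
}

// PLACEHOLDER (assumption): covers only a few combining ranges for the demo.
// Real code should consult the Unicode category, e.g. via QChar::category().
static bool isCombiningMark(std::uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)      // Combining Diacritical Marks
        || (cp >= 0x20D0 && cp <= 0x20F0)      // Combining marks for symbols
        || (cp >= 0x1D165 && cp <= 0x1D169);   // Musical combining marks (non-BMP)
}

// Hypothetical helper: true if the string may be cut between
// s[index - 1] and s[index].
static bool canSplitAt(const std::u16string &s, std::size_t index)
{
    if (index == 0 || index >= s.size())
        return true;                            // string boundaries are always fine

    // Never cut between the two halves of a surrogate pair.
    if (isHighSurrogate(s[index - 1]) && isLowSurrogate(s[index]))
        return false;

    // Decode the code point that would start the next chunk.
    std::uint32_t cp = s[index];
    if (isHighSurrogate(s[index]) && index + 1 < s.size()
            && isLowSurrogate(s[index + 1]))
        cp = surrogateToUcs4(s[index], s[index + 1]);

    // A combining mark must stay with the character before it.
    return !isCombiningMark(cp);
}
```

For instance, with the string "e" + U+0301 + "x", canSplitAt() rejects index 1 (the accent would be separated from its base letter) and accepts index 2. A caller looking for a chunk boundary could start at the desired offset and step backwards until canSplitAt() returns true.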
_______________________________________________
Interest mailing list
Interest@qt-project.org
http://lists.qt-project.org/mailman/listinfo/interest