On 11/06/24 11:36, David C. Partridge wrote:
>>> Anyone iterating bytewise over a char[] in UTF-8 has also got
>>> serious bugs given that a UTF-8 "graphic character" can be up to 8
>>> bytes (national flags comprise two UTF-8 code points).
Giuseppe D'Angelo (11 June 2024 20:09) replied:
>> There's no such thing as a UTF-8 "graphic character". Grapheme
>> sequences are treated at a higher level anyhow in Qt, and we have
>> APIs for that (QTextBoundaryFinder, etc.).
>>
>> And it's not 2. 🏴 is 7 code points.

David C. Partridge (12 June 2024 10:30) replied:
> Nope just TWO code points e.g. 🇺 (U+1F1FA: REGIONAL INDICATOR SYMBOL
> LETTER U) followed by 🇸 (U+1F1F8: REGIONAL INDICATOR SYMBOL LETTER S)
> for the US flag,

Some confusion here. That's two Unicode code points, each of which
takes four bytes to encode in UTF-8 (and two char16_t, i.e. a
surrogate pair, to encode in UTF-16, as QString does). I'll trust,
then, that Peppe's count is of bytes in UTF-8.

Eddy.

-- 
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development