Hi,

> > This gave me an idea: perhaps the easiest way for Qt to fix it would
> > be to check `GetACP() == CP_UTF8`, and if it is true then just use
> > Qt's built-in UTF-8 support and bypass MultiByteToWideChar completely.
>
> Indeed. And given that the UTF-8 codec is highly optimised, it will be
> definitely much faster. I'll make the change for Qt 6.
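For illustration, the check being discussed could be sketched roughly as
below. This is only a sketch, not Qt's actual code: `DecodePath`,
`choosePath`, `activeCodePage` and `kCpUtf8` are made-up names, and the
non-Windows fallback is an assumption so that the snippet builds anywhere.
The only real API used is `GetACP()`, and 65001 is the value of `CP_UTF8`.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical names for the two decoding routes discussed above.
enum class DecodePath {
    BuiltinUtf8,     // use the library's own UTF-8 decoder
    SystemConverter  // fall back to MultiByteToWideChar
};

// 65001 is the numeric value of CP_UTF8 in the Windows headers.
constexpr unsigned kCpUtf8 = 65001;

// Pick the route from the active ANSI code page.
DecodePath choosePath(unsigned activeCodePage)
{
    return activeCodePage == kCpUtf8 ? DecodePath::BuiltinUtf8
                                     : DecodePath::SystemConverter;
}

// Query the active code page; GetACP() only exists on Windows.
unsigned activeCodePage()
{
#ifdef _WIN32
    return GetACP();
#else
    return kCpUtf8;  // assumption for non-Windows builds of this sketch
#endif
}
```

A caller would evaluate `choosePath(activeCodePage())` once and only call
MultiByteToWideChar on the `SystemConverter` route.
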
I will patch our copy of Qt to do the same.

> However, it wouldn't solve the problem for other multibyte locales that may
> have more than one continuation character. A quick check of the likely
> culprits reveals that:
> * Chinese (CP 936) uses GBK, which is limited to two bytes
> * Japanese (CP 932) uses a variant of Shift JIS, but is also two-byte only
> * Korean (CP 949) uses the Unified Hangul Code, which likewise only goes up
>   to two bytes
>
> Wikipedia also says that GB 2312 is the most common encoding for web pages in
> Chinese, but that is also a one- or two-byte codec too. And it is no longer
> used by Windows itself.
>
> So it looks like we've never hit this problem because the codepages used by
> Windows were all DBCS. It might not be worth fixing the codec implementation
> then.

Agreed. I have a suspicion that Microsoft has in the past stated explicitly in
its docs that MBCS uses no more than 2 bytes per character, and that bugs of a
similar nature are why the UTF-8 support is opt-in and why the system-wide
support is marked as beta...

Cheers,
Alvin Wong
_______________________________________________
Interest mailing list
Interest@qt-project.org
https://lists.qt-project.org/listinfo/interest