On 14/03/19 22:48, Thiago Macieira wrote:
> For
>
>     char16_t text1[] = u"" "\u0102";
>
> It produces, without /utf-8 (see https://msvc.godbolt.org/z/EvtKzq):
>
>     ?text1@@3PA_SA DB '?', 00H, 00H, 00H ; text1
>
> And with /utf-8:
>
>     ?text1@@3PA_SA DB 0c4H, 00H, 01aH, ' ', 00H, 00H ; text1
>
> Those two values make no sense. U+0102 is neither 0x003f (question mark) nor 0x00c4 0x201a ("Ä‚"). This is a clear compiler bug. An interpretation of the C++11 standard could say that the translation is correct for the no-/utf-8 build, but with /utf-8 or /execution-charset:utf-8 it should have produced the correct result.

Actually, those values do have some connection with the input. It looks like MSVC is double-encoding it:
* "\u0102" under UTF-8 execution charset produces a string containing 0xC4 0x82;
* that string literal is a generic narrow string literal (unprefixed). When it is concatenated with a u-prefixed string literal, MSVC somehow thinks it's in its native codepage instead of UTF-8...
* so it now re-encodes 0xC4 0x82 from CP1252 to UTF-16, yielding 0x00 0xC4 0x20 0x1a, which is what ends up in text1 (once the endianness is accounted for).

The mapping of \u escape sequences to the execution character set happens before string literal concatenation (translation phases 5/6). But AFAIU the mapping is purely symbolic, and has nothing to do with any actual encoding, so MSVC is at fault here?
My 2 c,
-- 
Giuseppe D'Angelo | [email protected] | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts
_______________________________________________
Development mailing list
[email protected]
https://lists.qt-project.org/listinfo/development
