https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103446
--- Comment #7 from Zloten <zloten at mail dot ru> ---
It's very strange. I've tested GCC 11.2.0 (x86-64) and MinGW (x86-64); both have the same problem. Let's not use the L suffix (wchar_t is implementation-defined); let's use the u suffix instead. On both compilers:

int test() {
    return *(int*)u"𠂡";  // returns the valid value 0xDCA1D840
}

int test2() {
    return u'𠂡';  // emits the warning "character constant too long for its type"
                   // and returns the invalid value 0xDCA1
}

Why is there no way to use a non-BMP UTF-16-encoded value? The compiler could simply treat it as a 32-bit integer constant.

Sorry for my bad English.
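
For reference, here is a minimal self-contained sketch (assuming a little-endian target such as x86-64) showing where both values come from: 𠂡 is U+208A1, which UTF-16 encodes as the surrogate pair D840 DCA1, and reading those two code units back as one 32-bit value on a little-endian machine gives 0xDCA1D840. The memcpy is a substitute for the *(int*) cast above, which strictly speaking violates aliasing rules, and the char32_t literal at the end is the portable way to get the full code point.

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // U+208A1 encodes in UTF-16 as the surrogate pair D840 DCA1.
    const char16_t *s = u"𠂡";
    std::printf("code units: %04X %04X\n", (unsigned)s[0], (unsigned)s[1]); // D840 DCA1

    // On a little-endian machine the two code units read back as 0xDCA1D840.
    std::uint32_t v;
    std::memcpy(&v, s, sizeof v); // aliasing-safe alternative to *(int*)s
    std::printf("as 32 bits:  %08X\n", (unsigned)v); // DCA1D840

    // The portable way to get the code point itself is a char32_t literal.
    char32_t cp = U'𠂡';
    std::printf("code point:  %08X\n", (unsigned)cp); // 000208A1
    return 0;
}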