Hi!

The wide_str_to_charconst function relies on the string passed to it
having at least two wide characters: the one we are looking for and the
terminating NUL. An empty wide character literal like L'', u'' or U''
is handled earlier and will not reach this function. Unfortunately, for
  const char16_t p = u'\U00110003';
we do emit an error, but wide_str_to_charconst is still called with a
string that contains just the NUL terminator and nothing else. That is
because U+110003 is too large and can't be represented even as a
surrogate pair in char16_t, but the handling of it doesn't give up on
the whole string, because other wide characters could be fine. For
example, u'a\U00110003' would be passed to wide_str_to_charconst after
diagnosing an error, because the too-large character would be thrown
away and we'd end up with u'a'.

The following patch fixes it by just checking for this condition and
punting. I think it is undesirable to print a further error.
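For illustration only, here is a minimal standalone sketch of the condition being checked. It is not the libcpp code: struct wide_buf and last_wide_char are hypothetical names, nbwc is assumed to be the number of bytes per wide character, and the buffer is assumed to be big-endian so that the byte-order handling can be left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the converted literal: LEN bytes holding
   NBWC-byte wide characters, big-endian for simplicity (the real
   function has to cope with the target's byte order).  */
struct wide_buf
{
  const unsigned char *text;
  size_t len;
};

/* Return the last wide character before the NUL terminator.  If STR
   holds nothing but the terminator, clear *CHARS_SEEN and return 0
   without reading before the start of the buffer.  */
static uint32_t
last_wide_char (struct wide_buf str, size_t nbwc, unsigned int *chars_seen)
{
  assert (nbwc > 0 && nbwc <= sizeof (uint32_t));
  assert (str.len % nbwc == 0);

  /* Same shape as the guard in the patch: only the terminator is left,
     which can happen after an invalid character has been dropped.  */
  if (str.len <= nbwc)
    {
      *chars_seen = 0;
      return 0;
    }

  /* Offset of the character just before the terminator.  */
  size_t off = str.len - 2 * nbwc;
  uint32_t c = 0;
  for (size_t i = 0; i < nbwc; i++)
    c = (c << 8) | str.text[off + i];

  *chars_seen = 1;
  return c;
}
```

Without the early return, a buffer containing only the terminator would make the offset computation wrap around, which is the situation the guard is meant to avoid.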
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-13  Jakub Jelinek  <ja...@redhat.com>

        PR preprocessor/92919
        * charset.c (wide_str_to_charconst): If str contains just the
        NUL terminator, punt quietly.

--- libcpp/charset.c.jj 2019-12-10 00:56:07.552291870 +0100
+++ libcpp/charset.c    2019-12-13 12:23:59.096150225 +0100
@@ -1970,6 +1970,17 @@ wide_str_to_charconst (cpp_reader *pfile
   size_t off, i;
   cppchar_t result = 0, c;
 
+  if (str.len <= nbwc)
+    {
+      /* Error recovery, if no errors have been diagnosed previously,
+         there should be at least two wide characters.  Empty literals
+         are diagnosed earlier and we can get just the zero terminator
+         only if there were errors diagnosed during conversion.  */
+      *pchars_seen = 0;
+      *unsignedp = 0;
+      return 0;
+    }
+
   /* This is finicky because the string is in the target's byte order,
      which may not be our byte order.  Only the last character, ignoring
      the NUL terminator, is relevant.  */

        Jakub