On Thu, Oct 7, 2021 at 9:01 AM Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the
> libcpp/lex.c
>   static const cppchar_t utf8_signifier = 0xC0;
> ...
>       if (*buffer->cur >= utf8_signifier)
>         {
>           if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
>                                state, &s))
>             return true;
>         }
> work? Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> multi-byte character, it is more complicated and seems _cpp_valid_utf8
> assumes UTF-8 as the host charset.

FWIW, here I was following Joseph's guidance from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224#c21 ("You can ignore
anything claiming to handle UTF-EBCDIC.")

-Lewis
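
For context on the test quoted above, here is a minimal sketch of why `>= 0xC0` identifies a possible start of a multi-byte character when the source bytes are plain UTF-8. Only the `utf8_signifier` value comes from the quoted libcpp code; the helper names are hypothetical and this is not the libcpp implementation. The sketch says nothing about UTF-EBCDIC, which uses a different byte layout.

```c
#include <stdbool.h>

/* Same threshold as in the quoted libcpp/lex.c snippet. */
static const unsigned char utf8_signifier = 0xC0;

/* Hypothetical helper: bytes 0x00..0x7F (0xxxxxxx) are single-byte ASCII.  */
bool
is_ascii_byte (unsigned char c)
{
  return c < 0x80;
}

/* Hypothetical helper: bytes 0x80..0xBF (10xxxxxx) can only continue a
   sequence, never start one.  */
bool
is_utf8_continuation (unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

/* Hypothetical helper: bytes 0xC0..0xFF (11xxxxxx) are the only candidates
   for starting a multi-byte sequence.  Some of them (0xC0, 0xC1 and
   0xF5..0xFF) never occur in valid UTF-8, so a real validator still has to
   check the whole sequence after this cheap pre-test.  */
bool
may_start_utf8_sequence (unsigned char c)
{
  return c >= utf8_signifier;
}
```

For example, "é" is encoded in UTF-8 as the bytes 0xC3 0xA9: the first byte passes the `>= 0xC0` pre-test, the second is a continuation byte and does not.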