https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419
--- Comment #10 from Dimitrij Mijoski <dmjpp at hotmail dot com> ---
I was wrong in comment #9. The bug report and the proposed fix in comment #7 are
correct. While writing tests for error handling, I discovered yet another bug in
UTF-8 decoding. See this example:
// 2 code points, each 4 bytes long in UTF-8.
const char u8in[] = u8"\U0010FFFF\U0010AAAA";
const char32_t u32in[] = U"\U0010FFFF\U0010AAAA";
void
utf8_to_utf32_in_error_7 (const codecvt<char32_t, char, mbstate_t> &cvt)
{
  char in[7] = {};
  char32_t out[3] = {};
  char_traits<char>::copy (in, u8in, 7);
  in[5] = 'z';
  // The last CP has two errors: its second code unit is malformed and
  // its last code unit is missing. Because the last CU is missing, the
  // decoder returns too early, reporting the input as incomplete.
  // It should report it as invalid.
  auto state = mbstate_t{};
  auto in_next = (const char *) nullptr;
  auto out_next = (char32_t *) nullptr;
  auto res = codecvt_base::result ();
  res = cvt.in (state, in, in + 7, in_next, out, out + 3, out_next);
  VERIFY (res == cvt.error); // currently fails: in() returns partial
  VERIFY (in_next == in + 4);
  VERIFY (out_next == out + 1);
  VERIFY (out[0] == u32in[0] && out[1] == 0 && out[2] == 0);
}
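To illustrate the ordering the test expects, here is a minimal sketch of a
classifier that validates every code unit that is present before it reports
the sequence as incomplete. It is not the libstdc++ implementation, and the
names utf8_status and classify_utf8 are hypothetical, made up for this example.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not libstdc++ internals.
enum class utf8_status { ok, partial, error };

// Classify the UTF-8 sequence that starts at s, with n bytes available.
// Every trailing byte that is present is validated before the function
// may report that the input is merely incomplete.
inline utf8_status
classify_utf8 (const unsigned char *s, std::size_t n)
{
  if (n == 0)
    return utf8_status::partial;

  const unsigned char b0 = s[0];
  if (b0 < 0x80)
    return utf8_status::ok; // single-byte (ASCII) sequence

  std::size_t len = 0;
  unsigned char lo = 0x80, hi = 0xBF; // allowed range of the second byte
  if (b0 >= 0xC2 && b0 <= 0xDF)
    len = 2;
  else if (b0 >= 0xE0 && b0 <= 0xEF)
    {
      len = 3;
      if (b0 == 0xE0) lo = 0xA0; // reject overlong encodings
      if (b0 == 0xED) hi = 0x9F; // reject surrogates
    }
  else if (b0 >= 0xF0 && b0 <= 0xF4)
    {
      len = 4;
      if (b0 == 0xF0) lo = 0x90; // reject overlong encodings
      if (b0 == 0xF4) hi = 0x8F; // reject values above U+10FFFF
    }
  else
    return utf8_status::error; // stray trailing byte or invalid lead byte

  // Check the trailing bytes we actually have, even if some are missing.
  for (std::size_t i = 1; i < len && i < n; ++i)
    {
      const unsigned char first = (i == 1) ? lo : 0x80;
      const unsigned char last = (i == 1) ? hi : 0xBF;
      if (s[i] < first || s[i] > last)
        return utf8_status::error;
    }

  // Only a well-formed prefix may be reported as incomplete.
  return n < len ? utf8_status::partial : utf8_status::ok;
}
```

With the three trailing bytes of the test input (0xF4, 'z', 0xAA) this sketch
reports error, while the unmodified truncated sequence (0xF4, 0x8A, 0xAA) is
reported as partial.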
I published the full testsuite on GitHub, licensed under GPL v3+ of course:
https://github.com/dimztimz/codecvt_test/blob/master/codecvt.cpp
I was thinking of sending a patch, but after this latest bug, the fourth, I see
that this needs more time. Maybe a testsuite from another library such as ICU
could be incorporated? Anyway, I will pause my work on this.