[Bug libstdc++/70893] New: codecvt incorrectly decodes UTF-16 due to optimization

2016-05-01 Thread kirillnow at gmail dot com
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: kirillnow at gmail dot com
Target Milestone: ---

In the libstdc++ source file codecvt.cc:

  inline bool
  is_high_surrogate(char32_t c)
  { return c >= 0xD800 && c <= 0xDBFF; }

compiles to: if (is_ […]
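For context, a standalone sketch of the surrogate range checks involved:
is_high_surrogate is quoted above from codecvt.cc; the matching
is_low_surrogate bounds and the test driver below are my reconstruction,
using the standard surrogate ranges, not code from the report:

  #include <cassert>

  inline bool
  is_high_surrogate(char32_t c)
  { return c >= 0xD800 && c <= 0xDBFF; }

  inline bool
  is_low_surrogate(char32_t c)
  { return c >= 0xDC00 && c <= 0xDFFF; }

  int main()
  {
    assert(is_high_surrogate(0xD800) && is_high_surrogate(0xDBFF));
    assert(is_low_surrogate(0xDC00) && is_low_surrogate(0xDFFF));
    assert(!is_high_surrogate(0xDC00) && !is_low_surrogate(0xDBFF));
  }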

[Bug libstdc++/70893] codecvt incorrectly decodes UTF-16

2016-05-01 Thread kirillnow at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70893

--- Comment #1 from Кирилл ---
Bad guess on my part, sorry! The actual problem is:

  305:      else if (is_low_surrogate(c))
  306:        return invalid_mb_sequence;

Stand-alone low surrogates are not uncommon, and could be decoded as valid
UTF-8. Example: […]
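A sketch of the case the comment describes: the byte sequence ED B0 80 is the
three-byte encoding of U+DC00, a lone low surrogate, as produced by lax
CESU-8/WTF-8-style encoders. The program is mine, not from the report:

  #include <codecvt>
  #include <iostream>
  #include <locale>
  #include <string>

  int main()
  {
    // ED B0 80 encodes U+DC00, a stand-alone low surrogate.
    std::string bytes = "\xED\xB0\x80";
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    try {
      std::u16string out = conv.from_bytes(bytes);
      std::cout << "decoded " << out.size() << " code unit(s)\n";
    } catch (const std::range_error&) {
      // libstdc++ hits the invalid_mb_sequence return quoted above, and
      // wstring_convert reports the failure as std::range_error.
      std::cout << "rejected as an invalid multibyte sequence\n";
    }
  }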

[Bug libstdc++/70893] codecvt incorrectly decodes UTF-16

2016-05-01 Thread kirillnow at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70893

--- Comment #2 from Кирилл ---
... Just realized it's a wrong-endianness problem. codecvt_utf8_utf16 should
assume UTF-16BE by default, right? Apparently, no.
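A note on why the answer is "no": codecvt_utf8_utf16 converts between UTF-8
bytes and native char16_t code units, so no byte order is involved on the
UTF-16 side. Byte order only matters when UTF-16 itself is a byte stream,
which is the job of codecvt_utf16 (big-endian by default, std::little_endian
on request). A minimal sketch of my own illustrating that facet:

  #include <codecvt>
  #include <iostream>
  #include <locale>
  #include <string>

  int main()
  {
    std::string be = "\x04\x1F";  // U+041F in UTF-16BE
    std::string le = "\x1F\x04";  // U+041F in UTF-16LE

    std::wstring_convert<std::codecvt_utf16<char16_t>, char16_t> be_conv;
    std::wstring_convert<std::codecvt_utf16<char16_t, 0x10FFFF,
                                            std::little_endian>,
                         char16_t> le_conv;

    // Both print 41f once each stream is read with the matching byte order.
    std::cout << std::hex
              << (unsigned)be_conv.from_bytes(be)[0] << '\n'
              << (unsigned)le_conv.from_bytes(le)[0] << '\n';
  }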

[Bug libstdc++/70893] codecvt incorrectly decodes UTF-16be

2016-05-03 Thread kirillnow at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70893

--- Comment #5 from Кирилл ---
(In reply to Jonathan Wakely from comment #3)
> (In reply to Кирилл from comment #2)
> > ... Just realized it's a wrong-endianness problem. codecvt_utf8_utf16
> > should assume UTF-16BE by default, right? Apparently, no.
[…]

[Bug libstdc++/70893] codecvt incorrectly decodes UTF-16be

2016-05-03 Thread kirillnow at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70893

--- Comment #6 from Кирилл ---
(In reply to Jonathan Wakely from comment #4)
> If you think there's a bug here please provide a testcase that compiles and
> produces an incorrect result.

Here:

  #include <…>
  #include <…>
  #include <…>
  #include <…>
  using names[…]
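The message is cut off in this digest and the header names did not survive,
so the reporter's testcase is lost here. As a stand-in, a self-contained
program in the same spirit; the header choices, sample text, and output are
my assumptions, not the original testcase:

  #include <codecvt>
  #include <iostream>
  #include <locale>
  #include <string>
  using namespace std;

  int main()
  {
    // Round-trip a string containing a non-BMP character (U+1F600, which
    // needs a surrogate pair in UTF-16) through the UTF-8 <-> UTF-16 facet.
    wstring_convert<codecvt_utf8_utf16<char16_t>, char16_t> conv;
    u16string utf16 = conv.from_bytes("\xF0\x9F\x98\x80 test");  // UTF-8 for U+1F600
    string utf8 = conv.to_bytes(utf16);
    cout << "UTF-16 code units: " << utf16.size()
         << ", round-tripped UTF-8 bytes: " << utf8.size() << '\n';
  }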