https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79511
Bug ID: 79511
Summary: Conversion issues in std::codecvt_utf8_utf16
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: mikhail at pilin dot name
Target Milestone: ---
Hi,
I found two issues in std::codecvt_utf8_utf16 with GCC v5.4 (Clang v3.8 does
not show them):
1. \xEF\xBF\xBF (UTF-8) is converted to \xD7FF\xDFFF (UTF-16), but it should be
\xFFFF (UTF-16). (Note: \xD7FF is not in the high surrogate range
[D800..DBFF].) Please see the noncharacter representation table at
http://www.unicode.org/faq/private_use.html#noncharacters
2. std::codecvt_utf8_utf16 requires std::little_endian on x86_64. Otherwise
big-endian byte order is used by default!
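
A minimal sketch of how both points could be checked. It assumes the
conversion goes through std::wstring_convert with char16_t as the wide type,
which the report does not state. The helper names utf8_to_utf16,
ffff_is_single_unit and ascii_is_native_order are made up for this
illustration.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// <codecvt> is deprecated since C++17; silence the warning for this sketch.
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

// Hypothetical helper: convert UTF-8 bytes to UTF-16 code units using the
// facet's default mode (std::little_endian is NOT passed).
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// Issue 1: U+FFFF lies in the BMP, so EF BF BF should decode to the single
// code unit 0xFFFF, not to a two-unit sequence.
bool ffff_is_single_unit() {
    const std::u16string out = utf8_to_utf16("\xEF\xBF\xBF");
    return out.size() == 1 && out[0] == 0xFFFF;
}

// Issue 2: without std::little_endian, 'A' should still come out as the
// native char16_t value 0x0041 rather than the byte-swapped 0x4100.
bool ascii_is_native_order() {
    const std::u16string out = utf8_to_utf16("A");
    return out.size() == 1 && out[0] == 0x0041;
}
```

On an implementation that behaves as the report expects, both functions
should return true. With the behaviour described above for GCC v5.4, the first
one would be expected to return false.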