https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66441
            Bug ID: 66441
           Summary: wstring_convert not working correctly
           Product: gcc
           Version: 5.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lcarreon at bigpond dot net.au
  Target Milestone: ---

Created attachment 35707
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35707&action=edit
Test program which demonstrates the issue

In my opinion, there is a problem with the way wstring_convert behaves.

I use Fedora 22 32-bit and 64-bit, which includes GCC 5.1.1.

Attached is a test program which demonstrates the problem. This test program
converts a UTF-8 string which does not have a BOM into a UCS-4 string. The
UCS-4 string is then converted to a UTF-16LE string.

The test program is composed of two parts:

1) performs the conversion using the codecvt facets directly and
2) performs the conversion using wstring_convert.

I compiled the test program using the following command:

    g++ -std=c++14 -o test_convert test_convert.cpp

This test program generates the following result:

    UTF-8 string=Provençal
    UTF-8 string length=10

    Test conversion using codecvt facets directly:
    UCS-4 string=50 72 6f 76 65 6e e7 61 6c
    UCS-4 string length=9
    UTF-16LE string=ff fe 50 0 72 0 6f 0 76 0 65 0 6e 0 e7 0 61 0 6c 0
    UTF-16LE string length=20

    Test conversion using wstring_convert:
    UCS-4 string=50 72 6f 76 65 6e e7 61 6c
    UCS-4 string length=9
    UTF-16LE string=ff fe 50 0 72 0 6f 0 76 0 65 0 ff fe 6e 0 e7 0 ff fe 61 0 6c 0
    UTF-16LE string length=24

In my opinion, the result generated by the codecvt facets is the correct
result. Notice that the UTF-16LE result generated by wstring_convert contains
three occurrences of the BOM, which is incorrect.

I hope I have given enough information concerning this issue.
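
For readers without access to the attachment, the wstring_convert half of the test can be approximated as below. This is a minimal sketch and not the contents of attachment 35707: the helper names utf8_to_ucs4, ucs4_to_utf16le and count_boms are hypothetical, and the choice of codecvt_utf8<char32_t> and codecvt_utf16<char32_t> with generate_header | little_endian is an assumption about how the original program requests a BOM.

```cpp
#include <codecvt>
#include <cstddef>
#include <locale>
#include <string>

// <codecvt> and std::wstring_convert are deprecated since C++17;
// silence the resulting warnings on GCC-compatible compilers.
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif

// Hypothetical helper: UTF-8 bytes -> UCS-4 code points.
std::u32string utf8_to_ucs4(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: UCS-4 code points -> UTF-16LE bytes,
// asking the facet to emit a BOM (generate_header).
std::string ucs4_to_utf16le(const std::u32string& ucs4) {
    constexpr auto mode = static_cast<std::codecvt_mode>(
        std::generate_header | std::little_endian);
    std::wstring_convert<
        std::codecvt_utf16<char32_t, 0x10ffff, mode>, char32_t> conv;
    return conv.to_bytes(ucs4);
}

// Hypothetical helper: count ff fe pairs at even byte offsets,
// i.e. how many UTF-16LE BOMs appear in the output.
std::size_t count_boms(const std::string& bytes) {
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        if (static_cast<unsigned char>(bytes[i]) == 0xff &&
            static_cast<unsigned char>(bytes[i + 1]) == 0xfe) {
            ++n;
        }
    }
    return n;
}
```

Passing the UTF-8 bytes of "Provençal" through utf8_to_ucs4 and then ucs4_to_utf16le should give 20 bytes with count_boms returning 1 when the library behaves as the reporter expects. The output reported above for GCC 5.1.1 corresponds to 24 bytes and a count of 3.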