https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66464
Bug ID: 66464
Summary: codecvt_utf16 max_length returning incorrect value
Product: gcc
Version: 5.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: lcarreon at bigpond dot net.au
Target Milestone: ---

I just noticed that codecvt_utf16<char32_t>::max_length() returns 3. This appears to be the wrong value: a surrogate pair is composed of 4 bytes, so max_length() should return at least 4.

I'm also wondering whether the BOM should be taken into account. If a UTF-16 string begins with a BOM and its first character is encoded as a surrogate pair, 6 bytes have to be consumed to generate a single UCS-4 character.

Should the same consideration apply to codecvt_utf8<char32_t>::max_length(), which currently returns 4? Taking into account the BOM and the longest UTF-8 encoding of a character at or below 0x10FFFF, shouldn't max_length() return 7?

I'm not really sure whether the BOM should be taken into account, because the standard's definition of do_max_length() simply says it is the maximum number of input characters that need to be consumed to generate a single output character.