https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66464

            Bug ID: 66464
           Summary: codecvt_utf16 max_length returning incorrect value
           Product: gcc
           Version: 5.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lcarreon at bigpond dot net.au
  Target Milestone: ---

I just noticed that codecvt_utf16<char32_t>::max_length() is returning 3.

This appears to be the wrong value: a surrogate pair occupies 4 bytes, so
max_length() should return at least 4.
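
For reference, here is a minimal sketch (my own illustration, not an
exhaustive test) that simply prints the values in question:

#include <codecvt>
#include <iostream>

int main()
{
    std::codecvt_utf16<char32_t> cvt16;
    std::codecvt_utf8<char32_t>  cvt8;
    // With the libstdc++ shipped in GCC 5.1.1 these print 3 and 4 respectively.
    std::cout << "codecvt_utf16<char32_t>::max_length() = "
              << cvt16.max_length() << '\n';
    std::cout << "codecvt_utf8<char32_t>::max_length()  = "
              << cvt8.max_length() << '\n';
}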

I'm also wondering whether the BOM should be taken into account.  If a UTF-16
string starts with a BOM and its first code point is encoded as a surrogate
pair, 6 bytes have to be consumed to generate a single UCS-4 character.
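
For example (an illustrative sketch using U+10000, not taken from a real
application), with std::consume_header all 6 bytes below correspond to one
char32_t:

#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    const std::string bytes{'\xFE', '\xFF',   // UTF-16BE BOM
                            '\xD8', '\x00',   // high surrogate of U+10000
                            '\xDC', '\x00'};  // low surrogate of U+10000
    std::wstring_convert<
        std::codecvt_utf16<char32_t, 0x10FFFF, std::consume_header>,
        char32_t> conv;
    std::u32string out = conv.from_bytes(bytes);
    std::cout << bytes.size() << " bytes in, "
              << out.size() << " code point(s) out\n";  // 6 in, 1 out
}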

Should the same thing be considered for codecvt_utf8<char32_t>::max_length(),
which currently returns 4?  Taking into account the 3-byte BOM and the longest
UTF-8 sequence for a code point not above 0x10FFFF (4 bytes), shouldn't
max_length() return 7?
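
The corresponding UTF-8 case (again just an illustration using U+10000):

#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    const std::string bytes{'\xEF', '\xBB', '\xBF',           // UTF-8 BOM
                            '\xF0', '\x90', '\x80', '\x80'};  // U+10000
    std::wstring_convert<
        std::codecvt_utf8<char32_t, 0x10FFFF, std::consume_header>,
        char32_t> conv;
    std::u32string out = conv.from_bytes(bytes);
    std::cout << bytes.size() << " bytes in, "
              << out.size() << " code point(s) out\n";  // 7 in, 1 out
}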

I'm not really sure whether the BOM should be taken into account, because the
standard's definition of do_max_length() simply says it is the maximum number
of input characters that need to be consumed to generate a single output
character.
