https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125499

            Bug ID: 125499
           Summary: _M_extract_int returns v=0 instead of
                    numeric_limits::max() for overflowing input under a
                    grouping locale (violates LWG 23)
           Product: gcc
           Version: 15.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liweifriends at gmail dot com
  Target Milestone: ---

Description

  LWG defect 23 ("Num_get overflow result") requires num_get to store
  numeric_limits<T>::max() (or ::min() for a signed negative overflow)
  into the destination value and set failbit whenever the input field
  overflows the target type. libstdc++ honors this when the input
  contains no thousands separators, but violates it when the input is a
  structurally valid grouped numeric literal under a locale that uses
  grouping: it stores 0 instead.


  Source location
  ---------------

  libstdc++-v3/include/bits/locale_facets.tcc, _M_extract_int, the
  post-loop resolution block (the one carrying the
  "// _GLIBCXX_RESOLVE_LIB_DEFECTS  23. Num_get overflow result."
  comment):

    if ((!__sep_pos && !__found_zero && !__found_grouping.size())
        || __testfail)
      {
        __v = 0;
        __err = ios_base::failbit;
      }
    else if (__testoverflow)
      {
        __v = (__negative && __num_traits::__is_signed)
                ? __num_traits::__min : __num_traits::__max;
        __err = ios_base::failbit;
      }
    else
      __v = __negative ? -__result : __result;

  __testfail is checked before __testoverflow. In the overflow-with-
  grouping case both fire simultaneously and __testfail wins, so v=0 is
  returned even though the standard mandates max/min.


  Mechanism
  ---------

  The digit-accumulation loop short-circuits on overflow:

    if (__result > __smax) __testoverflow = true;
    else { ... ; ++__sep_pos; }

  Once __testoverflow fires, ++__sep_pos is skipped for every subsequent
  digit, so __sep_pos freezes at whatever value it held at the moment
  overflow tripped. A later, structurally well-formed thousands_sep then
  pushes that stale count onto __found_grouping as a wrong group size
  and resets __sep_pos to 0. If another separator follows (the common
  case), no digit has incremented __sep_pos in between, so that
  separator sees __sep_pos == 0 and sets __testfail.

  In other words, __testfail in this overlap is a downstream consequence
  of __testoverflow, not an independent structural error in the input.
  The current branch order in the post-loop block lets that derived
  failure mask the overflow signal that LWG 23 explicitly requires.
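
  The mechanism can be traced with a small stand-alone model of the
  loop. This is only a sketch under stated assumptions, not the
  libstdc++ code: the names loop_state, model_loop and check_case_c are
  hypothetical, the target is fixed to a 32-bit unsigned value in base
  10 with ',' as the separator, and the early exit on a separator seen
  while the digit count is 0 reflects my reading of the loop rather
  than a quoted excerpt.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical, simplified model of the digit loop (uint32_t, base 10, ',').
struct loop_state
{
    std::uint32_t result = 0;
    int  sep_pos = 0;            // digits counted since the last separator
    bool testoverflow = false;
    bool testfail = false;
    std::string found_grouping;  // group sizes pushed by each separator
};

inline loop_state model_loop(const std::string& input)
{
    const std::uint32_t max  = std::numeric_limits<std::uint32_t>::max();
    const std::uint32_t smax = max / 10;
    loop_state s;
    for (char c : input)
    {
        if (c == ',')
        {
            // Assumption: a separator with no digits before it ends the loop.
            if (s.sep_pos == 0) { s.testfail = true; break; }
            s.found_grouping += static_cast<char>(s.sep_pos);
            s.sep_pos = 0;
        }
        else if (c >= '0' && c <= '9')
        {
            const std::uint32_t digit = static_cast<std::uint32_t>(c - '0');
            if (s.result > smax)
                s.testoverflow = true;   // ++sep_pos is skipped from here on
            else
            {
                s.result *= 10;
                s.testoverflow |= s.result > max - digit;
                s.result += digit;
                ++s.sep_pos;
            }
        }
        else
            break;                       // not part of the numeric field
    }
    return s;
}

// Usage: the grouped input from case (c) ends with both flags set.
inline void check_case_c()
{
    const loop_state s = model_loop("12,345,678,901,234,567");
    assert(s.testoverflow && s.testfail);
    // Groups 2,3,3 are genuine; the final 2 is the stale count from "90".
    assert(s.found_grouping == std::string("\2\3\3\2"));
}
```

  In this model the overflow trips on the '1' of "901", the following
  separator pushes the stale count 2, and the separator after "234"
  finds a count of 0.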


  Reproducer
  ----------

  Self-contained: it uses a custom numpunct facet, so it does not depend
  on an en_US locale being installed.

    // g++ -std=c++20 -O0 repro.cpp -o repro && ./repro

    #include <cstdint>
    #include <iomanip>
    #include <iostream>
    #include <limits>
    #include <locale>
    #include <sstream>
    #include <string>

    struct grouping_3 : std::numpunct<char>
    {
    protected:
        char        do_thousands_sep() const override { return ','; }
        std::string do_grouping()      const override { return "\3"; }
    };

    template <class T>
    void run(const char* label, const std::string& input, bool with_grouping)
    {
        std::istringstream iss(input);
        if (with_grouping)
            iss.imbue(std::locale(std::locale::classic(), new grouping_3));

        T v = 0;
        iss >> v;

        const T vmax = std::numeric_limits<T>::max();
        const char* verdict =
            !iss.fail() ? "OK   (parsed)"                       :
            v == vmax   ? "FAIL (v = max -- LWG 23 honored)"   :
            v == 0      ? "FAIL (v = 0   -- LWG 23 VIOLATED)"  :
                          "FAIL (v = other)";
        std::cout << "  " << std::left << std::setw(46) << label
                  << " input=" << std::setw(26) << ("\"" + input + "\"")
                  << " v=" << std::setw(12) << v
                  << " | " << verdict << "\n";
    }

    int main()
    {
        std::cout << "uint32_t max = "
                  << std::numeric_limits<std::uint32_t>::max() << "\n\n";

        run<std::uint32_t>("(a) classic locale, no commas",
                           "12345678901234567",       false);
        run<std::uint32_t>("(b) grouping locale, no commas in input",
                           "12345678901234567",       true);
        run<std::uint32_t>("(c) grouping locale, commas in input",
                           "12,345,678,901,234,567",  true);
    }


  Expected output (per LWG 23)
  ----------------------------

    uint32_t max = 4294967295

    (a) classic locale, no commas               input="12345678901234567"      v=4294967295 | FAIL (v = max -- LWG 23 honored)
    (b) grouping locale, no commas in input     input="12345678901234567"      v=4294967295 | FAIL (v = max -- LWG 23 honored)
    (c) grouping locale, commas in input        input="12,345,678,901,234,567" v=4294967295 | FAIL (v = max -- LWG 23 honored)


  Actual output (g++ 15.1.0, libstdc++ 15)
  ----------------------------------------

    uint32_t max = 4294967295

    (a) classic locale, no commas               input="12345678901234567"      v=4294967295 | FAIL (v = max -- LWG 23 honored)
    (b) grouping locale, no commas in input     input="12345678901234567"      v=4294967295 | FAIL (v = max -- LWG 23 honored)
    (c) grouping locale, commas in input        input="12,345,678,901,234,567" v=0          | FAIL (v = 0   -- LWG 23 VIOLATED)   <-- bug

  Same numerical value, same target type, same overflow. The only
  difference between (a)/(b) and (c) is whether the spelling uses
  thousands separators matching the locale's grouping. (c) is
  structurally valid: grouping "\3" allows a leading group of length
  1..3 followed by groups of exactly 3, which "12,345,678,901,234,567"
  satisfies. LWG 23 therefore requires v = numeric_limits<uint32_t>::max(),
  not 0.
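
  One way to cross-check the structural-validity claim is to parse the
  same spelling into a type wide enough to hold the value. The sketch
  below assumes a 64-bit std::uint64_t and uses the same kind of facet
  as the reproducer; parse_u64_grouped and check_grouped_literal are
  hypothetical helper names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

// Same facet shape as in the reproducer: ',' separator, groups of 3.
struct grouping_3 : std::numpunct<char>
{
protected:
    char        do_thousands_sep() const override { return ','; }
    std::string do_grouping()      const override { return "\3"; }
};

// Hypothetical helper: extract a uint64_t under the grouping locale and
// report whether the extraction succeeded (failbit not set).
inline bool parse_u64_grouped(const std::string& input, std::uint64_t& out)
{
    std::istringstream iss(input);
    iss.imbue(std::locale(std::locale::classic(), new grouping_3));
    out = 0;
    iss >> out;
    return !iss.fail();
}

// Usage: the literal from case (c) parses cleanly when it does not overflow.
inline void check_grouped_literal()
{
    std::uint64_t v = 0;
    const bool ok = parse_u64_grouped("12,345,678,901,234,567", v);
    assert(ok);
    assert(v == 12345678901234567ULL);
    (void)ok;
}
```

  If the grouping were malformed, the extraction would fail even for the
  wider type, so a successful parse here indicates that the failure in
  case (c) comes from the overflow alone.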


  Suggested fix
  -------------

  Swap the order of the two branches so __testoverflow is checked first:

    if (__testoverflow)
      {
        __v = (__negative && __num_traits::__is_signed)
                ? __num_traits::__min : __num_traits::__max;
        __err = ios_base::failbit;
      }
    else if ((!__sep_pos && !__found_zero && !__found_grouping.size())
             || __testfail)
      {
        __v = 0;
        __err = ios_base::failbit;
      }
    else
      __v = __negative ? -__result : __result;

  Whenever __testoverflow fires the overflow signal wins, matching LWG 23.
  The structural-failure path still handles genuine structural errors
  that do not overlap with overflow.

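  The only behavioral change for any other input class is the rare
  "both structurally invalid AND numerically overflowing" input in which
  the overflow trips before the structural error is reached (e.g.
  "99999999999,,1"): with the swap that input yields v = max instead of
  v = 0. An input such as ",,99999999999" is unaffected, because a
  separator seen while __sep_pos == 0 ends the digit loop before any
  digit can overflow. Both results are conforming under LWG 23 (which
  does not directly arbitrate the overlap), failbit is set either way,
  and the input was malformed regardless.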
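
  The effect of the swap on each combination of flags can be tabulated
  with a small model of the post-loop block. This is a sketch for an
  unsigned 32-bit target only; exit_state, branch_order, resolve and
  check_orders are hypothetical names, and no_digits stands in for the
  "!__sep_pos && !__found_zero && !__found_grouping.size()" test.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical summary of the state the digit loop leaves behind.
struct exit_state
{
    bool no_digits;        // nothing numeric was consumed
    bool testfail;
    bool testoverflow;
    std::uint32_t result;  // accumulated value (meaningful if no flag is set)
};

enum class branch_order { current, overflow_first };

// Value stored into v; failbit is set whenever any of the flags is set.
inline std::uint32_t resolve(const exit_state& s, branch_order order)
{
    const std::uint32_t vmax = std::numeric_limits<std::uint32_t>::max();
    const bool structural = s.no_digits || s.testfail;
    if (order == branch_order::current)
    {
        if (structural)     return 0;     // checked first today
        if (s.testoverflow) return vmax;
        return s.result;
    }
    if (s.testoverflow) return vmax;      // checked first after the swap
    if (structural)     return 0;
    return s.result;
}

// Usage: only the overlap of testfail and testoverflow changes its result.
inline void check_orders()
{
    const std::uint32_t vmax = std::numeric_limits<std::uint32_t>::max();
    const exit_state overlap{false, true, true, 0};
    assert(resolve(overlap, branch_order::current) == 0);
    assert(resolve(overlap, branch_order::overflow_first) == vmax);

    const exit_state structural_only{false, true, false, 0};
    assert(resolve(structural_only, branch_order::current) == 0);
    assert(resolve(structural_only, branch_order::overflow_first) == 0);
}
```

  Pure overflow and clean input resolve identically under both orders
  in this model, which is why the swap is confined to the overlap case.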


  Tested with
  -----------

    g++ (Homebrew GCC 15.1.0) 15.1.0  on  x86_64 / arm64 darwin

  The relevant code in locale_facets.tcc has not changed materially in
  many years; I expect the bug to reproduce on every libstdc++ version
  that carries the LWG-23-resolution comment.