https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125499
Bug ID: 125499
Summary: _M_extract_int returns v=0 instead of
numeric_limits::max() for overflowing input under a
grouping locale (violates LWG 23)
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: liweifriends at gmail dot com
Target Milestone: ---
Description
LWG defect 23 ("Num_get overflow result") requires num_get to store
numeric_limits<T>::max() (or ::min() for a signed negative overflow)
into the destination value and set failbit whenever the input field
overflows the target type. libstdc++ honors this when the input
contains no thousands separators, but violates it when the input is a
structurally valid grouped numeric literal under a locale that uses
grouping: it stores 0 instead.
Source location
---------------
libstdc++-v3/include/bits/locale_facets.tcc, _M_extract_int, the
post-loop resolution block (the one carrying the
"// _GLIBCXX_RESOLVE_LIB_DEFECTS 23. Num_get overflow result."
comment):
    if ((!__sep_pos && !__found_zero && !__found_grouping.size())
        || __testfail)
      {
        __v = 0;
        __err = ios_base::failbit;
      }
    else if (__testoverflow)
      {
        __v = (__negative && __num_traits::__is_signed)
              ? __num_traits::__min : __num_traits::__max;
        __err = ios_base::failbit;
      }
    else
      __v = __negative ? -__result : __result;
__testfail is checked before __testoverflow. In the overflow-with-
grouping case both fire simultaneously and __testfail wins, so v=0 is
returned even though the standard mandates max/min.
Mechanism
---------
The digit-accumulation loop short-circuits on overflow:
    if (__result > __smax) __testoverflow = true;
    else { ... ; ++__sep_pos; }
Once __testoverflow fires, ++__sep_pos is skipped for every subsequent
digit, so __sep_pos freezes at whatever value it held at the moment
overflow tripped. A later, structurally well-formed thousands_sep then
either pushes the wrong group size onto __found_grouping or, more
commonly, resets __sep_pos to 0, after which the next separator sees
__sep_pos == 0 and sets __testfail.
In other words, __testfail in this overlap is a downstream consequence
of __testoverflow, not an independent structural error in the input.
The current branch order in the post-loop block lets that derived
failure mask the overflow signal that LWG 23 explicitly requires.
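To make the freeze concrete, here is a minimal standalone model of the
loop. It is a sketch, not the libstdc++ source: it assumes an unsigned
32-bit target, base 10 and ',' as thousands_sep. The names loop_state
and simulate are hypothetical, and the accumulation step inside the else
branch (elided as "..." above) is my assumption of the usual
multiply-and-add.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Simplified model of the digit loop; NOT the libstdc++ implementation.
// Field names mirror the variables discussed in this report.
struct loop_state {
    std::uint32_t result = 0;
    int sep_pos = 0;             // digits counted since the last separator
    std::string found_grouping;  // recorded group sizes, one char per group
    bool testoverflow = false;
    bool testfail = false;
};

loop_state simulate(const std::string& input)
{
    const std::uint32_t umax = std::numeric_limits<std::uint32_t>::max();
    const std::uint32_t smax = umax / 10;
    loop_state s;
    for (char c : input) {
        if (c == ',') {
            if (s.sep_pos) {
                // Record the size of the group that just ended.
                s.found_grouping += static_cast<char>(s.sep_pos);
                s.sep_pos = 0;
            } else {
                // Separator with no counted digits before it.
                s.testfail = true;
                break;
            }
        } else if (c >= '0' && c <= '9') {
            const std::uint32_t digit = static_cast<std::uint32_t>(c - '0');
            if (s.result > smax) {
                // Short-circuit: sep_pos is NOT incremented here.
                s.testoverflow = true;
            } else {
                // Assumed accumulation step (elided in the report).
                s.result *= 10;
                s.testoverflow |= s.result > umax - digit;
                s.result += digit;
                ++s.sep_pos;
            }
        } else {
            break;  // any other character ends the field
        }
    }
    return s;
}
```

In this model, simulate("12,345,678,901,234,567") ends with both
testoverflow and testfail set and with group sizes 2, 3, 3, 2 recorded:
overflow trips on the third digit of "901", so that group is counted as
2, and the "234" group is never counted at all.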
Reproducer
----------
Self-contained (uses a custom numpunct so no en_US dependency).
    // g++ -std=c++20 -O0 repro.cpp -o repro && ./repro
    #include <cstdint>
    #include <iomanip>
    #include <iostream>
    #include <limits>
    #include <locale>
    #include <sstream>
    #include <string>

    struct grouping_3 : std::numpunct<char>
    {
    protected:
        char do_thousands_sep() const override { return ','; }
        std::string do_grouping() const override { return "\3"; }
    };

    template <class T>
    void run(const char* label, const std::string& input, bool with_grouping)
    {
        std::istringstream iss(input);
        if (with_grouping)
            iss.imbue(std::locale(std::locale::classic(), new grouping_3));
        T v = 0;
        iss >> v;
        const T vmax = std::numeric_limits<T>::max();
        const char* verdict =
            !iss.fail() ? "OK (parsed)" :
            v == vmax   ? "FAIL (v = max -- LWG 23 honored)" :
            v == 0      ? "FAIL (v = 0 -- LWG 23 VIOLATED)" :
                          "FAIL (v = other)";
        std::cout << " " << std::left << std::setw(46) << label
                  << " input=" << std::setw(26) << ("\"" + input + "\"")
                  << " v=" << std::setw(12) << v
                  << " | " << verdict << "\n";
    }

    int main()
    {
        std::cout << "uint32_t max = "
                  << std::numeric_limits<std::uint32_t>::max() << "\n\n";
        run<std::uint32_t>("(a) classic locale, no commas",
                           "12345678901234567", false);
        run<std::uint32_t>("(b) grouping locale, no commas in input",
                           "12345678901234567", true);
        run<std::uint32_t>("(c) grouping locale, commas in input",
                           "12,345,678,901,234,567", true);
    }
Expected output (per LWG 23)
----------------------------
    uint32_t max = 4294967295

     (a) classic locale, no commas                  input="12345678901234567"        v=4294967295   | FAIL (v = max -- LWG 23 honored)
     (b) grouping locale, no commas in input        input="12345678901234567"        v=4294967295   | FAIL (v = max -- LWG 23 honored)
     (c) grouping locale, commas in input           input="12,345,678,901,234,567"   v=4294967295   | FAIL (v = max -- LWG 23 honored)
Actual output (g++ 15.1.0, libstdc++ 15)
----------------------------------------
    uint32_t max = 4294967295

     (a) classic locale, no commas                  input="12345678901234567"        v=4294967295   | FAIL (v = max -- LWG 23 honored)
     (b) grouping locale, no commas in input        input="12345678901234567"        v=4294967295   | FAIL (v = max -- LWG 23 honored)
     (c) grouping locale, commas in input           input="12,345,678,901,234,567"   v=0            | FAIL (v = 0 -- LWG 23 VIOLATED)   <-- bug
Same numerical value, same target type, same overflow. The only
difference between (a)/(b) and (c) is whether the spelling uses
thousands separators matching the locale's grouping. (c) is
structurally valid: grouping "\3" allows a leading group of length
1..3 followed by groups of exactly 3, which "12,345,678,901,234,567"
satisfies. LWG 23 therefore requires v = numeric_limits<uint32_t>::max(),
not 0.
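The structural-validity claim can be checked independently of num_get.
The helper below is a hypothetical sketch (matches_grouping_3 is not a
libstdc++ function) that encodes only the rule stated above for grouping
"\3": a leading group of 1..3 digits followed by groups of exactly 3.
It assumes that input without any separator is acceptable, as case (b)
shows.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical checker for the grouping "\3" rule described in this
// report. It looks only at digit-group lengths, not at numeric value.
bool matches_grouping_3(const std::string& input, char sep = ',')
{
    std::vector<std::size_t> groups;  // length of each digit group
    std::size_t len = 0;
    for (char c : input) {
        if (c == sep) {
            groups.push_back(len);
            len = 0;
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            ++len;
        } else {
            return false;  // only digits and separators are modeled
        }
    }
    groups.push_back(len);  // trailing group

    if (groups.size() == 1)
        return groups[0] > 0;  // no separators: any non-empty digit run
    if (groups[0] < 1 || groups[0] > 3)
        return false;          // leading group must have 1..3 digits
    for (std::size_t i = 1; i < groups.size(); ++i)
        if (groups[i] != 3)
            return false;      // every later group must have exactly 3
    return true;
}
```

Under this rule "12,345,678,901,234,567" is accepted, while
",,99999999999" and "1234,567" are rejected.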
Suggested fix
-------------
Swap the order of the two branches so __testoverflow is checked first:
    if (__testoverflow)
      {
        __v = (__negative && __num_traits::__is_signed)
              ? __num_traits::__min : __num_traits::__max;
        __err = ios_base::failbit;
      }
    else if ((!__sep_pos && !__found_zero && !__found_grouping.size())
             || __testfail)
      {
        __v = 0;
        __err = ios_base::failbit;
      }
    else
      __v = __negative ? -__result : __result;
Whenever __testoverflow fires the overflow signal wins, matching LWG 23.
The structural-failure path still handles genuine structural errors
that do not overlap with overflow.
The only behavioral change for any other input class is the rare
"both structurally invalid AND numerically overflowing" input (e.g.
",,99999999999"): with the swap that input yields v = max instead of
v = 0. Both are conforming under LWG 23 (which does not directly
arbitrate the overlap), failbit is set either way, and the input was
malformed regardless.
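The effect of the swap on each flag combination can be tabulated with a
small model of the post-loop block. This is a sketch under stated
assumptions: the target is unsigned 32-bit (so there is no negative
case), the whole structural condition is folded into a single
structural_fail flag, and the names resolve, resolved and branch_order
are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Which ordering of the two failure branches to model.
enum class branch_order { current, proposed };

struct resolved {
    std::uint32_t value;
    bool failbit;
};

// Simplified model of the post-loop resolution block; NOT libstdc++ code.
// structural_fail stands for the whole
// "(!__sep_pos && !__found_zero && !__found_grouping.size()) || __testfail"
// condition.
resolved resolve(bool structural_fail, bool testoverflow,
                 std::uint32_t result, branch_order order)
{
    const std::uint32_t umax = std::numeric_limits<std::uint32_t>::max();
    const bool overflow_first = (order == branch_order::proposed);

    if (overflow_first && testoverflow)
        return {umax, true};   // proposed order: overflow wins
    if (structural_fail)
        return {0u, true};     // structural failure stores 0
    if (testoverflow)
        return {umax, true};   // current order reaches overflow second
    return {result, false};    // clean parse
}
```

Only the combination with both flags set differs between the two
orders: the current order stores 0 and the proposed order stores max.
The other three combinations resolve identically.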
Tested with
-----------
g++ (Homebrew GCC 15.1.0) 15.1.0 on x86_64 / arm64 darwin
The relevant code in locale_facets.tcc has not changed materially in
many years; I expect the bug to reproduce on every libstdc++ version
that carries the LWG-23-resolution comment.