https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71500

Michael Duggan <mwd at md5i dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mwd at md5i dot com

--- Comment #1 from Michael Duggan <mwd at md5i dot com> ---
I can confirm this issue exists in Debian's libstdc++-6-dev (6.1.1-5) package.

I've done some tracing, and here is what I have been able to determine:

(All of the below refers to functions in bits/regex_compiler.tcc.)

When std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, true,
false>::_M_apply is called, _M_char_set contains just {'a'}, and _M_range_set
contains {{first='A', second='F'}}.  

When looking up a character in the _M_char_set, the character is lowercased
(because __icase is true) before looking it up in the set.  This is how 'A' and
'a' succeed.

When looking up 'F', the character is not found in the _M_char_set, so the
_M_range_set is checked.  I don't know what the purpose of
_M_translator._M_transform(__ch) is, but since __collate is false, it does
nothing, leaving the character (__s) as 'F'.  It then checks that 'F' is
between 'A' and 'F', which is true.  Success.

When looking up 'f', the character is not found in the _M_char_set, so the
_M_range_set is checked.  Unlike the _M_char_set lookup, this comparison does
no case folding, so 'f' is not found to be between 'A' and 'F', and the match
fails.
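
To make the four lookups above concrete, here is a small standalone model of
the behaviour as I understand it.  It is only a sketch, not the libstdc++
code: BracketModel, make_reported_state and buggy_apply are hypothetical names
invented for illustration, and the state is hard-coded to the values seen in
the trace ({'a'} and {{'A','F'}}).

```cpp
#include <cassert>   // for the assertions in the usage note
#include <cctype>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the matcher state described in the trace.
struct BracketModel {
    std::set<char> char_set;                       // models _M_char_set
    std::vector<std::pair<char, char>> range_set;  // models _M_range_set
};

// Builds the state reported in the trace: {'a'} and {{'A','F'}}.
BracketModel make_reported_state() {
    BracketModel m;
    m.char_set.insert('a');
    m.range_set.push_back({'A', 'F'});
    return m;
}

char to_lower(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Models the observed lookup: the single-character set is checked with the
// lower-cased candidate, but the range check uses the candidate unchanged.
bool buggy_apply(const BracketModel& m, char ch) {
    if (m.char_set.count(to_lower(ch)) != 0)
        return true;
    for (const auto& r : m.range_set)
        if (r.first <= ch && ch <= r.second)
            return true;
    return false;
}
```

With this model, buggy_apply(make_reported_state(), c) succeeds for 'A', 'a'
and 'F' and fails for 'f', matching the trace.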

When the regex is case insensitive, I believe the following has to happen.
Since it is mostly futile to lower-case a range ([T-f], for example), I think a
candidate char should probably be lower-cased and checked against the range
set, and if that fails, upper-cased and checked against the range set again.
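
A rough sketch of that two-step check, assuming plain char and the default "C"
locale.  This is not a patch against regex_compiler.tcc; RangeSet,
in_any_range and icase_range_match are hypothetical names used only to
illustrate the idea.

```cpp
#include <cassert>   // for the assertions in the usage note
#include <cctype>
#include <utility>
#include <vector>

using RangeSet = std::vector<std::pair<char, char>>;

// True if ch lies inside any [first, second] range, compared by char value.
bool in_any_range(const RangeSet& ranges, char ch) {
    for (const auto& r : ranges)
        if (r.first <= ch && ch <= r.second)
            return true;
    return false;
}

// Suggested case-insensitive check: try the lower-cased candidate first,
// and if that fails, try the upper-cased candidate.
bool icase_range_match(const RangeSet& ranges, char ch) {
    const unsigned char u = static_cast<unsigned char>(ch);
    const char lo = static_cast<char>(std::tolower(u));
    const char up = static_cast<char>(std::toupper(u));
    return in_any_range(ranges, lo) || in_any_range(ranges, up);
}
```

For the range A-F this accepts both 'F' and 'f'.  For T-f it accepts 't' (via
'T') and 'a', and still rejects 'G' and 'g', since neither case of that
letter falls inside the range.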

That said, any solution that works would be good.

(Note: The calls to _M_apply in the test case will happen when building the
_BracketMatcher's _M_cache, not when the actual regex_match happens.)
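
To illustrate why the calls show up at construction time, here is a sketch of
that kind of precomputed cache.  The 256-entry table and the names Cache,
build_cache and cached_match are my assumptions for illustration, not the
actual libstdc++ layout: the per-character predicate is evaluated once for
every char value up front, and matching later is only a table lookup.

```cpp
#include <bitset>
#include <cassert>   // for the assertions in the usage note
#include <cstddef>
#include <functional>

using Cache = std::bitset<256>;

// Evaluates the predicate once per possible char value and stores the result.
Cache build_cache(const std::function<bool(char)>& apply) {
    Cache cache;
    for (std::size_t i = 0; i < cache.size(); ++i)
        cache[i] = apply(static_cast<char>(i));
    return cache;
}

// Matching consults only the table; the predicate is not called again.
bool cached_match(const Cache& cache, char ch) {
    return cache[static_cast<unsigned char>(ch)];
}
```

So a breakpoint on the predicate fires 256 times while the cache is built and
never during the match itself.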
