https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118105

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <r...@gcc.gnu.org>:

https://gcc.gnu.org/g:fa6549c1f0e75ff33cb641d98af72ee354b04bbe

commit r15-6691-gfa6549c1f0e75ff33cb641d98af72ee354b04bbe
Author: Jonathan Wakely <jwak...@redhat.com>
Date:   Wed Dec 18 12:57:14 2024 +0000

    libstdc++: Handle errors from strxfrm in std::collate::transform [PR85824]

    std::regex builds a cache of equivalence classes by calling
    std::regex_traits<char>::transform_primary(c) for every char, which then
    calls std::collate<char>::transform which calls strxfrm. On several
    targets strxfrm fails for non-ASCII characters. Because strxfrm has no
    return value reserved to indicate an error, some implementations return
    INT_MAX or SIZE_MAX. This causes std::collate::transform to try to
    allocate a huge buffer, which is either very slow or throws
    std::bad_alloc. We should check errno after calling strxfrm to detect
    errors and then throw a more appropriate exception instead of trying to
    allocate a huge buffer.
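
    A minimal sketch of the errno check described above, not the actual
    libstdc++ code. The helper name checked_xfrm_length and the choice of
    std::runtime_error are assumptions made for illustration; the commit
    message does not name the exception type that is thrown.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical helper: returns the length strxfrm reports for `s`,
// or throws if strxfrm signalled an error through errno.
std::size_t checked_xfrm_length(const char* s)
{
    const int saved = errno;   // remember the caller's errno
    errno = 0;                 // strxfrm has no error return value, so
                               // errno is the only way to detect failure

    // With a size of 0 the destination may be a null pointer; the call
    // only reports how many bytes the transformed string would need.
    const std::size_t len = std::strxfrm(nullptr, s, 0);

    const int err = errno;
    errno = saved;             // do not leak our errno changes to the caller

    if (err != 0)
        throw std::runtime_error("strxfrm failed");

    // Only now is it reasonable to allocate len + 1 bytes.
    return len;
}
```

    A caller would use the returned length to size the buffer, instead of
    trusting a value such as INT_MAX or SIZE_MAX from a failed call.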

    Unfortunately the std::collate<C>::_M_transform function has a
    non-throwing exception specifier, so we can't do the error handling
    there.

    As well as checking errno, this patch changes std::collate::do_transform
    to use __builtin_alloca for small inputs, and to use RAII to deallocate
    the buffers used for large inputs.
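
    The buffer and errno handling can be sketched roughly as follows. This
    is an illustration under stated assumptions, not the patch itself:
    ErrnoRestorer and transform_with_scratch are hypothetical names, a
    fixed-size stack array stands in for __builtin_alloca to keep the
    example portable, and std::unique_ptr plays the role of the RAII
    buffer type.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical RAII type: clears errno on entry and restores the
// caller's value on every exit path, including exceptions.
struct ErrnoRestorer {
    int saved;
    ErrnoRestorer() : saved(errno) { errno = 0; }
    ~ErrnoRestorer() { errno = saved; }
    ErrnoRestorer(const ErrnoRestorer&) = delete;
    ErrnoRestorer& operator=(const ErrnoRestorer&) = delete;
};

// Hypothetical helper: transforms `s` using a stack buffer when the
// result is small and a heap buffer (freed automatically) otherwise.
std::string transform_with_scratch(const char* s)
{
    ErrnoRestorer guard;

    char small[256];                 // stand-in for __builtin_alloca
    std::unique_ptr<char[]> large;   // owns the heap buffer, if any
    char* buf = small;
    std::size_t cap = sizeof small;

    std::size_t len = std::strxfrm(buf, s, cap);
    if (errno != 0)
        throw std::runtime_error("strxfrm failed");

    if (len >= cap) {                // result did not fit: retry on the heap
        cap = len + 1;
        large.reset(new char[cap]);
        buf = large.get();
        len = std::strxfrm(buf, s, cap);
        if (errno != 0 || len >= cap)
            throw std::runtime_error("strxfrm failed");
    }
    return std::string(buf, len);
}
```

    Because cleanup lives in destructors, the early throw paths need no
    explicit delete[] and no manual errno restoration.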

    This change isn't sufficient to fix the three std::regex bugs caused by
    the lack of error handling in std::collate::do_transform; we also need
    to make std::regex_traits::transform_primary handle exceptions. This
    change also attempts to make transform_primary closer to the effects
    described in the standard, by not even attempting to use std::collate if
    the locale's std::collate facet has been replaced (see PR 118105).
    Implementing the correct effects for transform_primary requires RTTI, so
    that we don't use some user-defined std::collate facet with unknown
    semantics. When -fno-rtti is used transform_primary just returns an
    empty string, making equivalence classes unusable in std::basic_regex.
    That's not ideal, but I don't have any better ideas.
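
    One way to detect a replaced facet with RTTI is to compare the dynamic
    type of the locale's collate facet against the standard facet types.
    The sketch below only illustrates that idea; uses_standard_collate and
    CustomCollate are hypothetical names, and the exact checks in
    regex_traits::transform_primary may differ. It requires RTTI, so it
    will not compile with -fno-rtti.

```cpp
#include <locale>
#include <typeinfo>

// Hypothetical user-defined facet with unknown collation semantics.
struct CustomCollate : std::collate<char> { };

// Hypothetical check: true only if the locale's collate facet is one of
// the standard library's own types, not a user-derived replacement.
bool uses_standard_collate(const std::locale& loc)
{
    const std::collate<char>& facet =
        std::use_facet<std::collate<char>>(loc);

    // typeid on a reference to a polymorphic type yields the dynamic type.
    const std::type_info& dynamic_type = typeid(facet);

    return dynamic_type == typeid(std::collate<char>)
        || dynamic_type == typeid(std::collate_byname<char>);
}
```

    For example, a locale constructed as
    std::locale(std::locale::classic(), new CustomCollate) would be
    rejected by this check, so no call into the user's facet is made.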

    I'm unsure if std::regex_traits<C>::transform_primary is supposed to
    convert the string to lower case or not.  The general regex traits
    requirements ([re.req] p20) do say "when character case is not
    considered" but the specification for the std::regex_traits<char> and
    std::regex_traits<wchar_t> specializations ([re.traits] p7) doesn't say
    anything about that.

    With the r15-6317-geb339c29ee42aa change, transform_primary is not
    called unless the regex actually uses an equivalence class. But using an
    equivalence class would still fail (or be incredibly slow) on some
    targets. With this commit, equivalence classes should be usable on all
    targets, without excessive memory allocations.

    Arguably, we should not even try to call transform_primary for any char
    values over 127, since they're never valid in locales that use UTF-8 or
    7-bit ASCII, and probably for other charsets too. Handling 128
    exceptions for every std::regex compilation is very inefficient, but at
    least it now works instead of failing with std::bad_alloc, and no longer
    allocates 128 x 2GB. Maybe for C++26 we could check the locale's
    std::text_encoding and use that to decide whether to cache equivalence
    classes for char values over 127.

    libstdc++-v3/ChangeLog:

            PR libstdc++/85824
            PR libstdc++/94409
            PR libstdc++/98723
            PR libstdc++/118105
            * include/bits/locale_classes.tcc (collate::do_transform): Check
            errno after calling _M_transform. Use RAII type to manage the
            buffer and to restore errno.
            * include/bits/regex.h (regex_traits::transform_primary): Handle
            exceptions from std::collate::transform and do not try to use
            std::collate for user-defined facets.
