On Saturday, 23 January 2021, 16:46:18 CET, Jeroen Ooms <jer...@berkeley.edu>
wrote:

> A user of the R programming language has reported that std::regex
> causes a hang for certain regular expressions when running in Japanese
> locale. I was able to reproduce this both with our production
> toolchain (mingw-w64 v5 + gcc 8) as well as the latest msys2
> toolchains.
>
> Is this a bug in mingw-w64 or elsewhere? Below a minimal example:
>
> #include <clocale>
> #include <regex>
> int main() {
>   setlocale(LC_ALL, "Japanese");
>   std::regex reg("[0-9]");
>   return 0;
> }

I can reproduce this as well; it took 108 seconds to finish here.

Deep in regex is this function:
std::__detail::_BracketMatcher<std::__cxx11::regex_traits<char>, false, false>::_M_make_cache(std::integral_constant<bool, true>)

This caches the transformed values of the character values 0-255 in the
current locale, using strxfrm_l [1].
For the Japanese locale this fails for many of them and, as documented,
strxfrm_l returns INT_MAX in that case.
But std::collate::do_transform does not handle the error case; it treats
every return value as the length of the transformed string.
It then creates a copy of this 2 GB string, which takes a lot of time,
roughly 1 s for each failing character.
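
To illustrate the missing check, here is a minimal sketch. It assumes the
error behavior documented in [1] (INT_MAX is returned on failure). The helper
name safe_transform is hypothetical, this is not the libstdc++ code, and it
calls plain std::strxfrm instead of strxfrm_l so that it also compiles
outside Windows:

```cpp
#include <climits>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper, not part of libstdc++: returns the collation
// transform of `in`, treating INT_MAX as an error sentinel (assumption
// taken from the Microsoft CRT documentation) rather than as a length.
std::string safe_transform(const std::string& in) {
    // A zero-sized call only reports the length the result would need.
    std::size_t len = std::strxfrm(nullptr, in.c_str(), 0);
    if (len >= static_cast<std::size_t>(INT_MAX)) {
        // Transformation failed: fall back to the untransformed input
        // instead of allocating a buffer of about 2 GB.
        return in;
    }

    std::string out(len + 1, '\0');  // room for the terminating NUL
    std::size_t written = std::strxfrm(&out[0], in.c_str(), out.size());
    if (written >= out.size()) {
        // The second call failed or would have been truncated.
        return in;
    }
    out.resize(written);
    return out;
}
```

Falling back to the untransformed string is only one possible policy;
reporting the error to the caller would be another.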

I think this should be reported to GCC (libstdc++).


[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strxfrm-wcsxfrm-strxfrm-l-wcsxfrm-l?view=msvc-160


_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
