https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63776
Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Tim Shen from comment #8)
> I don't think std::regex_match<BiIter, Alloc, char, RegexTraits> should care
> about decoding a char string to wchar_t string and call
> std::regex_match<AnotherBiIter, AnotherAlloc, wchar_t,
> std::regex_traits<wchar_t>>, leaving user defined RegexTraits potentially
> unused.

I agree.

> Instead, user can manually decode the utf-8 string (I'm sad we don't have a
> standard char iterator adaptor which converts a utf-8 char iterator to
> char32_t iterator) and call std::regex_match<..., wchar_t, ...>.

Agreed.

> These are my understanding, so it's surely possible that I may miss
> something.
>
> Thoughts?

Having looked through this again, I think you're right.

So this reduced test case is not expected to pass:

#include <regex>
#include <cassert>

int main()
{
  std::locale::global(std::locale("en_US.UTF-8"));
  std::string s = "joão méroço";
  std::regex r{"[[:alpha:]]{4} [[:alpha:]]{6}"};
  assert( regex_match(s, r) );
}

But this is (assuming wchar_t uses a unicode encoding):

#include <regex>
#include <cassert>

int main()
{
  std::locale::global(std::locale("en_US.UTF-8"));
  std::wstring s = L"joão méroço";
  std::wregex r{L"[[:alpha:]]{4} [[:alpha:]]{6}"};
  assert( regex_match(s, r) );
}