https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118105
Bug ID: 118105
Summary: std::regex_traits::transform_primary is not correct and might be
         unimplementable
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: redi at gcc dot gnu.org
Blocks: 102445
Target Milestone: ---

Our code says:

       * Effects: if typeid(use_facet<collate<_Ch_type> >) ==
       * typeid(collate_byname<_Ch_type>) and the form of the sort key
       * returned by collate_byname<_Ch_type>::transform(__first, __last)
       * is known and can be converted into a primary sort key
       * then returns that key, otherwise returns an empty string.
       *
       * @todo Implement this function correctly.
       */
      template<typename _Fwd_iter>
        string_type
        transform_primary(_Fwd_iter __first, _Fwd_iter __last) const
        {
          // TODO : this is not entirely correct.
          // This function requires extra support from the platform.
          //
          // Read http://gcc.gnu.org/ml/libstdc++/2013-09/msg00117.html and
          // http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2003/n1429.htm
          // for details.
          typedef std::ctype<char_type> __ctype_type;
          const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
          _GLIBCXX_STD_C::vector<char_type> __s(__first, __last);
          __fctyp.tolower(__s.data(), __s.data() + __s.size());
          return this->transform(__s.data(), __s.data() + __s.size());
        }

N1429 says:

  Note also that there is no portable way to implement transform_primary in
  terms of std::locale, since even if the sort key format returned by
  std::collate_byname<>::transform is known and can be converted into a
  primary sort key, the user can still install their own custom std::collate
  implementation into the locale object used, and that can use any sort key
  format they see fit. The transform_primary member function is therefore
  more of use to custom traits classes, and should throw an exception if it
  cannot be implemented for a particular locale.
  Unfortunately this significantly reduces the usefulness of POSIX style
  equivalence classes within regular expressions, but that cannot be fixed
  without modifying the std::collate facet. Note that primary sort keys can
  not be obtained by converting to all lower case and then obtaining a
  regular sort key: primary keys take into account only the primary character
  shape, case, accentation and locale specific tailoring are not taken into
  account, so for example the characters "AÀÁÂÃÄÅaàáâãäå" should all produce
  the same primary sort key.

This should probably not have been added to the standard!

Anyway, we should at least do the typeid check that the standard requires, so
that we know whether collate::transform can be used.

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102445
[Bug 102445] [meta-bug] std::regex issues
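
For illustration only, a minimal sketch of the typeid check mentioned above, written as a standalone function rather than as a patch to regex_traits. The names uses_collate_byname and custom_collate are hypothetical and do not exist in libstdc++. The sketch assumes RTTI is enabled and only reports whether the facet installed in a locale is exactly std::collate_byname. It does not attempt to turn a sort key into a primary sort key.

```cpp
#include <iterator>
#include <locale>
#include <string>
#include <typeinfo>

// Hypothetical helper (not libstdc++ code): true only if the collate facet
// installed in loc has exactly the dynamic type std::collate_byname<CharT>.
template<typename CharT>
bool uses_collate_byname(const std::locale& loc)
{
  const std::collate<CharT>& facet = std::use_facet<std::collate<CharT>>(loc);
  // std::collate is polymorphic, so typeid inspects the dynamic type.
  return typeid(facet) == typeid(std::collate_byname<CharT>);
}

// Hypothetical user-defined facet whose sort key is the reversed input.
// It stands in for the "custom std::collate implementation" that N1429
// says a user may install, with a key format the library cannot know.
struct custom_collate : std::collate<char>
{
protected:
  std::string do_transform(const char* lo, const char* hi) const override
  {
    return std::string(std::reverse_iterator<const char*>(hi),
                       std::reverse_iterator<const char*>(lo));
  }
};
```

With a locale built as std::locale(std::locale::classic(), new custom_collate) the check fails, so the sort key format has to be treated as unknown. With new std::collate_byname<char>("C") installed instead, the check passes.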