https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98723

--- Comment #2 from goughost <goughostt at gmail dot com> ---
That may be acceptable for issue 2.
But additional fixes are needed; otherwise, users cannot use regex after calling
setlocale(LC_ALL,"") in such a situation.
Can the regex compiler work without calling _M_transform (at least when
std::regex::collate is not set)?

On the other hand, maybe the error condition can be handled by the regex
compiler code.
To some extent, the bug is in the regex compiler.
Building the cache for '\xee' calls strxfrm() with "\xee\x00", which is not a
valid string if the current encoding is UTF-8.
Also, on GNU/Linux, the strings returned by such (successful) calls might not
help build the cache.
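
To make the "handle the error condition" idea concrete, here is a minimal
sketch in plain C. It is not libstdc++ code: transform_or_raw() is a
hypothetical helper, and falling back to the untransformed bytes is only one
possible policy. POSIX reserves no error return value for strxfrm(), so the
sketch clears errno before the call and checks it afterwards. Treating a
result of INT_MAX or more as a failure is an assumption meant to cover
runtimes that signal errors that way.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not libstdc++ code: transform `src` with strxfrm()
   and fall back to a byte-for-byte copy when the call reports an error or
   the result does not fit.  Returns the number of bytes stored in `dst`
   (excluding the terminator), or (size_t)-1 if `dst` is too small even
   for the raw copy. */
size_t transform_or_raw(const char *src, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return (size_t)-1;

    errno = 0;
    size_t n = strxfrm(dst, src, dst_size);

    /* errno is the only portable error signal.  The INT_MAX check is an
       assumption covering runtimes that return INT_MAX on failure. */
    if (errno == 0 && n < dst_size && n < (size_t)INT_MAX)
        return n;

    /* Fallback: use the raw bytes as the sort key. */
    size_t len = strlen(src);
    if (len >= dst_size)
        return (size_t)-1;
    memcpy(dst, src, len + 1);
    return len;
}
```

With something like this, a lone byte such as "\xee" in a UTF-8 locale would
still yield a usable (if trivial) key instead of aborting the cache build.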

Examples of calling strxfrm() on GNU/Linux with various locales.
(Note that in the cases where Windows fails, Linux gives trivial results.)

// C
input 61 00, errno 0, res 1, outbuf:  61
input 62 00, errno 0, res 1, outbuf:  62
input aa 00, errno 0, res 1, outbuf:  aa
input bb 00, errno 0, res 1, outbuf:  bb

// C.UTF-8
input 61 00, errno 0, res 1, outbuf:  63
input 62 00, errno 0, res 1, outbuf:  64
input aa 00, errno 0, res 1, outbuf:  03
input bb 00, errno 0, res 1, outbuf:  03

// en_US.UTF-8
input 61 00, errno 0, res 10, outbuf:  51 01 02 01 02 01 00 00 00 00
input 62 00, errno 0, res 10, outbuf:  5e 01 02 01 02 01 00 00 00 00
input aa 00, errno 0, res 5, outbuf:  01 01 01 01 03
input bb 00, errno 0, res 5, outbuf:  01 01 01 01 03

// zh_CN.GB2312
input 61 00, errno 0, res 11, outbuf:  e1 a9 bd 01 02 01 02 01 00 00 61
input 62 00, errno 0, res 11, outbuf:  e1 a9 be 01 02 01 02 01 00 00 62
input aa 00, errno 0, res 5, outbuf:  01 01 01 01 03
input bb 00, errno 0, res 5, outbuf:  01 01 01 01 03
