At Sat, 24 Dec 2005 01:17:55 +0900,
Fumitoshi UKAI wrote:
>
> It is a bug in libc6, not in grep.
>
> grep 2.3.1.ds2-4 works fine on libc6 2.3.2.ds1-22 if I rebuild it on sarge.
>
> > There seems to be a problem in
> > posix/regex_internal.c:build_wcs_upper_buffer().
> >
> > % LANG=ja_JP.EUC-JP gdb ./a.out
> > GNU gdb 6.4-debian
> > Copyright 2005 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain
> > conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > This GDB was configured as "i486-linux-gnu"...Using host libthread_db
> > library "/lib/tls/libthread_db.so.1".
> >
> > (gdb) run
> > Starting program: /tmp/a.out
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0xb7f1920f in memcpy () from /lib/tls/libc.so.6
> > (gdb) bt
> > #0 0xb7f1920f in memcpy () from /lib/tls/libc.so.6
> > #1 0xb7f4a07a in build_wcs_upper_buffer () from /lib/tls/libc.so.6
> > #2 0xb7f4a335 in re_string_reconstruct () from /lib/tls/libc.so.6
> > #3 0xb7f5bde7 in re_search_internal () from /lib/tls/libc.so.6
> > #4 0xb7f5ea89 in re_search_stub () from /lib/tls/libc.so.6
> > #5 0xb7f5ef63 in re_search () from /lib/tls/libc.so.6
> > #6 0x08048618 in main (argc=1, argv=0xbffffaf4) at rtest.c:28
> > (gdb)
>
> I investigated this further:
>
> * The input multibyte sequence is "\x8f\xa9\xc3", which is
>   LATIN SMALL LETTER ETH in the EUC-JP encoding.
>
> * If RE_ICASE is set in re_syntax, re_search tries to convert
>   characters to upper case with build_wcs_upper_buffer().
>
> * When the multibyte sequence "\x8f\xa9\xc3" in EUC-JP is converted to
>   a wide character, we get 0x00F0 (LATIN SMALL LETTER ETH; U00F0).
>
> * This wide character (LATIN SMALL LETTER ETH; U00F0) is lower case,
>   so towupper() has to be applied to it.
>
> * When towupper() is applied to this wide character (LATIN SMALL LETTER
>   ETH; U00F0), we get the wide character 0x00D0 (LATIN CAPITAL LETTER
>   ETH; U00D0).
>
> * Converting the wide character 0x00D0 (LATIN CAPITAL LETTER ETH; U00D0)
>   back to a multibyte sequence in EUC-JP fails, so wcrtomb() returns
>   (size_t)(-1). (There is no valid byte sequence that represents LATIN
>   CAPITAL LETTER ETH; U00D0 in the EUC-JP encoding.)
>
> * However, build_wcs_upper_buffer() does not handle this case.
>   It assumes that mbrtowc -> towupper -> wcrtomb always succeeds and only
>   handles the case where the lengths of the multibyte sequences differ.
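
For illustration, the missing check described above might look roughly like
the following C sketch. encode_upper_or_keep() is a hypothetical helper, not
code from glibc, and keeping the original bytes when wcrtomb() fails is only
one possible policy assumed here; it is not claimed to match what the
upstream change does.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not glibc code): encode the upper-case form of wc
   into out, which must have room for MB_LEN_MAX bytes.  If the upper-case
   character has no encoding in the current locale, wcrtomb() returns
   (size_t)-1; in that case the original bytes are kept unchanged and
   *fell_back is set to 1.  Returns the number of bytes stored in out. */
size_t encode_upper_or_keep(wchar_t wc, const char *orig, size_t orig_len,
                            char *out, int *fell_back)
{
    char buf[MB_LEN_MAX];
    mbstate_t st;
    size_t n;

    assert(orig != NULL && out != NULL && fell_back != NULL);
    assert(orig_len <= MB_LEN_MAX);

    /* Start from the initial conversion state for each character. */
    memset(&st, 0, sizeof st);

    n = wcrtomb(buf, (wchar_t)towupper((wint_t)wc), &st);
    if (n == (size_t)-1) {
        /* This is the case the report says is not checked: never use
           (size_t)-1 as a length for memcpy(). */
        memcpy(out, orig, orig_len);
        *fell_back = 1;
        return orig_len;
    }

    memcpy(out, buf, n);
    *fell_back = 0;
    return n;
}
```

In the ja_JP.EUC-JP case from the report, calling this with wc = 0x00F0 and
the original bytes "\x8f\xa9\xc3" would be expected to take the fallback
path, since wcrtomb() reportedly returns (size_t)(-1) for U00D0 there.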

It seems this bug has been fixed in posix/regex_internal.c 1.52
(and 1.41.2.7):

http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/posix/regex_internal.c.diff?r1=1.51&r2=1.52&cvsroot=glibc

Regards,
Fumitoshi UKAI