[Bug libstdc++/85824] New: regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

            Bug ID: 85824
           Summary: regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wanyingloo at gmail dot com
  Target Milestone: ---

$ cat test.cpp
#include <locale.h>
#include <regex>

int main (int argc, char *argv[])
{
  setlocale(LC_ALL, "");
  std::regex("[0-9]");
}

$ echo $LANG
en_US.UTF-8
$ g++ -std=c++11 test.cpp
$ ./a.out
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::append
Abort (core dumped)

$ uname -a
SunOS t-solaris11sparc-02 5.11 11.3 sun4v sparc sun4v Solaris

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/tmp/wluo/othello/solaris-sparc-packages/bin/../libexec/gcc/sparc-sun-solaris2.10/4.9.2/lto-wrapper
Target: sparc-sun-solaris2.10
Configured with: ../gcc-4.9.2/configure --prefix=/usr --with-local-prefix=/usr/local --enable-languages=c,c++ --disable-nls --disable-lto --enable-clocale=generic --with-stage1-ldflags='-L/data00/builds/trprince/platform-packages-build/idir/solaris-sparc/stage1-packages/lib -static-libgcc -static-libstdc++ -laio -lmd' --with-boot-ldflags='-L/data00/builds/trprince/platform-packages-build/idir/solaris-sparc/stage1-packages/lib -static-libgcc -static-libstdc++ -laio -lmd' --disable-werror --with-libiconv-prefix=/data00/builds/trprince/platform-packages-build/idir/solaris-sparc/stage1-packages --with-gnu-ld --with-gnu-as --disable-multiarch --disable-bootstrap
Thread model: posix
gcc version 4.9.2 (GCC)

I can't reproduce this on Linux using the same GCC version. After some
investigation, the cause seems to be that the regex compiler doesn't account
for the implementation-defined behavior of strxfrm(). I ran the following
test on the same Solaris SPARC machine.

$ cat more_test.cpp
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main (int argc, char *argv[])
{
  setlocale(LC_ALL, "");
  char a[] = { 0x80, '\0' };
  printf("%lu\n", strxfrm(NULL, a, 0));
  printf("%s\n", strerror(errno));
}

$ g++ -std=c++11 -w more_test.cpp
$ ./a.out
4294967295
Illegal byte sequence

In libstdc++-v3/include/bits/locale_classes.tcc, do_transform() is defined as
follows:

  do_transform(const _CharT* __lo, const _CharT* __hi) const
  {
    ...
    size_t __res = _M_transform(__c, __p, __len);
    ...
    __ret.append(__c, __res);

When _M_transform() calls strxfrm() on 0x80 under the UTF-8 locale on Solaris
SPARC, strxfrm() fails and returns (size_t)-1. That value is stored unchecked
in __res, which therefore holds a very large number (4294967295 in the test
above), and __ret.append(__c, __res) then throws std::length_error. It would
be nice if the code checked errno and issued a better error message than the
one above.

When I ran this test on a Linux machine with GCC 4.9.2, glibc's strxfrm()
converted 0x80 to 6 bytes. I tend to think Solaris SPARC's libc behavior makes
more sense here, since a lone 0x80 byte isn't a valid UTF-8 sequence even
though it's a valid UTF-8 code unit. I have no idea why glibc converts it to
6 bytes. In any event, how strxfrm() converts 0x80 under UTF-8 is
implementation-defined, and I'm not sure do_transform() accounts for that. At
the very least, it could be more defensive by checking errno.
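
The defensive check suggested above could look roughly like the following. This is only a sketch, not the libstdc++ code: checked_strxfrm() is a hypothetical helper name, and it assumes the failure is signalled the way the Solaris test shows, that is, by a (size_t)-1 return value and/or a non-zero errno after the call.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of libstdc++): transform a NUL-terminated
// string with strxfrm() and report a failure instead of trusting the result.
std::string checked_strxfrm(const char* src)
{
    assert(src != nullptr);

    // strxfrm() has no reserved error return value, so clear errno first
    // and inspect it after the call.
    errno = 0;
    // A zero-sized destination only asks for the required length.
    const std::size_t needed = std::strxfrm(nullptr, src, 0);
    if (needed == static_cast<std::size_t>(-1) || errno != 0)
        throw std::runtime_error(std::string("strxfrm failed: ")
                                 + std::strerror(errno));

    // Room for the transformed string plus its terminating NUL.
    std::string out(needed + 1, '\0');
    errno = 0;
    const std::size_t written = std::strxfrm(&out[0], src, out.size());
    if (written == static_cast<std::size_t>(-1) || errno != 0
        || written >= out.size())
        throw std::runtime_error(std::string("strxfrm failed: ")
                                 + std::strerror(errno));

    out.resize(written);  // drop the NUL and any unused space
    return out;
}
```

With a check like this, a caller that passes the 0x80 string under the Solaris UTF-8 locale would presumably see an exception that mentions "Illegal byte sequence" rather than the std::length_error from basic_string::append.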
[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #1 from Wanying Luo ---
Here's the GDB backtrace at the time of the crash.

#0  0xf56fe7a0 in __lwp_sigqueue () from /lib/libc.so.1
#1  0xf56a1e90 in raise () from /lib/libc.so.1
#2  0xf567a274 in abort () from /lib/libc.so.1
#3  0xff2f2d70 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#4  0xff2ef844 in __cxxabiv1::__terminate (handler=0xff2f2bac <__gnu_cxx::__verbose_terminate_handler()>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#5  0xff2ef8e8 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
#6  0xff2efc68 in __cxxabiv1::__cxa_rethrow () at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:125
#7  0xff29c974 in std::collate::do_transform (this=0xff34d9f8 <(anonymous namespace)::collate_c>, __lo=0x4fb3c "\200", __hi=0x4fb3d "") at /tmp/wluo/gcc-4.9.2/build/sparc-sun-solaris2.11/libstdc++-v3/include/bits/locale_classes.tcc:245
#8  0xff29c25c in std::collate::transform (this=0xff34d9f8 <(anonymous namespace)::collate_c>, __lo=0x4fb3c "\200", __hi=0x4fb3d "") at /tmp/wluo/gcc-4.9.2/build/sparc-sun-solaris2.11/libstdc++-v3/include/bits/locale_classes.h:662
#9  0x0002ead4 in std::string std::regex_traits::transform(char*, char*) const ()
#10 0x0002c634 in std::string std::regex_traits::transform_primary(char*, char*) const ()
#11 0x000275f8 in std::__detail::_BracketMatcher, false, false>::_M_apply(char, std::integral_constant) const ()
#12 0x00022bb4 in std::__detail::_BracketMatcher, false, false>::_M_make_cache(std::integral_constant) ()
#13 0x0001ed70 in std::__detail::_BracketMatcher, false, false>::_M_ready() ()
#14 0x0001f958 in void std::__detail::_Compiler >::_M_insert_bracket_matcher(bool) ()
#15 0x0001c630 in std::__detail::_Compiler >::_M_bracket_expression() ()
#16 0x000192e8 in std::__detail::_Compiler >::_M_atom() ()
#17 0x00017910 in std::__detail::_Compiler >::_M_term() ()
#18 0x00015868 in std::__detail::_Compiler >::_M_alternative() ()
#19 0x000141dc in std::__detail::_Compiler >::_M_disjunction() ()
#20 0x0001381c in std::__detail::_Compiler >::_Compiler(char const*, char const*, std::regex_traits const&, std::regex_constants::syntax_option_type) ()
#21 0x00013340 in std::shared_ptr > > std::__detail::__compile_nfa >(std::regex_traits::char_type const*, std::regex_traits::char_type const*, std::regex_traits const&, std::regex_constants::syntax_option_type) ()
#22 0x0001307c in std::basic_regex >::basic_regex(char const*, char const*, std::regex_constants::syntax_option_type) ()
#23 0x00012d84 in std::basic_regex >::basic_regex(char const*, std::regex_constants::syntax_option_type) ()
#24 0x000120d0 in main ()
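
Frames #11 to #13 suggest that the bracket matcher builds a cache by applying the matcher to individual char values, which would explain why "\200" reaches transform() even though the pattern is only "[0-9]". Assuming that reading is right, a quick way to see which single-byte strings a given libc rejects is to probe them all. The following is a sketch only; untransformable_bytes() is a hypothetical name, and the failure test again assumes a (size_t)-1 return and/or a non-zero errno.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical probe: returns every single-byte value (1..255) for which
// strxfrm() reports a failure in the currently active locale.
std::vector<int> untransformable_bytes()
{
    std::vector<int> bad;
    for (int b = 1; b < 256; ++b) {
        // One-byte, NUL-terminated string holding the value under test.
        const char s[2] = { static_cast<char>(b), '\0' };

        errno = 0;
        // Length query only; no destination buffer is written.
        const std::size_t n = std::strxfrm(nullptr, s, 0);
        if (n == static_cast<std::size_t>(-1) || errno != 0)
            bad.push_back(b);
    }
    return bad;
}
```

Call setlocale(LC_ALL, "") before untransformable_bytes() to probe the locale from the environment. Without that call the program runs in the "C" locale, where no byte should be reported.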
[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #2 from Wanying Luo ---
(In reply to Wanying Luo from comment #0)
> When I ran this test on a Linux machine with GCC 4.9.2, glibc's strxfrm()
> converts 0x80 to 6 bytes.

For completeness, here is my test on Linux with the same version of GCC.

$ cat test.cpp
#include <locale.h>
#include <regex>

int main (int argc, char *argv[])
{
  setlocale(LC_ALL, "");
  std::regex("[0-9]");
}

$ echo $LANG
en_US.UTF-8
$ g++ -std=c++11 test.cpp
$ ./a.out

$ cat more_test.cpp
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main (int argc, char *argv[])
{
  setlocale(LC_ALL, "");
  char a[] = { 0x80, '\0' };
  printf("%lu\n", strxfrm(NULL, a, 0));
  printf("%s\n", strerror(errno));
}

$ g++ -std=c++11 -w more_test.cpp
$ ./a.out
6
Success

$ uname -a
Linux d-ubuntu12x64-11 3.2.0-126-generic #169-Ubuntu SMP Fri Mar 31 14:15:21 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/wluo/othello/linux64-packages/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.9.2/configure --prefix=/usr --with-local-prefix=/usr/local --enable-languages=c,c++,fortran --disable-nls --disable-libcilkrts --disable-lto --enable-libstdcxx-time --enable-clocale=generic --with-stage1-ldflags='-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib64 -L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib' --with-boot-ldflags='-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib64 -L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib' --disable-werror --disable-multiarch --disable-bootstrap
Thread model: posix
gcc version 4.9.2 (GCC)
[Bug lto/88159] New: LTO seems to mishandle exceptions that're thrown from c-linkage functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88159

            Bug ID: 88159
           Summary: LTO seems to mishandle exceptions that're thrown from c-linkage functions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wanyingloo at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

This looks like a bug in LTO. Can someone please confirm?

$ cat main.cpp
#include <string>

extern "C" void run();

extern "C" void throw_ex(const char *err_text)
{
  throw std::string(err_text);
}

void call()
{
  try {
    run();
  } catch (std::string &msg) {}
}

int main(int argc, char *argv[])
{
  call();
}

$ cat util.c
void throw_ex(const char*);

void run()
{
  throw_ex("foobar");
}

$ g++ -v
Using built-in specs.
COLLECT_GCC=/opt/pkg/gcc-7.3.0/bin/g++
COLLECT_LTO_WRAPPER=/opt/pkg/gcc-7.3.0/libexec/gcc/x86_64-pc-linux-gnu/7.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/opt/pkg/gcc-7.3.0 --program-suffix=-7.3.0 --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 7.3.0 (GCC)

$ g++ -xc util.c -xc++ main.cpp -O2
$ ./a.out
$ g++ -xc util.c -xc++ main.cpp -O2 -flto
$ ./a.out
terminate called after throwing an instance of 'std::__cxx11::basic_string, std::allocator >'
Aborted
[Bug lto/88159] LTO seems to mishandle exceptions that're thrown from c-linkage functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88159 --- Comment #3 from Wanying Luo --- Ah, you're right. Thanks for pointing it out, Andrew.