[Bug libstdc++/85824] New: regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class

2018-05-17 Thread wanyingloo at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

            Bug ID: 85824
           Summary: regex constructor crashes under UTF-8 locale on
                    Solaris-sparc when parsing a simple character class
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wanyingloo at gmail dot com
  Target Milestone: ---

$ cat test.cpp
#include <locale.h>
#include <regex>

int main (int argc, char *argv[]) {
    setlocale(LC_ALL, "");
    std::regex("[0-9]");
}

$ echo $LANG
en_US.UTF-8

$ g++ -std=c++11 test.cpp

$ ./a.out 
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::append
Abort (core dumped)

$ uname -a
SunOS t-solaris11sparc-02 5.11 11.3 sun4v sparc sun4v Solaris

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/tmp/wluo/othello/solaris-sparc-packages/bin/../libexec/gcc/sparc-sun-solaris2.10/4.9.2/lto-wrapper
Target: sparc-sun-solaris2.10
Configured with: ../gcc-4.9.2/configure --prefix=/usr
--with-local-prefix=/usr/local --enable-languages=c,c++ --disable-nls
--disable-lto --enable-clocale=generic
--with-stage1-ldflags='-L/data00/builds/trprince/platform-packages-build/idir/solaris-sparc/stage1-packages/lib
-static-libgcc -static-libstdc++ -laio -lmd'
--with-boot-ldflags='-L/data00/builds/trprince/platform-packages-build/idir/solaris-sparc/stage1-packages/lib
-static-libgcc -static-libstdc++ -laio -lmd' --disable-werror
--with-libiconv-prefix=/data00/builds/trprince/platform-packages-build/idir/solaris-sparc/stage1-packages
--with-gnu-ld --with-gnu-as --disable-multiarch --disable-bootstrap
Thread model: posix
gcc version 4.9.2 (GCC) 


I can't reproduce it on Linux using the same GCC version. After some
investigation, the cause seems to be that the regex compiler doesn't account
for the implementation-defined behavior of strxfrm(). I ran the following test
on the same Solaris SPARC machine.

$ cat more_test.cpp 
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[]) {
    setlocale(LC_ALL, "");
    char a[] = { 0x80, '\0' };
    printf("%lu\n", strxfrm(NULL, a, 0));
    printf("%s\n", strerror(errno));
}

$ g++ -std=c++11 -w more_test.cpp 

$ ./a.out 
4294967295
Illegal byte sequence


In libstdc++-v3/include/bits/locale_classes.tcc, do_transform() is defined as
follows:

do_transform(const _CharT* __lo, const _CharT* __hi) const
{
...
  size_t __res = _M_transform(__c, __p, __len);
...
  __ret.append(__c, __res);


When _M_transform() calls strxfrm() on 0x80 under the UTF-8 locale on Solaris
SPARC, strxfrm() fails and returns (size_t)-1. do_transform() stores that value
in __res without checking for failure, so __ret.append(__c, __res) is asked to
append 4294967295 bytes and throws std::length_error, which terminates the
program. I think it would be nice if the code checked errno and issued a better
error message than the one above.
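
As a rough illustration of the kind of check meant here, the sketch below wraps
strxfrm() and reports a failure instead of passing the returned length on. It
is not the libstdc++ code: checked_transform() is a hypothetical helper, and it
assumes the platform signals failure through errno (as the Solaris run above
does) or by returning (size_t)-1.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of libstdc++): returns the strxfrm()
// transformation of the NUL-terminated string s under the current C locale,
// and throws std::runtime_error if strxfrm() reports a failure.
std::string checked_transform(const char* s)
{
    assert(s != nullptr);

    // First call: ask only for the required length. errno is cleared first
    // because strxfrm() has no return value reserved for errors.
    errno = 0;
    std::size_t need = std::strxfrm(nullptr, s, 0);
    int err = errno;
    if (err != 0 || need == static_cast<std::size_t>(-1))
        throw std::runtime_error(std::string("strxfrm failed: ")
                                 + std::strerror(err));

    // Second call: transform into a buffer with room for the terminator.
    std::vector<char> buf(need + 1);
    errno = 0;
    std::size_t got = std::strxfrm(buf.data(), s, buf.size());
    err = errno;
    if (err != 0 || got >= buf.size())
        throw std::runtime_error(std::string("strxfrm failed: ")
                                 + std::strerror(err));

    return std::string(buf.data(), got);
}
```

With a check like this, the 0x80 case would surface as an exception that names
strxfrm and the errno text rather than as a length_error from append().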

When I ran this test on a Linux machine with GCC 4.9.2, glibc's strxfrm()
reported a transformed length of 6 bytes for 0x80. I tend to think Solaris
SPARC's libc behavior makes more sense here, since 0x80 on its own isn't a
valid UTF-8 sequence (it is a continuation byte) even though it's a valid UTF-8
code unit. I have no idea why glibc converts it to 6 bytes. In any event, how
strxfrm() converts 0x80 under UTF-8 is implementation-defined, and I'm not sure
do_transform() accounts for that. At the very least, it could be more defensive
by checking errno, I think.
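
For reference, the point about 0x80 can be shown with a small bit test. In
UTF-8, bytes of the form 10xxxxxx (0x80 to 0xBF) only appear as continuation
bytes after a lead byte, so a string that starts with one is not well-formed.
The helper names below are made up for illustration, and the second helper
looks at the first byte only.

```cpp
#include <cassert>

// Hypothetical helper: true if b has the bit pattern 10xxxxxx, i.e. it can
// only be a continuation byte inside a multi-byte UTF-8 sequence.
inline bool is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: true if the first byte of s could begin a UTF-8
// sequence (ASCII, or a lead byte in the range 0xC2..0xF4). It does not
// validate the bytes that follow.
inline bool can_start_utf8_sequence(const char* s)
{
    assert(s != nullptr);
    const unsigned char b = static_cast<unsigned char>(s[0]);
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}
```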

[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class

2018-05-17 Thread wanyingloo at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #1 from Wanying Luo <wanyingloo at gmail dot com> ---
Here's the GDB backtrace at the time of the crash.


#0  0xf56fe7a0 in __lwp_sigqueue () from /lib/libc.so.1
#1  0xf56a1e90 in raise () from /lib/libc.so.1
#2  0xf567a274 in abort () from /lib/libc.so.1
#3  0xff2f2d70 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#4  0xff2ef844 in __cxxabiv1::__terminate (handler=0xff2f2bac <__gnu_cxx::__verbose_terminate_handler()>)
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#5  0xff2ef8e8 in std::terminate ()
    at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
#6  0xff2efc68 in __cxxabiv1::__cxa_rethrow ()
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:125
#7  0xff29c974 in std::collate<char>::do_transform (this=0xff34d9f8 <(anonymous namespace)::collate_c>,
    __lo=0x4fb3c "\200", __hi=0x4fb3d "")
    at /tmp/wluo/gcc-4.9.2/build/sparc-sun-solaris2.11/libstdc++-v3/include/bits/locale_classes.tcc:245
#8  0xff29c25c in std::collate<char>::transform (this=0xff34d9f8 <(anonymous namespace)::collate_c>,
    __lo=0x4fb3c "\200", __hi=0x4fb3d "")
    at /tmp/wluo/gcc-4.9.2/build/sparc-sun-solaris2.11/libstdc++-v3/include/bits/locale_classes.h:662
#9  0x0002ead4 in std::string std::regex_traits<char>::transform<char*>(char*, char*) const ()
#10 0x0002c634 in std::string std::regex_traits<char>::transform_primary<char*>(char*, char*) const ()
#11 0x000275f8 in std::__detail::_BracketMatcher<std::regex_traits<char>, false, false>::_M_apply(char, std::integral_constant<bool, false>) const ()
#12 0x00022bb4 in std::__detail::_BracketMatcher<std::regex_traits<char>, false, false>::_M_make_cache(std::integral_constant<bool, true>) ()
#13 0x0001ed70 in std::__detail::_BracketMatcher<std::regex_traits<char>, false, false>::_M_ready() ()
#14 0x0001f958 in void std::__detail::_Compiler<std::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool) ()
#15 0x0001c630 in std::__detail::_Compiler<std::regex_traits<char> >::_M_bracket_expression() ()
#16 0x000192e8 in std::__detail::_Compiler<std::regex_traits<char> >::_M_atom() ()
#17 0x00017910 in std::__detail::_Compiler<std::regex_traits<char> >::_M_term() ()
#18 0x00015868 in std::__detail::_Compiler<std::regex_traits<char> >::_M_alternative() ()
#19 0x000141dc in std::__detail::_Compiler<std::regex_traits<char> >::_M_disjunction() ()
#20 0x0001381c in std::__detail::_Compiler<std::regex_traits<char> >::_Compiler(char const*, char const*,
    std::regex_traits<char> const&, std::regex_constants::syntax_option_type) ()
#21 0x00013340 in std::shared_ptr<std::__detail::_NFA<std::regex_traits<char> > >
    std::__detail::__compile_nfa<std::regex_traits<char> >(std::regex_traits<char>::char_type const*,
    std::regex_traits<char>::char_type const*, std::regex_traits<char> const&,
    std::regex_constants::syntax_option_type) ()
#22 0x0001307c in std::basic_regex<char, std::regex_traits<char> >::basic_regex(char const*, char const*,
    std::regex_constants::syntax_option_type) ()
#23 0x00012d84 in std::basic_regex<char, std::regex_traits<char> >::basic_regex(char const*,
    std::regex_constants::syntax_option_type) ()
#24 0x000120d0 in main ()
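
Going by frames #7 to #10, the failing call is collate::transform() on the
single byte 0x80, reached while the bracket matcher builds its cache. A reduced
test along the following lines should exercise the same path without
std::regex. This is an untested sketch, and transform_byte() is a name made up
here. It assumes that, with --enable-clocale=generic, the collate facet of the
global C++ locale forwards to strxfrm() under whatever locale setlocale()
installed.

```cpp
#include <locale>
#include <string>

// Hypothetical reduced test: transform a single byte with the collate facet
// of the current global C++ locale, which is the locale a default-constructed
// std::regex_traits<char> picks up.
std::string transform_byte(unsigned char byte)
{
    const char c = static_cast<char>(byte);
    const std::collate<char>& coll =
        std::use_facet<std::collate<char> >(std::locale());
    // transform() takes a [begin, end) range, so no terminator is needed.
    return coll.transform(&c, &c + 1);
}
```

Calling setlocale(LC_ALL, "") and then transform_byte(0x80) on the affected
Solaris machine would be expected to hit the same std::length_error.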

[Bug libstdc++/85824] regex constructor crashes under UTF-8 locale on Solaris-sparc when parsing a simple character class

2018-05-17 Thread wanyingloo at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85824

--- Comment #2 from Wanying Luo <wanyingloo at gmail dot com> ---
(In reply to Wanying Luo from comment #0)
> When I ran this test on a Linux machine with GCC 4.9.2, glibc's strxfrm()
> converts 0x80 to 6 bytes.

Pasting my test on Linux with the same version of GCC for completeness.


$ cat test.cpp
#include <locale.h>
#include <regex>

int main (int argc, char *argv[]) {
    setlocale(LC_ALL, "");
    std::regex("[0-9]");
}

$ echo $LANG
en_US.UTF-8

$ g++ -std=c++11 test.cpp

$ ./a.out 

$ cat more_test.cpp 
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char *argv[]) {
    setlocale(LC_ALL, "");
    char a[] = { 0x80, '\0' };
    printf("%lu\n", strxfrm(NULL, a, 0));
    printf("%s\n", strerror(errno));
}

$ g++ -std=c++11 -w more_test.cpp 

$ ./a.out 
6
Success

$ uname -a
Linux d-ubuntu12x64-11 3.2.0-126-generic #169-Ubuntu SMP Fri Mar 31 14:15:21
UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/wluo/othello/linux64-packages/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.9.2/configure --prefix=/usr
--with-local-prefix=/usr/local --enable-languages=c,c++,fortran --disable-nls
--disable-libcilkrts --disable-lto --enable-libstdcxx-time
--enable-clocale=generic
--with-stage1-ldflags='-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib64
-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib'
--with-boot-ldflags='-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib64
-L/slowfs/sighome/calebs/working/platform-packages-build/idir/linux64/stage1-packages/lib'
--disable-werror --disable-multiarch --disable-bootstrap
Thread model: posix
gcc version 4.9.2 (GCC)

[Bug lto/88159] New: LTO seems to mishandle exceptions that're thrown from c-linkage functions

2018-11-22 Thread wanyingloo at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88159

            Bug ID: 88159
           Summary: LTO seems to mishandle exceptions that're thrown from
                    c-linkage functions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wanyingloo at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

This looks to be a bug in LTO. Can someone please confirm?

$ cat main.cpp
#include <string>
extern "C" void run();
extern "C" void throw_ex(const char *err_text)
{
    throw std::string(err_text);
}
void call()
{
    try { run(); }
    catch (std::string &msg) {}
}
int main(int argc, char *argv[])
{
    call();
}

$ cat util.c
void throw_ex(const char*);
void run()
{
    throw_ex("foobar");
}

$ g++ -v
Using built-in specs.
COLLECT_GCC=/opt/pkg/gcc-7.3.0/bin/g++
COLLECT_LTO_WRAPPER=/opt/pkg/gcc-7.3.0/libexec/gcc/x86_64-pc-linux-gnu/7.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/opt/pkg/gcc-7.3.0
--program-suffix=-7.3.0 --enable-languages=c,c++ --disable-multilib
Thread model: posix
gcc version 7.3.0 (GCC) 

$ g++ -xc util.c -xc++ main.cpp -O2

$ ./a.out 

$ g++ -xc util.c -xc++ main.cpp -O2 -flto

$ ./a.out 
terminate called after throwing an instance of
'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted

[Bug lto/88159] LTO seems to mishandle exceptions that're thrown from c-linkage functions

2018-11-22 Thread wanyingloo at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88159

--- Comment #3 from Wanying Luo <wanyingloo at gmail dot com> ---
Ah, you're right. Thanks for pointing it out, Andrew.