[Bug c++/47960] New: dlopen call during DSO initialization breaks C++ RTTI

2011-03-02 Thread a_salnikov at yahoo dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47960

   Summary: dlopen call during DSO initialization breaks C++ RTTI
   Product: gcc
   Version: 4.3.2
    Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: a_salni...@yahoo.com


Hi, 

I am debugging a complex problem with our Linux-based applications, which
sometimes crash in mysterious ways. This is the usual kind of exception RTTI
problem, where an exception thrown in one DSO is not correctly recognized in
another DSO. We know well that DSOs and C++ RTTI do not always mix, but we
follow all the standard advice on how to build the apps so that RTTI works
correctly, and still it breaks.

Our apps are a mixture of the Python interpreter and many C++ shared libraries
loaded from Python (using dlopen). Some C++ libs in turn use dlopen to load
other shared libraries. Everything is linked with the correct flags (no symbol
hiding) and all dlopen calls use the RTLD_GLOBAL flag, so we expect things to
work correctly. They do, but only when we link the DSOs together with the C++
main(), thus eliminating the top-level dlopen call (the other dlopen calls
remain). With LD_DEBUG I was able to confirm that in that case all typeinfo
instances are resolved correctly and bound to one instance in the library
linked to the main app. When Python calls dlopen on the same library, LD_DEBUG
shows that typeinfo resolution fails and there are two instances of the
typeinfo object for the Exception type in question.

I tried to reproduce the problem with a simple example involving just a couple
of DSOs, and after some hair pulling I managed to do it. The peculiarity of the
case (which I did not recognize initially) is that one of the dlopen() calls
happens from the constructor of a global object (that is, during the
initialization of the corresponding DSO). If all dlopen calls happen in the
regular way (after main() starts) then there is no problem at all. But if
dlopen() is called during DSO initialization, then that DSO is somehow not used
in the symbol lookup for the dlopen'ed library, even though the DSO has
RTLD_GLOBAL set.

The example code that I attach here demonstrates exactly this. To build the
example app just do (should work on Linux without patching):

% tar zxf example.tgz
% make

This will build the main app, called 'main', and two DSOs: liba.so and libb.so.
The main app calls dlopen on liba.so and calls a run() function from it.
liba.so calls dlopen on libb.so, either from its run() function or from its DSO
init code depending on a particular envvar, and then calls the run() function
from libb. libb's run() throws an exception that liba's run() tries to catch
and analyze.
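For readers without the attachment, here is a minimal sketch of the pattern described above: a namespace-scope object whose constructor calls dlopen. It is not the code from the attached test case. The `Loader` class, the `early_path()` helper and the `./libb.so` path are hypothetical names chosen for illustration, and using TEST_GLOBAL_INIT as the switch is an assumption about how the envvar is consulted.

```cpp
#include <dlfcn.h>
#include <cstdlib>

// Hypothetical RAII wrapper around dlopen; not taken from the attached test case.
class Loader {
public:
    // A null path means "do not load anything yet".
    explicit Loader(const char* path)
        : handle_(path ? dlopen(path, RTLD_NOW | RTLD_GLOBAL) : nullptr) {}
    ~Loader() { if (handle_) dlclose(handle_); }
    Loader(const Loader&) = delete;
    Loader& operator=(const Loader&) = delete;

    bool loaded() const { return handle_ != nullptr; }
    void* handle() const { return handle_; }

private:
    void* handle_;
};

// Assumed switch: the report only says that an envvar selects the mode.
inline const char* early_path() {
    return std::getenv("TEST_GLOBAL_INIT") ? "./libb.so" : nullptr;
}

// Namespace-scope object: when this code is built into a DSO, the constructor
// runs while that DSO is being initialized, that is, before the dlopen call
// that loads the DSO has returned.
static Loader early_loader(early_path());
```

In the regular case the same `Loader` would instead be constructed inside run(), after the enclosing DSO has been fully loaded. On older glibc versions the program has to be linked with -ldl.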

To show default correct behavior with dlopen called only from inside main():

% ./main
As expected:
&typeid(ex): 0x2b594ce6e600
&typeid(Exception): 0x2b594ce6e600
typeid(ex).name: 9Exception
typeid(Exception).name: 9Exception
typeid(Exception)==typeid(ex): true

To see what happens when dlopen is called from liba init code:

% TEST_GLOBAL_INIT=1 ./main
*** Not expected:
&typeid(ex): 0x2b4532ad2050
&typeid(Exception): 0x2b45328d0600
typeid(ex).name: 9Exception
typeid(Exception).name: 9Exception
typeid(Exception)==typeid(ex): false

In this case the exception cannot be caught with its real type (it is caught as
std::exception), so RTTI is totally broken. The name in the exception typeinfo
is still correct, but the addresses of the typeinfo in liba and libb are
different.
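The output above is what one would expect if type equality depends on typeinfo identity rather than on the mangled name. A minimal sketch of the two comparison strategies follows. Both helpers are hypothetical and are not libstdc++ code; they only illustrate why two typeinfo objects with the same name can still compare unequal.

```cpp
#include <cstring>
#include <typeinfo>

// Hypothetical helpers for illustration; not part of libstdc++.

// Identity comparison: only reliable if the dynamic linker has merged all
// typeinfo objects for a type into a single instance across DSOs.
inline bool same_type_by_address(const std::type_info& a,
                                 const std::type_info& b) {
    return &a == &b;
}

// Name comparison: tolerates duplicated typeinfo objects, at the cost of a
// string comparison on the mangled names.
inline bool same_type_by_name(const std::type_info& a,
                              const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}
```

Applied to the failing run, the identity check would report a mismatch (the two addresses differ) while the name check would still match (both names are 9Exception).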

From what I gather, the C++ code in the example should be legal; global object
initialization should not have restrictions on which functions it can call. But
it seems that the implementation of RTTI in gcc relies on features that do not
always work.

Is there any way to fix the situation or at least to produce some kind of
diagnostics when this situation happens?

Regards,
Andy


[Bug c++/47960] dlopen call during DSO initialization breaks C++ RTTI

2011-03-02 Thread a_salnikov at yahoo dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47960

--- Comment #1 from Andy 2011-03-02 17:58:58 UTC ---
Created attachment 23518
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23518
test case


[Bug c++/47960] dlopen call during DSO initialization breaks C++ RTTI

2011-03-02 Thread a_salnikov at yahoo dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47960

--- Comment #3 from Andy 2011-03-02 18:50:56 UTC ---
(In reply to comment #2)
> works as expected with gcc 4.5, possibly due to the change to
> __GXX_MERGED_TYPEINFO_NAMES

Hi Jonathan,

Sorry, I have not followed the progress closely. Do you mean that gcc 4.6 has
__GXX_MERGED_TYPEINFO_NAMES disabled?

Andy


[Bug c++/47960] dlopen call during DSO initialization breaks C++ RTTI

2011-03-02 Thread a_salnikov at yahoo dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47960

--- Comment #4 from Andy 2011-03-02 18:51:49 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > works as expected with gcc 4.5, possibly due to the change to
> > __GXX_MERGED_TYPEINFO_NAMES
> 
> Hi Jonathan,
> 
> Sorry, I have not followed the progress closely. Do you mean that gcc 4.6 has
> __GXX_MERGED_TYPEINFO_NAMES disabled?
> 
> Andy

Sorry, that should have been 4.5, not 4.6.