On Tue, 2020-09-22 at 22:24 +0200, Jan Hubicka wrote:
> > On Tue, 2020-09-22 at 16:13 -0400, David Malcolm wrote:
> > > On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote:
> > > > David,
> > > > with jit I get the following:
> > > > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed:
> > > > nonrepresentable section on output
> > > > collect2: error: ld returned 1 exit status
> > > > make[3]: *** [../../gcc/jit/Make-lang.in:121:
> > > > libgccjit.so.0.0.1]
> > > > Error
> > > > 
> > > > Is there a fix/workaround?
> > > 
> > > I don't recognize that specific error, but googling suggests it
> > > may
> > > relate to position-independent code.
> > > 
> > > Are you configuring with --enable-host-shared ?  This is needed
> > > when
> > > enabling "jit" in --enable-languages (but slows down the compiler
> > > by
> > > a
> > > few percent, which is why "jit" isn't in "all").
> > > 
> > > 
> > > > Patch I am trying to test/debug is attached, it fixes the
> > > > selftest
> > > > issue
> > > > and the destructor.
> > > 
> > > Thanks.
> > > 
> > > Sadly it doesn't fix the jit crashes, which are now in bugzilla
> > > (as
> > > PR
> > > jit/97169).
> > > 
> > > Without the patch, the jit testcases crash at the end of the 1st
> > > in-
> > > process iteration, in the dtor for the the new pass.
> > > 
> > > With the patch the jit testcases crash inside the 3rd in-process
> > > iteration, invoking a modref_summaries finalizer at whichever GC-
> > > collection point happens first, I think, where the
> > > modref_summaries *
> > > seems to be pointing at corrupt data:
> > > 
> > > (gdb) p *(modref_summaries *)p
> > > $2 = {<fast_function_summary<modref_summary*, va_gc>> =
> > > {<function_summary_base<modref_summary>> = {
> > >       _vptr.function_summary_base = 0x200000001,
> > > m_symtab_insertion_hook = 0x1, m_symtab_removal_hook =
> > > 0x100000004, 
> > >       m_symtab_duplication_hook = 0x0, m_symtab = 0x644210,
> > > m_insertion_enabled = 112, m_allocator = {m_allocator = {
> > >           m_name = 0x0, m_id = 0, m_elts_per_block = 1,
> > > m_returned_free_list = 0x7afafaf01, 
> > >           m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot
> > > access
> > > memory at address 0xafafafafafaf0001>, 
> > >           m_virgin_elts_remaining = 0, m_elts_allocated =
> > > 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, 
> > >           m_block_list = 0x0, m_elt_size = 6517120, m_size = 13,
> > > m_initialized = false, m_location = {
> > >             m_filename = 0x0, m_function = 0x0, m_line = 1,
> > > m_origin
> > > =
> > > 2947481856, m_ggc = false}}}}, 
> > >     m_vector = 0x0}, ipa = false}
> > > 
> > > I think this latter crash may be a pre-existing bug in how the
> > > jit
> > > interacts with gc finalizers.  I think the finalizers are
> > > accumulating
> > > from in-process run to run, leading to chaos, but I need to debug
> > > it
> > > some more to be sure.  Alternatively, is there a way that a
> > > finalizer
> > > is being registered, and then the object is somehow clobbered
> > > without
> > > the finalizer being unregistered from the vec of finalizers?
> > 
> > Aha: this patch on top of yours seems to fix it, at least for the
> > test
> > I've been debugging.
> > 
> > Calling gcc_delete on something seems to delete it without removing
> > the
> > finalizer, leaving the finalizer around to run on whatever the
> > memory
> > eventually gets reused for, leading to segfaults:
> > 
> > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> > index 4b9c4db4ee9..64d314321cb 100644
> > --- a/gcc/ipa-modref.c
> > +++ b/gcc/ipa-modref.c
> > @@ -1372,8 +1372,6 @@ unsigned int pass_ipa_modref::execute
> > (function *)
> >  void
> >  ipa_modref_c_finalize ()
> >  {
> > -  if (summaries)
> > -    ggc_delete (summaries);
> >    summaries = NULL;
> >  }
> 
> Ah, thanks.  That is very odd behaviour of delete indeed.

Summarizing what's going on:

We have a use-after-ggc_delete happening with the finalizers code.

analyze_function has:

    summaries = new (ggc_alloc <modref_summaries> ())
                     modref_summaries (symtab);

ggc_alloc (as opposed to ggc_alloc_no_dtor) uses need_finalization_p<T>
and "sees" that the class has a nontrivial dtor, and hence it passes
finalize<T> to ggc_internal_alloc as the "f" callback.

Within ggc_internal_alloc we have:

  if (f)
    add_finalizer (result, f, s, n);

and so that callback is registered within G.finalizers - but there's
nothing stored in the object itself to track that finalizer.

Later, in ipa_modref_c_finalize, gcc_delete is called on the
mod_summaries object, but the finalizer is still registered in
G.finalizers with its address.

Later, a GC happens, and if the bit for "marked" on that old
modref_summaries object happens to be cleared (with whatever that
memory is now being used for, if anything), the finalizer callback is
called, and ~modref_summaries is called with its "this" pointing at
random bits, and we have a segfault.

This seems like a big "gotcha" in ggc_delete: it doesn't remove any
finalizer for the object, instead leaving it as a timebomb to happen in
some future collection.

Should ggc_delete remove the finalizer?  It would have to scan the
G.finalizer vecs to find them - O(N).  Alternatively, perhaps we could
stash some kind of reference to the finalizer in memory near the object
(perhaps allocating a header).

Or we could put a big warning on ggc_delete saying not to use it on
ggc_alloc-ed objects with dtors.

I'm not sure what the best approach here is.

There were only 4 places in the source tree where ggc_delete were used
before the patch, which added 4 places; maybe we should just remove
those new ggc_deletes and set the pointers to NULL and let them be
cleared as regular garbage?

Dave

Reply via email to