On Tue, 2020-09-22 at 22:24 +0200, Jan Hubicka wrote: > > On Tue, 2020-09-22 at 16:13 -0400, David Malcolm wrote: > > > On Tue, 2020-09-22 at 20:39 +0200, Jan Hubicka wrote: > > > > David, > > > > with jit I get the following: > > > > /usr/local/x86_64-pc-linux-gnu/bin/ld: final link failed: > > > > nonrepresentable section on output > > > > collect2: error: ld returned 1 exit status > > > > make[3]: *** [../../gcc/jit/Make-lang.in:121: > > > > libgccjit.so.0.0.1] > > > > Error > > > > > > > > Is there a fix/workaround? > > > > > > I don't recognize that specific error, but googling suggests it > > > may > > > relate to position-independent code. > > > > > > Are you configuring with --enable-host-shared ? This is needed > > > when > > > enabling "jit" in --enable-languages (but slows down the compiler > > > by > > > a > > > few percent, which is why "jit" isn't in "all"). > > > > > > > > > > Patch I am trying to test/debug is attached, it fixes the > > > > selftest > > > > issue > > > > and the destructor. > > > > > > Thanks. > > > > > > Sadly it doesn't fix the jit crashes, which are now in bugzilla > > > (as > > > PR > > > jit/97169). > > > > > > Without the patch, the jit testcases crash at the end of the 1st > > > in- > > > process iteration, in the dtor for the the new pass. > > > > > > With the patch the jit testcases crash inside the 3rd in-process > > > iteration, invoking a modref_summaries finalizer at whichever GC- > > > collection point happens first, I think, where the > > > modref_summaries * > > > seems to be pointing at corrupt data: > > > > > > (gdb) p *(modref_summaries *)p > > > $2 = {<fast_function_summary<modref_summary*, va_gc>> = > > > {<function_summary_base<modref_summary>> = { > > > _vptr.function_summary_base = 0x200000001, > > > m_symtab_insertion_hook = 0x1, m_symtab_removal_hook = > > > 0x100000004, > > > m_symtab_duplication_hook = 0x0, m_symtab = 0x644210, > > > m_insertion_enabled = 112, m_allocator = {m_allocator = { > > > m_name = 0x0, m_id = 0, m_elts_per_block = 1, > > > m_returned_free_list = 0x7afafaf01, > > > m_virgin_free_list = 0xafafafafafaf0001 <error: Cannot > > > access > > > memory at address 0xafafafafafaf0001>, > > > m_virgin_elts_remaining = 0, m_elts_allocated = > > > 140737080343888, m_elts_free = 0, m_blocks_allocated = 0, > > > m_block_list = 0x0, m_elt_size = 6517120, m_size = 13, > > > m_initialized = false, m_location = { > > > m_filename = 0x0, m_function = 0x0, m_line = 1, > > > m_origin > > > = > > > 2947481856, m_ggc = false}}}}, > > > m_vector = 0x0}, ipa = false} > > > > > > I think this latter crash may be a pre-existing bug in how the > > > jit > > > interacts with gc finalizers. I think the finalizers are > > > accumulating > > > from in-process run to run, leading to chaos, but I need to debug > > > it > > > some more to be sure. Alternatively, is there a way that a > > > finalizer > > > is being registered, and then the object is somehow clobbered > > > without > > > the finalizer being unregistered from the vec of finalizers? > > > > Aha: this patch on top of yours seems to fix it, at least for the > > test > > I've been debugging. > > > > Calling gcc_delete on something seems to delete it without removing > > the > > finalizer, leaving the finalizer around to run on whatever the > > memory > > eventually gets reused for, leading to segfaults: > > > > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c > > index 4b9c4db4ee9..64d314321cb 100644 > > --- a/gcc/ipa-modref.c > > +++ b/gcc/ipa-modref.c > > @@ -1372,8 +1372,6 @@ unsigned int pass_ipa_modref::execute > > (function *) > > void > > ipa_modref_c_finalize () > > { > > - if (summaries) > > - ggc_delete (summaries); > > summaries = NULL; > > } > > Ah, thanks. That is very odd behaviour of delete indeed.
Summarizing what's going on: We have a use-after-ggc_delete happening with the finalizers code. analyze_function has: summaries = new (ggc_alloc <modref_summaries> ()) modref_summaries (symtab); ggc_alloc (as opposed to ggc_alloc_no_dtor) uses need_finalization_p<T> and "sees" that the class has a nontrivial dtor, and hence it passes finalize<T> to ggc_internal_alloc as the "f" callback. Within ggc_internal_alloc we have: if (f) add_finalizer (result, f, s, n); and so that callback is registered within G.finalizers - but there's nothing stored in the object itself to track that finalizer. Later, in ipa_modref_c_finalize, gcc_delete is called on the mod_summaries object, but the finalizer is still registered in G.finalizers with its address. Later, a GC happens, and if the bit for "marked" on that old modref_summaries object happens to be cleared (with whatever that memory is now being used for, if anything), the finalizer callback is called, and ~modref_summaries is called with its "this" pointing at random bits, and we have a segfault. This seems like a big "gotcha" in ggc_delete: it doesn't remove any finalizer for the object, instead leaving it as a timebomb to happen in some future collection. Should ggc_delete remove the finalizer? It would have to scan the G.finalizer vecs to find them - O(N). Alternatively, perhaps we could stash some kind of reference to the finalizer in memory near the object (perhaps allocating a header). Or we could put a big warning on ggc_delete saying not to use it on ggc_alloc-ed objects with dtors. I'm not sure what the best approach here is. There were only 4 places in the source tree where ggc_delete were used before the patch, which added 4 places; maybe we should just remove those new ggc_deletes and set the pointers to NULL and let them be cleared as regular garbage? Dave