RE: Help w/ PR61538?

Matthew Fortune Mon, 28 Jul 2014 14:39:11 -0700

I'll switch to replying on PR61538. I had not read the whole ticket
previously, and although I may have found a problem, it seems it may not
be the cause of this failure.


The generated code differences after the patches seem significant but
I may not get a chance to look at the differences in detail for a little
while.

Matthew

> -----Original Message-----
> From: Joshua Kinard [mailto:ku...@gentoo.org]
> Sent: 28 July 2014 10:40
> To: Matthew Fortune; gcc@gcc.gnu.org
> Subject: Re: Help w/ PR61538?
> 
> On 07/28/2014 04:41, Matthew Fortune wrote:
> > Hi Joshua,
> >
> > I know very little about this area but I'll try and offer some
> > advice anyway...
> >
> 
> You know more than I do :)
> 
> 
> >> On 07/05/2014 23:43, Joshua Kinard wrote:
> >>> Hi,
> >>>
> >>> I filed PR61538 about two weeks ago, regarding gcc-4.8.x and up not
> >>> compiling a g++/pthreads-linked app correctly on SGI R1x000-based
> >>> systems (Octane, Onyx2), running Linux.  Running the
> >>> subsequently-compiled application simply hangs in a futex syscall
> >>> until terminated via Ctrl+C.  I suspect it's a double-locking bug of
> >>> some design, as evidenced by strace showing two consecutive
> >>> syscall()'s w/ 0x108e passed as the syscall # (4238 or futex on o32
> >>> MIPS), but I am stumped as to what else I can do to debug it and help
> >>> fix it.
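
The syscall number arithmetic quoted above can be checked with a small
sketch. It assumes the Linux/MIPS o32 convention that syscall numbers
start at 4000 and that futex is entry 238 in that table; the helper
names are hypothetical and exist only for illustration.

```c
#include <stdio.h>

/* Assumption: Linux/MIPS o32 syscall numbers start at 4000 and futex is
   entry 238 in that table.  These constants are restated here for the
   sketch and are not pulled from any kernel header. */
enum { O32_SYSCALL_BASE = 4000, O32_FUTEX_INDEX = 238 };

/* Hypothetical helper: offset of a raw o32 syscall number from the base. */
int o32_syscall_index(int raw_nr)
{
    return raw_nr - O32_SYSCALL_BASE;
}

/* Hypothetical helper: nonzero if raw_nr is the o32 futex syscall. */
int is_o32_futex(int raw_nr)
{
    return o32_syscall_index(raw_nr) == O32_FUTEX_INDEX;
}

/* Hypothetical helper: print the number in hex, in decimal, and as a
   table offset, plus whether it matches futex. */
void describe_o32_syscall(int raw_nr)
{
    printf("0x%x = %d = base %d + %d%s\n",
           raw_nr, raw_nr, O32_SYSCALL_BASE, o32_syscall_index(raw_nr),
           is_o32_futex(raw_nr) ? " (futex)" : "");
}
```

Calling describe_o32_syscall(0x108e) shows the strace value decoded as
4000 + 238, which under the stated assumption is the futex entry.
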
> >>>
> >> [snip]
> >>> Full details:
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61538
> >>
> >> So I've spent the last few weeks bisecting the gcc tree, and I've
> >> narrowed down the set of commits that appear to have introduced this
> >> problem:
> >>
> >> 1. 39a8c5eaded1e5771a941c56a49ca0a5e9c5eca0  * config/mips/mips.c
> >> (mips_emit_pre_atomic_barrier_p,)
> >
> > This is the prime candidate for introducing the issue.
> 
> This is my guess, too.  However, it appears to tie in w/ the fourth commit
> because the new mips_emit_{pre,post}_atomic_barrier_p functions added in
> commit 39a8c5ea are removed by commit 30c3c442 a mere ~7 minutes later
> (which I find really odd).  Commit 974f0a74 is really the only one that
> seems innocent, but I suspect the other three are linked.  If mkuvyrkov is
> still around, perhaps he could explain better?
> 
> 
> >> 2. 974f0a74e2116143b88d8cea8e1dd5a9c18ef96c  * config/mips/constraints.md
> >> (ZR): New constraint.
> >
> > Unlikely
> >
> >> 3. 0f8e46b16a53c02d7255dcd6b6e9b5bc7f8ec953  * config/mips/mips.c
> >> (mips_process_sync_loop): Emit cmp result only if
> >
> > Possible but unlikely still
> >
> >> 4. 30c3c4427521f96fb58b6e1debb86da4f113f06f  * emit-rtl.c
> >> (need_atomic_barrier_p): New function.
> >
> > Seems unlikely
> >
> >>
> >> There's a build failure somewhere in the middle of there that is
> >> blocking me from figuring out which specific one is the cause, but they
> >> all appear to be related anyways.  All four were added on 2012-06-20.
> >>
> >> When I took a git checkout from 2012-06-26 and reverted those four
> >> commits, I was able to compile glibc-2.19 and get a working "sln"
> >> binary.  I am unable to easily test the C++ side because I built the
> >> checkouts in my $HOME, and it's too risky to try and shoehorn one of
> >> them in as the system compiler.  However, I think the C++ issue is also
> >> fixed by reverting the four, as that also involved hanging in Linux
> >> futex syscalls.
> >
> > Here is a wild guess at the problem... I think the workaround for R10000
> > to use branch likely instead of delay slot branches is ending up
> > annulling an instruction that is required for certain atomic operations.
> > This is an entirely untested theory (and patch) but can you see if this
> > fixes the issue you are seeing:
> 
> Well, the branch-likely thing really only affects a specific revision of the
> R10000 processors.  Later R10000 revisions (3.1+?) and R12000-R16000
> shouldn't be affected.  I've been playing with disabling that specific
> workaround on my Octane's kernel and haven't seen any ill effects yet.
> Though, I haven't tried rebuilding the userland w/ -mno-fix-r10000 just yet.
> 
> If you want, you can take a look at some of the additional info in the
> corresponding Gentoo bug that tracks PR61538:
> 
> https://bugs.gentoo.org/show_bug.cgi?id=516548
> 
> I have a gdb run (comment #5) of the several instructions in
> __lll_lock_wait_private, including register values, as each instruction
> executes.  The hang happens after taking the futex syscall, t0-t3 get set to
> 0x0, and the following "ll v0,0(s0)" is what hangs.  In gcc-4.7 and earlier,
> that 'll' is actually "li v0,2", though control never passes into
> __lll_lock_wait_private in the first place.
> 
> There's also a PNG attached to that bug of the disassembled asm in WinMerge
> that shows what insns actually changed.  Someone who understands MIPS asm
> ordering might be able to make something of that.
> 
> 
> > @@ -13014,7 +13023,8 @@ mips_process_sync_loop (rtx insn, rtx *operands)
> >        mips_multi_copy_insn (tmp3_insn);
> >        mips_multi_set_operand (mips_multi_last_index (), 0, newval);
> >      }
> > -  else if (!(required_oldval && cmp))
> > +  else if (!(required_oldval && cmp)
> > +           || mips_branch_likely)
> >      mips_multi_add_insn ("nop", NULL);
> >
> >    /* CMP = 1 -- either standalone or in a delay slot.  */
> >
> > I suspect I can weave that in more naturally but can you tell me if that
> > fixes the problem first?
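
To make the effect of the one-line change easier to see, the final
"else if" condition from the diff can be modelled as two plain boolean
functions. This is only a sketch: it does not use GCC internals, the
function names are hypothetical, the rtx operands required_oldval and
cmp are reduced to "present or not" flags, and the earlier branches of
the if/else chain in mips_process_sync_loop are assumed not to have
been taken.

```c
#include <stdbool.h>

/* Model of the condition before the proposed patch: a "nop" is added
   unless both required_oldval and cmp are present. */
bool pads_with_nop_before(bool required_oldval, bool cmp)
{
    return !(required_oldval && cmp);
}

/* Model of the condition after the proposed patch: as above, but a
   "nop" is also added whenever branch-likely output is in use
   (mips_branch_likely in the diff, a plain flag here). */
bool pads_with_nop_after(bool required_oldval, bool cmp, bool branch_likely)
{
    return !(required_oldval && cmp) || branch_likely;
}
```

In this model the two conditions differ for exactly one input: both
required_oldval and cmp present while branch-likely is in use.  That is
the case the untested theory is about, where the instruction following
the branch would otherwise be annulled when the branch is not taken.
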
> 
> Testing a fix takes about 7.5hrs to rebuild, plus another 3.5 to rebuild
> glibc.  So I am a bit hesitant to task the machine to do that w/o having a
> better idea if that solves it or not.  Technically, shouldn't passing
> -mno-fix-r10000 have a similar effect by causing branch-likely insns to not
> get emitted at all?
> 
> Thanks!,
> 
> --
> Joshua Kinard
> Gentoo/MIPS
> ku...@gentoo.org
> 4096R/D25D95E3 2011-03-28
> 
> "The past tempts us, the present confuses us, the future frightens us.  And
> our lives slip away, moment by moment, lost in that vast, terrible in-
> between."
> 
> --Emperor Turhan, Centauri Republic
