On Mon, 2020-06-29 at 08:58 +0200, Richard Biener wrote:
> On Sat, Jun 27, 2020 at 4:52 PM Marc Glisse <marc.gli...@inria.fr> wrote:
> > On Fri, 26 Jun 2020, Jeff Law via Gcc-patches wrote:
> > 
> > > In theory yes, but there are cases where paths converge (like you've
> > > shown) in which you may have evaluated to a constant on each path, but it's
> > > not a constant at the convergence point.  You have to be very careful using
> > > b_c_p like this, and it's been a regular source of kernel bugs.
> > > 
> > > 
> > > I'd recommend looking at the .ssa dump and walking forward from there if
> > > the .ssa dump looks correct.
> > 
> > Here is the last dump before thread1 (105t.mergephi2). I don't see
> > anything incorrect in it.
> > 
> > ledtrig_cpu (_Bool is_active)
> > {
> >    int old;
> >    int iftmp.0_1;
> >    int _5;
> > 
> >    <bb 2> [local count: 1073741824]:
> >    if (is_active_2(D) != 0)
> >      goto <bb 4>; [50.00%]
> >    else
> >      goto <bb 3>; [50.00%]
> > 
> >    <bb 3> [local count: 536870913]:
> > 
> >    <bb 4> [local count: 1073741824]:
> >    # iftmp.0_1 = PHI <1(2), -1(3)>
> >    _5 = __builtin_constant_p (iftmp.0_1);
> >    if (_5 != 0)
> >      goto <bb 5>; [50.00%]
> >    else
> >      goto <bb 8>; [50.00%]
> > 
> >    <bb 5> [local count: 536870913]:
> >    if (iftmp.0_1 >= -128)
> >      goto <bb 6>; [50.00%]
> >    else
> >      goto <bb 8>; [50.00%]
> > 
> >    <bb 6> [local count: 268435456]:
> >    if (iftmp.0_1 <= 127)
> >      goto <bb 7>; [34.00%]
> >    else
> >      goto <bb 8>; [66.00%]
> > 
> >    <bb 7> [local count: 91268056]:
> >    __asm__ __volatile__("asi %0,%1
> > " : "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "i" iftmp.0_1, "Q" 
> > MEM[(int *)&num_active_cpus] : "memory", "cc");
> >    goto <bb 9>; [100.00%]
> > 
> >    <bb 8> [local count: 982473769]:
> >    __asm__ __volatile__("laa %0,%2,%1
> > " : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "d" 
> > iftmp.0_1, "Q" MEM[(int *)&num_active_cpus] : "memory", "cc");
> > 
> >    <bb 9> [local count: 1073741824]:
> >    return;
> > 
> > }
> > 
> > There is a single _b_c_p, the immediate asm argument is exactly the
> > argument of _b_c_p, and the asm is in the branch protected by _b_c_p.
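
For reference, a minimal C sketch of the idiom that produces this GIMPLE is
below.  The exact s390 kernel sources for atomic_add and ledtrig_cpu differ in
details, so the names, types and constraints here are illustrative assumptions
only.

static int num_active_cpus;

/* Sketch, not the actual kernel code: __builtin_constant_p guards the "i"
   (immediate) constraint so the asi form is only used when the addend is a
   known small constant; otherwise fall back to laa with a register operand.  */
static inline void atomic_add(int val, int *ptr)
{
        if (__builtin_constant_p(val) && val >= -128 && val <= 127) {
                asm volatile("asi %0,%1\n"
                             : "=Q" (*ptr)
                             : "i" (val), "Q" (*ptr)
                             : "memory", "cc");
        } else {
                int old;
                asm volatile("laa %0,%2,%1\n"
                             : "=d" (old), "=Q" (*ptr)
                             : "d" (val), "Q" (*ptr)
                             : "memory", "cc");
        }
}

void ledtrig_cpu(_Bool is_active)
{
        atomic_add(is_active ? 1 : -1, &num_active_cpus);
}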
> > 
> > Now the thread1 dump, for comparison
> > 
> > ledtrig_cpu (_Bool is_active)
> > {
> >    int old;
> >    int iftmp.0_4;
> >    int iftmp.0_6;
> >    int _7;
> >    int _12;
> >    int iftmp.0_13;
> >    int iftmp.0_14;
> > 
> >    <bb 2> [local count: 1073741824]:
> >    if (is_active_2(D) != 0)
> >      goto <bb 3>; [50.00%]
> >    else
> >      goto <bb 4>; [50.00%]
> > 
> >    <bb 3> [local count: 536870912]:
> >    # iftmp.0_6 = PHI <1(2)>
> >    _7 = __builtin_constant_p (iftmp.0_6);
> >    if (_7 != 0)
> >      goto <bb 6>; [50.00%]
> >    else
> >      goto <bb 8>; [50.00%]
> > 
> >    <bb 4> [local count: 536870912]:
> >    # iftmp.0_4 = PHI <-1(2)>
> >    _12 = __builtin_constant_p (iftmp.0_4);
> >    if (_12 != 0)
> >      goto <bb 5>; [50.00%]
> >    else
> >      goto <bb 8>; [50.00%]
> > 
> >    <bb 5> [local count: 268435456]:
> >    if (iftmp.0_4 >= -128)
> >      goto <bb 7>; [20.00%]
> >    else
> >      goto <bb 8>; [80.00%]
> > 
> >    <bb 6> [local count: 214748364]:
> >    if (iftmp.0_6 <= 127)
> >      goto <bb 7>; [12.00%]
> >    else
> >      goto <bb 8>; [88.00%]
> > 
> >    <bb 7> [local count: 91268056]:
> >    # iftmp.0_13 = PHI <iftmp.0_6(6), iftmp.0_4(5)>
> >    __asm__ __volatile__("asi %0,%1
> > " : "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "i" iftmp.0_13, "Q" 
> > MEM[(int *)&num_active_cpus] : "memory", "cc");
> >    goto <bb 9>; [100.00%]
> > 
> >    <bb 8> [local count: 982473769]:
> >    # iftmp.0_14 = PHI <iftmp.0_4(5), iftmp.0_4(4), iftmp.0_6(6), 
> > iftmp.0_6(3)>
> >    __asm__ __volatile__("laa %0,%2,%1
> > " : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "d" 
> > iftmp.0_14, "Q" MEM[(int *)&num_active_cpus] : "memory", "cc");
> > 
> >    <bb 9> [local count: 1073741824]:
> >    return;
> > 
> > }
> > 
> > Thread1 decides to separate the is_active and !is_active paths
> > (surprisingly, for one it optimizes out the <= 127 comparison and for the
> > other the >= -128 comparison, while it could optimize away both in both
> > cases).  And it decides to have the paths converge after the comparisons,
> > but before the asm.
> > 
> > What the pass did does seem to hurt here.  It looks like, if we duplicate
> > _b_c_p, we may need to duplicate far enough to include all the blocks
> > dominated by _b_c_p==true (including the asm, here).  Otherwise, any _b_c_p
> > can be optimized to true, because for a boolean
> > 
> > b is the same as b ? true : false, and
> > __builtin_constant_p(b ? true : false) would be the same as
> > b ? __builtin_constant_p(true) : __builtin_constant_p(false), i.e. true.
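
A minimal, hypothetical illustration of that argument (not the kernel testcase
itself):

int guard(_Bool b)
{
        int x = b ? 1 : -1;
        return __builtin_constant_p(x);
}

Once a pass duplicates the _b_c_p call onto each path, this is effectively

        if (b)
                return __builtin_constant_p(1);   /* folds to 1 */
        else
                return __builtin_constant_p(-1);  /* folds to 1 */

so the guard degenerates to "always true", even though code placed at the
merge point would still see a non-constant value.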
> > 
> > It is too bad we don't have any optimization pass using ranges between IPA
> > and thread1; that would have gotten rid of the comparisons, and hence of
> > the temptation to thread.  Adding always_inline on atomic_add (or flatten
> > on the caller) does help: EVRP removes the comparisons.
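
Concretely, the two workarounds amount to something like the following
declarations (sketches only; the real kernel annotations may be spelled
differently):

static inline __attribute__((always_inline))
void atomic_add(int val, int *ptr);

/* or, on the caller: */
__attribute__((flatten)) void ledtrig_cpu(_Bool is_active);

Either way the inlining happens early enough that EVRP can prove the
val >= -128 && val <= 127 checks always true and remove them before thread1
runs.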
> > 
> > Do you see a way forward without changing what thread1 does or declaring
> > the testcase as unsupported?
> 
> Most of the cases I've seen involve transforms that make _b_c_p constant
> on one path and then introduce a new PHI merging two _b_c_p values
> which is then tested in a condition that does not get simplified.  I'm not
> sure how to fend off jump threading here (yeah, it's nearly always jump
> threading doing this...), but certainly the easiest way would be to simply
> disallow jump threading from duplicating _b_c_p calls.
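
A sketch of what such a check could look like, using existing GIMPLE iterator
helpers (where exactly it would be called from in the threader is left open
here):

static bool
block_has_builtin_constant_p (basic_block bb)
{
  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
       gsi_next (&gsi))
    if (gimple_call_builtin_p (gsi_stmt (gsi), BUILT_IN_CONSTANT_P))
      return true;
  return false;
}

The path registration code would then refuse to duplicate any block for which
this returns true.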
> 
> Or fold _b_c_p even earlier (though I definitely saw early backwards
> jump threading mess up such a case).
I'm not 100% sure it's duplication of the b_c_p call that's the problem.  Or,
perhaps better stated, I think anything which results in an incremental update
with an SSA_NAME that's used in a b_c_p call is potentially going to be
problematic.  I suspect the two are strongly correlated though.
Jeff


> 
> Richard.
> 
> > --
> > Marc Glisse
