Hi Jeff,

On 05/02/16 23:49, Jeff Law wrote:
This patch addresses multiple issues with how we determine when to split paths. 
 The primary motivation is addressing 68541 (P1).


As I've gotten testcodes from Ajit, I've been able to look closely at the path splitting opportunities and consistently what I've seen is that contrary to the original belief, CSE/DCE opportunities are rarely exposed by path splitting. It seems the benefit is more due to removing the unconditional jump inherent in a CFG diamond -- even more so on the microblaze where these jumps have delay slots.

While there are cases where splitting allows for CSE/DCE, they're the exception 
rather than the rule in the codes I've looked at.

With that in mind, we want to encourage path splitting when the amount of code 
we're duplicating is small.   We also want to balance that with not path 
splitting when the original code is something that may be if-convertable.

The vast majority of if-convertable codes are cases where the two predecessors of the join block are single statement blocks with their results feeding the same PHI in the join block. We now reject that case for path splitting so as not to spoil if-conversion.

The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, COND, calls, etc) are not good candidates for further if conversion. (ie, we have a MIN_EXPR in both predecessors that feed the PHI in the join). So we still allow those cases for splitting. This can be easily refined as we find more cases or as the if-converter is extended.

So with that in place as the first filter, we just need to put a limiter on the number of statements we allow to be copied. I punted a bit and re-used the PARAM for jump threading. They've got enough in common that I felt it was reasonable. If I were to create a new PARAM, I'd probably start with a smaller default value after doing further instrumentation.

While I was working through this I noticed we were path splitting in some cases 
where we shouldn't.  Particularly if we had a CFG like:

          a
         / \
        b   c
       / \ /
      e   d
         / \
        /   \
       /     \
     latch   exit


Duplicating d for the edges b->d and c->d isn't as helpful because the 
duplicate can't be merged into b.  We now detect this kind of case and reject it for 
path splitting.

The new tests are a combination of reductions of codes from Ajit and my own 
poking around.

Looking further out, I really wonder if the low level aspects of path splitting like we're trying to capture here really belong in the cross-jumping phase of the RTL optimizers. The higher level aspects (when we are able to expose CSE/DCE opportunities) should be driven by actually looking at the propagation/simplifications enabled by a potential split path. But those are gcc-7 things.

Bootstrapped and regression tested on x86 linux. Installed on the trunk. I'll obviously keep an eye out for how the tests are handled on other platforms -- I don't think the tests are real sensitive to branch costs and such, but I've been surprised before.

Onward to the jump threading code explosion wrap-up...

jeff


After this patch I also see:
FAIL: gcc.dg/tree-ssa/split-path-1.c scan-tree-dump split-paths "Duplicating join 
block"

on arm, but not on aarch64. My arm-none-eabi cross compiler is configured with:
--with-float=hard --with-cpu=cortex-a9 --with-fpu=neon --with-mode=thumb

Kyrill

Reply via email to