On Fri, Dec 6, 2013 at 11:29 AM, bin.cheng <bin.ch...@arm.com> wrote:
> Hi,
> Entry pr41488 shows a case in which induction variable can't be recognized
> or coalesced.  As noted in the bug entry, one possible fix is to teach PRE
> not to do some specific transformation.  However, it's in nature a scalar
> evolution issue.  Considering below snippet:
>
>    # i_17 = PHI <i_13(5), 0(3)>
>    # _20 = PHI <_5(5), start_4(D)(3)>
>    ...
>    i_13 = i_17 + 1;
>    _5 = start_4(D) + i_13;
>
> Variable _20 appears in the form of PEELED chrec like (start_4, _5)_LOOP,
> meaning it has value "start_4" in the 1st iteration, then changes to _5 in
> following iterations.  PEELED chrec is not implemented by GCC right now, it
> can be simplified into either POLYNOMIAL or PERIOD one.  The POLYNOMIAL
> chrec is processed by GCC after being recognized, as in the examle, _20 is
> actually {start_4, 1}_LOOP.
>
> This patch modifies scalar evolution by trying to simplify PEELED chrec.
> Without this patch, generated code for pr41488.c is like:
> foo:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         cmp     r1, r2
>         bge     .L7
>         push    {r4}
>         mov     r3, r1
>         ldr     r4, [r0]
>         adds    r2, r2, #1
>         adds    r1, r1, #1
>         movs    r0, #0
> .L3:
>         str     r0, [r4, r3, lsl #2]
>         mov     r3, r1
>         adds    r1, r1, #1
>         cmp     r1, r2
>         bne     .L3
>         ldr     r4, [sp], #4
> .L7:
>         bx      lr
>         .size   foo, .-foo
>
> Which is improved to:
> foo:
>         @ args = 0, pretend = 0, frame = 0
>         @ frame_needed = 0, uses_anonymous_args = 0
>         @ link register save eliminated.
>         cmp     r1, r2
>         bge     .L1
>         ldr     r3, [r0]
>         movs    r0, #0
>         add     r1, r3, r1, lsl #2
>         add     r2, r3, r2, lsl #2
> .L3:
>         str     r0, [r1], #4
>         cmp     r1, r2
>         bne     .L3
> .L1:
>         bx      lr
>         .size   foo, .-foo
>
> One point needs to be clarified that I use tree_to_aff_combination_expand in
> the patch.  Rational is the number of the specific PEELED_CHRECs should be
> moderate, I also check the equality literally firstly and resort to affine
> facility if that fails.  By measuring bootstrap of gcc and spec2kint, I
> collected the number of opportunities caught by this patch like below:
>                             literal comparison/affine comparison
> GCC                          ~1200/~250
> Spec2Kint              93/34
>
> I could back trace the ssa chain instead of using affine functions, but that
> would miss some cases.
>
> Bootstrap and test on x86/x86_64/arm.  Is it OK?

I'm not too much into the SCEV theory (CCed Sebastian for this), but at least
tree-affine allows you to compare without going back to trees by computing
the difference and comparing that to zero.

The patch looks sensible otherwise but let's see if Sebastian has anything
to say.

Thanks,
Richard.

> Thanks,
> bin
>
> 2013-12-06  Bin Cheng  <bin.ch...@arm.com>
>
>         PR tree-optimization/41488
>         * tree-ssa-loop-ivopts.c (add_old_iv_candidates): Don't add cand
>         for PEELED_CHREC kind of IV.
>         * tree-scalar-evolution.c: Include necessary header files.
>         (peeled_chrec_map, simplify_peeled_chrec): New.
>         (analyze_evolution_in_loop): New static variable.
>         Call simplify_peeled_chrec.
>         (scev_initialize): Initialize peeled_chrec_map.
>         (scev_reset, scev_finalize): Reset and release peeled_chrec_map.
>
> gcc/testsuite/ChangeLog
> 2013-12-06  Bin Cheng  <bin.ch...@arm.com>
>
>         * gcc.dg/tree-ssa/scev-7.c: New test.
>         * gcc.dg/pr41488.c: New test.

Reply via email to