Hello,

2011/11/5 Eric Botcazou <ebotca...@adacore.com>:
>> Here is a patch which fixes redundant zero extensions problem. Issue
>> is resolved by expanding implicit_zee pass functionality to cover zero
>> and sign extends of different modes. Could please someone review it?
>
> Could you explain the undelying idea?  The current strategy of implicit-zee.c
> is exposed at length at the beginning of the file, but here's a summary:
>
>  1. On some architectures (typically x86-64), implicity zero-extensions are
> applied when instructions operate in selected sub-word modes (SImode here):
>
>  addl edi,eax
>
> has an implicit zero-extension for %rax.
>
>  2. Because of 1, the second instruction in sequences like:
>
>  (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
>  (set (reg:DI x) (zero_extend:DI (reg:SI x)))
>
> is redundant.
>
>  3. The pass recognizes this and transforms the above sequence into:
>
>  (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2))))
>
> and the machine description knows how to translate this into an 'addl'.
>
>
> You're proposing extending this to other modes and other architectures, for
> example QImode on x86.  But does
>
>  addb %dl, %al
>
> modify the entire %eax register on x86?  In other words, are you really after
> implicit (zero-)extensions or after something else, like global elimination of
> redundant extensions?
Initial aim of the pass was to remove zero extentions redundant due to
implicit zero extention in x64. But implementation actually uses
generic approach and seems like a mini-combiner. Pass may combine two
zero extends or combine zero extend with a constant as a special case
but in other cases we just try to merge two instructions and then
check we have corresponding template. It can be easily adopted to
remove all redundant extensions. So, byte add in the example will be
merged with zxero extend only if we have explicit template for it in
machine model.

>
> What's the effect of the patch on the testcase in the PR in terms of insns at
> the RTL level?  Why doesn't the combiner already optimize it?
The patch helps to remove two zero extends from RTL in the test from
PR. I believe zee pass was introduced after postreload pass because we
should have additional memory instructions by that time and therefore
more opportunities for optimization after combiner work.

In this particular test case combiner may also help because we have
byte memory load and extend on combiner pass. But due to some reason
it does not merge them. In combiner dump I see

(insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
        (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base:
ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
     (nil))

(insn 43 42 44 4 (parallel [
            (set (reg:SI 116 [ xr ])
                (zero_extend:SI (reg/v:QI 81 [ xr ])))
            (clobber (reg:CC 17 flags))
        ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
     (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

and

Trying 39 -> 43:

With no additional information.

> Enhancing implicit-zee.c to address missed optimizations like the one reported
> in target/50038 might well be the best approach, but the strategy shift must 
> be
> clearly exposed and discussed.  The reported numbers are certainly impressive.
>
> --
> Eric Botcazou
>

Ilya

Reply via email to