2011/11/6 Richard Guenther <[email protected]>:
> On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich <[email protected]> wrote:
>> Hello,
>>
>> 2011/11/5 Eric Botcazou <[email protected]>:
>>>> Here is a patch which fixes redundant zero extensions problem. Issue
>>>> is resolved by expanding implicit_zee pass functionality to cover zero
>>>> and sign extends of different modes. Could someone please review it?
>>>
>>> Could you explain the underlying idea? The current strategy of implicit-zee.c
>>> is explained at length at the beginning of the file, but here's a summary:
>>>
>>> 1. On some architectures (typically x86-64), implicit zero-extensions are
>>> applied when instructions operate in selected sub-word modes (SImode here):
>>>
>>> addl %edi, %eax
>>>
>>> has an implicit zero-extension for %rax.
>>>
>>> 2. Because of 1, the second instruction in sequences like:
>>>
>>> (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2)))
>>> (set (reg:DI x) (zero_extend:DI (reg:SI x)))
>>>
>>> is redundant.
>>>
>>> 3. The pass recognizes this and transforms the above sequence into:
>>>
>>> (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2))))
>>>
>>> and the machine description knows how to translate this into an 'addl'.
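The premise in points 1 and 2 can be checked with a small model of x86-64 register writes. This is only a sketch: write_subreg and zero_extend are hypothetical helper names, not GCC code, and the model covers low sub-registers only (not %ah and friends). It assumes the architectural rule that a 32-bit write clears bits 63..32 while 8-bit and 16-bit writes leave the upper bits unchanged.

```python
MASK64 = (1 << 64) - 1


def write_subreg(old64, value, bits):
    """Model writing the low `bits` bits of a 64-bit register (hypothetical helper)."""
    value &= (1 << bits) - 1
    if bits in (32, 64):
        # 32-bit writes implicitly zero bits 63..32; 64-bit writes replace everything.
        return value
    # 8-bit and 16-bit writes preserve the untouched upper bits.
    keep = old64 & ~((1 << bits) - 1) & MASK64
    return keep | value


def zero_extend(reg64, bits):
    """Explicit zero extension from `bits` bits to the full register."""
    return reg64 & ((1 << bits) - 1)


rax = MASK64  # stale garbage in the upper bits

# SImode add: the explicit zero_extend afterwards changes nothing.
after_addl = write_subreg(rax, 0x12345678 + 1, 32)
assert zero_extend(after_addl, 32) == after_addl

# QImode add: the upper bits survive, so an extension is NOT redundant.
after_addb = write_subreg(rax, 0x10 + 1, 8)
assert zero_extend(after_addb, 8) != after_addb
```

In this model the extension after the SImode add is a no-op, whereas after the QImode add it is not, which is why the QImode case cannot rely on an implicit extension.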
>>>
>>>
>>> You're proposing extending this to other modes and other architectures, for
>>> example QImode on x86. But does
>>>
>>> addb %dl, %al
>>>
>>> modify the entire %eax register on x86? In other words, are you really after
>>> implicit (zero-)extensions or after something else, like global elimination of
>>> redundant extensions?
>> The initial aim of the pass was to remove zero extensions made redundant
>> by the implicit zero extension on x86-64. But the implementation actually
>> uses a generic approach and works like a mini-combiner. As special cases,
>> the pass may combine two zero extends, or a zero extend with a constant;
>> in all other cases it just tries to merge the two instructions and then
>> checks that a corresponding template exists. It can easily be adapted to
>> remove all redundant extensions. So the byte add in your example will be
>> merged with a zero extend only if the machine description has an explicit
>> template for it.
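The merge-then-check idea described above can be sketched roughly as follows. This is an illustration only, not the code in implicit-zee.c: the tuple encoding of insns, the TEMPLATES set and try_merge are all hypothetical, whereas the real pass works on RTL and asks the target's instruction recognizer whether the merged pattern is valid.

```python
# Hypothetical stand-in for the patterns a machine description accepts.
TEMPLATES = {
    ("zero_extend", "DI", "plus", "SI"),  # e.g. an addl with implicit extension
    ("zero_extend", "SI", "mem", "QI"),   # e.g. a zero-extending byte load
}


def try_merge(defn, ext):
    """Try to fold an extension into the insn defining its source.

    defn: (reg, mode, op), meaning (set (reg:mode) (op:mode ...))
    ext:  (reg, ext_code, to_mode, from_mode)
    Returns the merged insn, or None if the pair cannot be merged.
    """
    d_reg, d_mode, d_op = defn
    e_reg, e_code, e_to, e_from = ext
    if d_reg != e_reg or d_mode != e_from:
        return None  # the extension does not consume this definition
    candidate = (e_code, e_to, d_op, d_mode)
    if candidate not in TEMPLATES:
        return None  # no matching template: keep both insns
    return (d_reg, e_to, candidate)


# An SImode add followed by a zero extend to DImode merges ...
print(try_merge(("x", "SI", "plus"), ("x", "zero_extend", "DI", "SI")))
# ... but a QImode add does not, because no template was declared for it.
print(try_merge(("x", "QI", "plus"), ("x", "zero_extend", "SI", "QI")))
```

With this shape, whether the byte add gets merged depends solely on whether a template for it exists, not on any assumption about implicit extensions.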
>>
>>>
>>> What's the effect of the patch on the testcase in the PR in terms of insns at
>>> the RTL level? Why doesn't the combiner already optimize it?
>> The patch removes two zero extends from the RTL for the test in the
>> PR. I believe the zee pass was placed after the postreload pass because
>> additional memory instructions should exist by that time, and therefore
>> there are more opportunities for optimization after the combiner has run.
>>
>> In this particular test case the combiner could also help, because a
>> byte memory load and an extend are both present at the combine pass. But
>> for some reason it does not merge them. In the combiner dump I see
>>
>> (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ])
>>         (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base: ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal}
>>      (nil))
>>
>> (insn 43 42 44 4 (parallel [
>>             (set (reg:SI 116 [ xr ])
>>                 (zero_extend:SI (reg/v:QI 81 [ xr ])))
>>             (clobber (reg:CC 17 flags))
>>         ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and}
>>      (expr_list:REG_DEAD (reg/v:QI 81 [ xr ])
>>         (expr_list:REG_UNUSED (reg:CC 17 flags)
>>             (nil))))
>>
>> and
>>
>> Trying 39 -> 43:
>>
>> With no additional information.
>
> Well, I bet it's because of the CC clobber which is there
> because of the use of TARGET_ZERO_EXTEND_WITH_AND.
> Where does that insn get generated? By combine itself?
This insn is generated by expand:
(insn 43 42 44 6 (parallel [
            (set (reg:SI 116)
                (zero_extend:SI (reg/v:QI 81 [ xr ])))
            (clobber (reg:CC 17 flags))
        ]) 1.c:11 -1
     (nil))
>
> Richard.
>
>>> Enhancing implicit-zee.c to address missed optimizations like the one reported
>>> in target/50038 might well be the best approach, but the strategy shift must be
>>> clearly exposed and discussed. The reported numbers are certainly impressive.
>>>
>>> --
>>> Eric Botcazou
>>>
>>
>> Ilya
>>
>