2011/11/6 Richard Guenther <richard.guent...@gmail.com>: > On Sun, Nov 6, 2011 at 11:50 AM, Ilya Enkovich <enkovich....@gmail.com> wrote: >> Hello, >> >> 2011/11/5 Eric Botcazou <ebotca...@adacore.com>: >>>> Here is a patch which fixes redundant zero extensions problem. Issue >>>> is resolved by expanding implicit_zee pass functionality to cover zero >>>> and sign extends of different modes. Could please someone review it? >>> >>> Could you explain the undelying idea? The current strategy of >>> implicit-zee.c >>> is exposed at length at the beginning of the file, but here's a summary: >>> >>> 1. On some architectures (typically x86-64), implicity zero-extensions are >>> applied when instructions operate in selected sub-word modes (SImode here): >>> >>> addl edi,eax >>> >>> has an implicit zero-extension for %rax. >>> >>> 2. Because of 1, the second instruction in sequences like: >>> >>> (set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))) >>> (set (reg:DI x) (zero_extend:DI (reg:SI x))) >>> >>> is redundant. >>> >>> 3. The pass recognizes this and transforms the above sequence into: >>> >>> (set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2)))) >>> >>> and the machine description knows how to translate this into an 'addl'. >>> >>> >>> You're proposing extending this to other modes and other architectures, for >>> example QImode on x86. But does >>> >>> addb %dl, %al >>> >>> modify the entire %eax register on x86? In other words, are you really >>> after >>> implicit (zero-)extensions or after something else, like global elimination >>> of >>> redundant extensions? >> Initial aim of the pass was to remove zero extentions redundant due to >> implicit zero extention in x64. But implementation actually uses >> generic approach and seems like a mini-combiner. Pass may combine two >> zero extends or combine zero extend with a constant as a special case >> but in other cases we just try to merge two instructions and then >> check we have corresponding template. It can be easily adopted to >> remove all redundant extensions. So, byte add in the example will be >> merged with zxero extend only if we have explicit template for it in >> machine model. >> >>> >>> What's the effect of the patch on the testcase in the PR in terms of insns >>> at >>> the RTL level? Why doesn't the combiner already optimize it? >> The patch helps to remove two zero extends from RTL in the test from >> PR. I believe zee pass was introduced after postreload pass because we >> should have additional memory instructions by that time and therefore >> more opportunities for optimization after combiner work. >> >> In this particular test case combiner may also help because we have >> byte memory load and extend on combiner pass. But due to some reason >> it does not merge them. In combiner dump I see >> >> (insn 39 38 40 4 (set (reg/v:QI 81 [ xr ]) >> (mem:QI (reg/v/f:DI 111 [ ImageInPtr ]) [0 MEM[base: >> ImageInPtr_29, offset: 0B]+0 S1 A8])) 1.c:9 66 {*movqi_internal} >> (nil)) >> >> (insn 43 42 44 4 (parallel [ >> (set (reg:SI 116 [ xr ]) >> (zero_extend:SI (reg/v:QI 81 [ xr ]))) >> (clobber (reg:CC 17 flags)) >> ]) 1.c:11 121 {*zero_extendqisi2_movzbl_and} >> (expr_list:REG_DEAD (reg/v:QI 81 [ xr ]) >> (expr_list:REG_UNUSED (reg:CC 17 flags) >> (nil)))) >> >> and >> >> Trying 39 -> 43: >> >> With no additional information. > > Well, I bet it's because of the CC clobber which is there > because of the use of TARGET_ZERO_EXTEND_WITH_AND. > Where does that insn get generated? By combine itself?
This insn is generated by expand: (insn 43 42 44 6 (parallel [ (set (reg:SI 116) (zero_extend:SI (reg/v:QI 81 [ xr ]))) (clobber (reg:CC 17 flags)) ]) 1.c:11 -1 (nil)) > > Richard. > >>> Enhancing implicit-zee.c to address missed optimizations like the one >>> reported >>> in target/50038 might well be the best approach, but the strategy shift >>> must be >>> clearly exposed and discussed. The reported numbers are certainly >>> impressive. >>> >>> -- >>> Eric Botcazou >>> >> >> Ilya >> >