On Thu, 8 Apr 2021, Maciej W. Rozycki wrote:

> > Does the patch also help to optimise my example above?  If so,
> > it sounds like a good thing for that reason alone.
>
>  Nope, it actually regresses code produced, causing an extra instruction
> to be used where it doesn't have to:
>
> --- test-ira/test.s
> +++ test-lra-legitimate-address-ashift/test.s
> @@ -8,7 +8,8 @@
>          .word 0
>          subl2 $4,%sp
>          calls $0,foo
> -        addl3 8(%ap),$3,%r1
> +        movl 8(%ap),%r1
> +        addl2 $12,%r0
>          moval 0[%r1],%r1
>          addl2 %r1,%r0
>          ret
>
> (old code is either IRA, or LRA without my change to accept ASHIFT in
> TARGET_LEGITIMATE_ADDRESS_P).  Please observe that the original ADDL3
> instruction completes the index calculation (as desired, as noted above),
> and then the rest boils down to combining it with the base component as
> handed over by `foo'.
>
>  NB (%reg) is a base register and [%reg] is an index register, implicitly
> scaled by the data width determined by the relevant suffix of the
> mnemonic (`l' here stands for "longword", so 32-bit).  And MOVA is "move
> address" like x86's LEA.
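 As a plain-C analogue of that implicit scaling (an illustration only,
with a function made up for this message rather than taken from the test
case), a longword base-plus-index operand such as `(%r0)[%r1]' addresses:

        long *
        index_l (long *base /* (%r0) */, long index /* [%r1] */)
        {
          /* The `l' access width supplies the scale, so with a 4-byte
             `long' this is %r0 + 4 * %r1 -- plain C pointer arithmetic.  */
          return base + index;
        }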

>  Though frankly what we really want anyway is:
>
>         calls $0,foo
>         addl3 8(%ap),$3,%r1
>         moval (%r0)[%r1],%r0
>         ret
>
> because what we need to do here to complete the calculation is to combine
> the base in %r0 with the index in %r1 -- there's no performance penalty
> for the use of a base register (an index costs an extra cycle) and the
> encoding is much shorter:
>
>   11:   de 41 9f 00     moval *0x0[r1],r1
>   15:   00 00 00 51

 FAOD mind that the MOVAL produced here does not come from indexed
addressing but rather from the ASHIFT pattern, as also shown by the
original insn:

(insn 23 22 24 2 (set (reg:SI 1 %r1 [36])
        (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
            (const_int 2 [0x2]))) "test.c":2:56 432 {ashlsi3}
     (nil))

I quoted in my last message, or in the post-reload form:

(insn 27 26 28 (parallel [
            (set (reg:SI 1 %r1 [36])
                (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 16 %psl))
        ]) "test.c":2:56 433 {*ashlsi3}
     (expr_list:REG_UNUSED (reg:CC 16 %psl)
        (nil)))

>   19:   c0 51 50        addl2 r1,r0
>
> vs:
>
>   11:   de 41 60 50     moval (r0)[r1],r0
>
> (because in the absence of a base register the absolute address mode has
> to be used, which always uses full 32-bit extension, unlike displacements
> optionally used with a base register).  But while we're not there, we're
> moving away from rather than towards the desired solution.  This seems to
> be an unfortunate consequence of how reload works though, so I gather it
> needs to be fixed in reload.

 So I thought of another experiment and I applied my ASHIFT patch to the
old reload configuration.  In its unmodified form it caused an ICE with
your example:

during RTL pass: final
test.c: In function 'bar':
test.c:2:56: internal compiler error: in print_operand_address, at config/vax/vax.c:419
    2 | long *bar (long *ptr, long x) { return &foo ()[x + 3]; }
      |                                                        ^
0x1188489f print_operand_address(_IO_FILE*, rtx_def*)
        .../gcc/config/vax/vax.c:419
0x111acbbb default_print_operand_address(_IO_FILE*, machine_mode, rtx_def*)
        .../gcc/targhooks.c:367
0x10b3115f output_address(machine_mode, rtx_def*)
        .../gcc/final.c:4085
0x10b3088b output_asm_insn(char const*, rtx_def**)
        .../gcc/final.c:3942
0x10b2ea63 final_scan_insn_1
        .../gcc/final.c:3125
0x10b2eda7 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
        .../gcc/final.c:3171
0x10b2bd1f final_1
        .../gcc/final.c:2022
0x10b327e3 rest_of_handle_final
        .../gcc/final.c:4676
0x10b32d23 execute
        .../gcc/final.c:4754
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

coming from:

(plus:SI (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
        (const_int 2 [0x2]))
    (reg:SI 0 %r0 [33]))

and serving as proof that backends used not to expect ASHIFT in addresses.

 Anyway I went ahead and converted `print_operand_address' too, yielding
this code:

        calls $0,foo
        movl 8(%ap),%r1
        moval 12(%r0)[%r1],%r0
        ret

or:

   c:   d0 ac 08 51     movl 0x8(ap),r1
  10:   de 41 a0 0c     moval 0xc(r0)[r1],r0
  14:   50
  15:   04              ret

which does use indexed addressing:

(insn 24 23 17 (parallel [
            (set (reg/i:SI 0 %r0)
                (plus:SI (plus:SI (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
                            (const_int 2 [0x2]))
                        (reg:SI 0 %r0 [33]))
                    (const_int 12 [0xc])))
            (clobber (reg:CC 16 %psl))
        ]) "test.c":2:56 623 {*movaddrsi}
     (expr_list:REG_DEAD (reg/v:SI 1 %r1 [orig:29 x ] [29])
        (expr_list:REG_UNUSED (reg:CC 16 %psl)
            (nil))))

and given that the left-shift does not cause the displacement to require
a wider encoding here, it is just as good as the desired sequence I came
up with above (shown here with the ADDL3 and RET that I omitted above):

   c:   c1 ac 08 03     addl3 0x8(ap),$0x3,r1
  10:   51
  11:   de 41 60 50     moval (r0)[r1],r0
  15:   04              ret

so we're clearly doing something considerably worse with LRA than we used
to with old reload.
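 For illustration of the shape of that `print_operand_address' change
(a rough sketch only and not the actual vax.c code, with a helper name
made up for this message), the key bit is treating an ASHIFT index term
as carrying log2 of the scale that the MULT form carries directly:

        /* Sketch: return the scale implied by an index term inside an
           address, or 0 if the term is not a scaled index at all.  Uses
           the usual rtl.h accessors.  */
        static HOST_WIDE_INT
        index_term_scale (rtx x)
        {
          if ((GET_CODE (x) != MULT && GET_CODE (x) != ASHIFT)
              || !REG_P (XEXP (x, 0))
              || !CONST_INT_P (XEXP (x, 1)))
            return 0;
          if (GET_CODE (x) == MULT)
            return INTVAL (XEXP (x, 1));
          /* A left shift by N scales by 1 << N; only 2, 4 and 8 are of
             interest here.  */
          if (!IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
            return 0;
          return HOST_WIDE_INT_1 << INTVAL (XEXP (x, 1));
        }

with the caller then accepting a scale that matches the access width and
emitting the usual `[%reg]' index syntax, just as for the MULT form.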

> > Yeah, canonicalisation is supposed to simplify handling.  I think the
> > thing that thwarts that in this particular context is the fact that
> > we have different rules for “mult”s inside addresses.  IMO that was
> > a mistake, and “mult” by a power of 2 should be an “ashift”
> > wherever it occurs.  But fixing that for all backends would probably
> > be a huge task.
>
>  Worth doing anyway perhaps?  How many targets are there that we support
> that have scaled indexed addressing?

 So with the updates to `print_operand_address' and `vax_address_cost_1'
as well I have the VAX backend prepared for handling either
representation throughout.  I can't imagine it being *that* horribly
complicated for the other backends, and once all are handled we can
convert the middle end.

  Maciej
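PS.  For the record, the middle-end conversion discussed above would in
essence stop treating a power-of-two MULT as the canonical form inside
addresses and use ASHIFT there too, roughly along these lines (a sketch
only, with a made-up helper name; outside addresses the ASHIFT form is
already what gets produced):

        /* Sketch: canonicalise a power-of-two scaled index from the
           MULT form to the equivalent ASHIFT form.  */
        static rtx
        canonicalize_index (rtx x)
        {
          if (GET_CODE (x) == MULT
              && CONST_INT_P (XEXP (x, 1))
              && pow2p_hwi (INTVAL (XEXP (x, 1))))
            return gen_rtx_ASHIFT (GET_MODE (x), XEXP (x, 0),
                                   GEN_INT (exact_log2 (INTVAL (XEXP (x, 1)))));
          return x;
        }

with the backends then updated, as with VAX here, to accept the ASHIFT
form wherever they currently look for the MULT form.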