On Thu, 8 Apr 2021, Maciej W. Rozycki wrote:

> > Does the patch also help to optimise my example above?  If so,
> > it sounds like a good thing for that reason alone.
> 
>  Nope, it actually regresses the code produced, causing an extra 
> instruction to be used where none is needed:
> 
> --- test-ira/test.s
> +++ test-lra-legitimate-address-ashift/test.s
> @@ -8,7 +8,8 @@
>       .word 0
>       subl2 $4,%sp
>       calls $0,foo
> -     addl3 8(%ap),$3,%r1
> +     movl 8(%ap),%r1
> +     addl2 $12,%r0
>       moval 0[%r1],%r1
>       addl2 %r1,%r0
>       ret
> 
> (old code is either IRA, or LRA without my change to accept ASHIFT in 
> TARGET_LEGITIMATE_ADDRESS_P).  Please observe that the original ADDL3 
> instruction completes the index calculation (as desired, as noted above), 
> and then the rest boils down to combining it with the base component as 
> handed over by `foo'.
> 
>  NB (%reg) is a base register and [%reg] is an index register, implicitly 
> scaled by the data width determined by the relevant suffix of the mnemonic 
> (`l' here stands for "longword", so 32-bit).  And MOVA is "move address" 
> like x86's LEA.
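 (To make the scaling concrete: in C terms, and purely as an illustration 
with arbitrary register names, a longword indexed access such as 
`moval 4(%r0)[%r1],%r2' computes:

        r2 = r0 + 4 + 4 * r1;   /* base + displacement + width * index */

i.e. the index register contributes the data width times its value to the 
effective address.)
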
> 
>  Though frankly what we really want anyway is:
> 
>       calls $0,foo
>       addl3 8(%ap),$3,%r1
>       moval (%r0)[%r1],%r0
>       ret
> 
> because what we need to do here to complete the calculation is to combine 
> the base in %r0 with the index in %r1 -- there's no performance penalty 
> for the use of a base register (an index costs an extra cycle) and the 
> encoding is much shorter:
> 
>   11: de 41 9f 00     moval *0x0[r1],r1
>   15: 00 00 00 51 

 FAOD mind that the MOVAL produced here does not come from indexed 
addressing but rather from the ASHIFT pattern, as also shown by the 
original insn:

(insn 23 22 24 2 (set (reg:SI 1 %r1 [36])
        (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
            (const_int 2 [0x2]))) "test.c":2:56 432 {ashlsi3}
     (nil))

which I quoted in my last message, or in its post-reload form:

(insn 27 26 28 (parallel [
            (set (reg:SI 1 %r1 [36])
                (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
                    (const_int 2 [0x2])))
            (clobber (reg:CC 16 %psl))
        ]) "test.c":2:56 433 {*ashlsi3}
     (expr_list:REG_UNUSED (reg:CC 16 %psl)
        (nil)))
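
 (Note how the shift by 2 has been rendered here: `moval 0[%r1],%r1' uses 
the longword index scaling of the address unit to compute, again in C 
terms:

        r1 = 0 + 4 * r1;        /* i.e. `r1 <<= 2', no memory accessed */

so MOVAL appears even though no indexed memory operand is involved.)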

>   19: c0 51 50        addl2 r1,r0
> 
> vs:
> 
>   11: de 41 60 50     moval (r0)[r1],r0
> 
> (because in the absence of a base register the absolute address mode has 
> to be used, which always uses a full 32-bit extension, unlike displacements 
> optionally used with a base register).  But while we're not there, we're 
> moving away from rather than towards the desired solution.  This seems to 
> be an unfortunate consequence of how reload works though, so I gather it 
> needs to be fixed in reload.

 So I tried another experiment and applied my ASHIFT patch to the old 
reload configuration.  In its unmodified form it caused an ICE with 
your example:

during RTL pass: final
test.c: In function 'bar':
test.c:2:56: internal compiler error: in print_operand_address, at config/vax/vax.c:419
    2 | long *bar (long *ptr, long x) { return &foo ()[x + 3]; }
      |                                                        ^
0x1188489f print_operand_address(_IO_FILE*, rtx_def*)
        .../gcc/config/vax/vax.c:419
0x111acbbb default_print_operand_address(_IO_FILE*, machine_mode, rtx_def*)
        .../gcc/targhooks.c:367
0x10b3115f output_address(machine_mode, rtx_def*)
        .../gcc/final.c:4085
0x10b3088b output_asm_insn(char const*, rtx_def**)
        .../gcc/final.c:3942
0x10b2ea63 final_scan_insn_1
        .../gcc/final.c:3125
0x10b2eda7 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
        .../gcc/final.c:3171
0x10b2bd1f final_1
        .../gcc/final.c:2022
0x10b327e3 rest_of_handle_final
        .../gcc/final.c:4676
0x10b32d23 execute
        .../gcc/final.c:4754
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

coming from:

(plus:SI (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
        (const_int 2 [0x2]))
    (reg:SI 0 %r0 [33]))

and serving as proof that backends never used to expect ASHIFT in 
addresses.  Anyway I went ahead and converted `print_operand_address' too 
(see the sketch further below), yielding this code:

        calls $0,foo
        movl 8(%ap),%r1
        moval 12(%r0)[%r1],%r0
        ret

or:

   c:   d0 ac 08 51     movl 0x8(ap),r1
  10:   de 41 a0 0c     moval 0xc(r0)[r1],r0
  14:   50 
  15:   04              ret

which does use indexed addressing:

(insn 24 23 17 (parallel [
            (set (reg/i:SI 0 %r0)
                (plus:SI (plus:SI (ashift:SI (reg/v:SI 1 %r1 [orig:29 x ] [29])
                            (const_int 2 [0x2]))
                        (reg:SI 0 %r0 [33]))
                    (const_int 12 [0xc])))
            (clobber (reg:CC 16 %psl))
        ]) "test.c":2:56 623 {*movaddrsi}
     (expr_list:REG_DEAD (reg/v:SI 1 %r1 [orig:29 x ] [29])
        (expr_list:REG_UNUSED (reg:CC 16 %psl)
            (nil))))

and given that the left shift does not cause the displacement to require 
a wider encoding here, it is just as good as the desired sequence I came 
up with above (shown here with the ADDL3/RET instructions I omitted 
above):

   c:   c1 ac 08 03     addl3 0x8(ap),$0x3,r1
  10:   51 
  11:   de 41 60 50     moval (r0)[r1],r0
  15:   04              ret

so we're clearly doing something considerably worse with LRA than we used 
to with old reload.
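
 FWIW the `print_operand_address' conversion mentioned above amounts to 
accepting ASHIFT where only MULT used to be expected for the index term.  
Roughly, and only as an illustrative sketch rather than the actual patch 
(the local name `ireg' is hypothetical, standing for whatever the function 
uses to record the index term, and the real code also has to validate the 
scale against the access width), the added case looks like:

          /* Recognize (ashift R (const_int N)) as an index term, just
             as (mult R (const_int 1 << N)) has been handled before.  */
          else if (GET_CODE (x) == ASHIFT
                   && REG_P (XEXP (x, 0))
                   && CONST_INT_P (XEXP (x, 1)))
            ireg = x;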

> > Yeah, canonicalisation is supposed to simplify handling.  I think the
> > thing that thwarts that in this particular context is the fact that
> > we have different rules for “mult”s inside addresses.  IMO that was
> > a mistake, and “mult” by a power of 2 should be an “ashift”
> > wherever it occurs.  But fixing that for all backends would probably
> > be a huge task.
> 
>  Worth doing anyway perhaps?  How many of the targets we support have 
> scaled indexed addressing?

 So with the updates to `print_operand_address' and `vax_address_cost_1' 
as well, I now have the VAX backend prepared to handle either 
representation throughout.  I can't imagine it being *that* horribly 
complicated for the other backends, and once they are all handled we can 
convert the middle end.
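
 (The shape of the check is the same in each spot; a hypothetical helper, 
not taken from the actual patch, might read:

        /* Return the scale factor of an index term X, whether expressed
           in the address-canonical (mult R (const_int S)) form or in the
           (ashift R (const_int log2 (S))) form used elsewhere, or 0 if X
           is neither.  */
        static int
        index_scale (rtx x)
        {
          if (GET_CODE (x) != MULT && GET_CODE (x) != ASHIFT)
            return 0;
          if (!REG_P (XEXP (x, 0)) || !CONST_INT_P (XEXP (x, 1)))
            return 0;
          return (GET_CODE (x) == MULT
                  ? INTVAL (XEXP (x, 1))
                  : 1 << INTVAL (XEXP (x, 1)));
        }

with callers then checking the result against the access widths the 
machine supports.)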

  Maciej
