On 04.07.24 at 13:25, Richard Biener wrote:
On Thu, Jul 4, 2024 at 1:08 PM Georg-Johann Lay <a...@gjlay.de> wrote:
On 04.07.24 at 11:49, Richard Biener wrote:
On Thu, Jul 4, 2024 at 11:24 AM Richard Biener
<richard.guent...@gmail.com> wrote:
On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <a...@gjlay.de> wrote:
On 02.07.24 at 15:48, Richard Biener wrote:
On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <a...@gjlay.de> wrote:

This is a patch to get correct code out of 64-bit
loads from address-space __memx.

The AVR address-spaces may require that move insns issue
calls to library support functions, a fact that -ftree-ter
doesn't account for.  tree-ssa-ter.cc then replaces an
expression across such a library call, resulting in wrong code.

This patch disables that pass by default on avr, as there is no
finer-grained way to avoid the harmful optimizations.
The pass can still be re-enabled by means of explicit -ftree-ter.

Ok to apply?

I think this requires more details on what goes wrong - I assume
it's not stmt reordering that effectively happens but recursive
expand_expr on SSA defs when those invoke libcalls?  In that
case this would point to a deeper issue.

The difference is that with TER, we get a hard reg in .expand
for a movdi from 24-bit address-space __memx.

Such moves require library calls, which in turn require
specific hard registers.  As the avr backend has no movdi, the
movdi gets expanded as 8 * movqi, and that does not work
when the target registers are hard regs, as some of them
are clobbered by the libcalls.

So I see

(insn 18 17 19 2 (parallel [
              (set (reg:QI 22 r22 [+4 ])
                  (mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7]))
              (clobber (reg:QI 22 r22))
              (clobber (reg:QI 21 r21))
              (clobber (reg:HI 30 r30))
          ]) "t.c":12:13 38 {xloadqi_A}
       (nil))
(insn 19 18 20 2 (set (reg:PSI 56)
          (reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split}
       (nil))
(insn 20 19 21 2 (parallel [
              (set (reg/f:PSI 57)
                  (plus:PSI (reg/f:PSI 47)
                      (const_int 5 [0x5])))
              (clobber (scratch:QI))
          ]) "t.c":12:13 205 {addpsi3}
       (nil))
(insn 21 20 22 2 (parallel [
              (set (reg:QI 23 r23 [+5 ])
                  (mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7]))
              (clobber (reg:QI 22 r22))
              (clobber (reg:QI 21 r21))
              (clobber (reg:HI 30 r30))
          ]) "t.c":12:13 38 {xloadqi_A}
       (nil))

for example - insn 21 clobbers r22 which is also the destination of insn 18.
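
To make the clobber concrete, here is a toy C model of the insn sequence
above.  It is only a sketch, not GCC code: the register file `regs`, the
helper `xloadqi`, the function `load_di_direct` and the `GARBAGE` marker
are made-up names, and the clobber set (r21, r22, r30/r31) is taken from
the xloadqi_A patterns in the dump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NREGS = 32, FIRST_DEST = 18, NBYTES = 8, GARBAGE = 0xEE };

/* Toy register file standing in for the AVR hard registers r0..r31.  */
static uint8_t regs[NREGS];

/* Model of one xloadqi_A insn: r21, r22 and r30/r31 are clobbered,
   then the loaded byte ends up in hard register DEST.  */
static void xloadqi (int dest, uint8_t value)
{
  assert (dest >= 0 && dest < NREGS);
  regs[21] = regs[22] = regs[30] = regs[31] = GARBAGE;
  regs[dest] = value;
}

/* Expand a 64-bit load as 8 byte loads straight into r18..r25 and
   return how many destination bytes no longer hold the loaded value.  */
int load_di_direct (const uint8_t src[NBYTES])
{
  int mismatches = 0;

  memset (regs, 0, sizeof regs);
  for (int i = 0; i < NBYTES; i++)
    xloadqi (FIRST_DEST + i, src[i]);

  for (int i = 0; i < NBYTES; i++)
    if (regs[FIRST_DEST + i] != src[i])
      mismatches++;
  return mismatches;
}
```

With source bytes 0x11, 0x22, ..., 0x88 the model reports two mismatches:
r21 and r22 are overwritten by the loads that follow them, which matches
the observation that insn 21 destroys the result of insn 18.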

With -fno-tree-ter those oddly get _no_ intermediate reg but we have

(insn 9 8 10 2 (parallel [
              (set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1)
                  (mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7]))
              (clobber (reg:QI 22 r22))
              (clobber (reg:QI 21 r21))
              (clobber (reg:HI 30 r30))
          ]) "t.c":12:13 38 {xloadqi_A}
       (nil))

but since on GIMPLE we have DImode loads I don't see how TER comes into
play here - TER should favor the second code generation, not the first ...
(or TER shouldn't play into here at all).

with -fno-tree-ter we come via

#0  expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8,
      tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8,
      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
#1  0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>,
      target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false)
      at /space/rguenther/src/gcc/gcc/expr.cc:6740
#2  0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>,
      from=<var_decl 0x7ffff7162000 aa>, nontemporal=false)
      at /space/rguenther/src/gcc/gcc/expr.cc:6461

while with TER we instead have

#0  expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0,
      tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
#1  0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0,
      target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100
#2  0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>,
      target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
      inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278

the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we
are not passing a target or a mode.

I think the issue is that the expansion at some point doesn't expect
the result to end up in a hard register.  Maybe define_expands are not
supposed to do that, or maybe expansion needs to fix it up.

A first thought was

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ffbac513692..1509acad02e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx target, machine_mode tmode,
         gcc_unreachable ();
       }
     set_curr_insn_location (saved_loc);
+  if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER)
+    {
+      rtx tem = gen_reg_rtx (GET_MODE (r));
+      emit_move_insn (tem, r);
+      r = tem;
+    }
     if (REG_P (r) && !REG_EXPR (r))
       set_reg_attrs_for_decl_rtl (lhs, r);
     return r;

but of course that's not the place to fix - this sees
(mem/u/c:DI (reg/f:PSI 47) [1 aa+0 S8 A8 AS7])
as result and things go wrong somewhere in the chain of expanding
things from the return, possibly at the point of expanding the plus and
there possibly when building subregs of the DImode mem.

You'd have to trace that down, but the fix in the end is to do something
like the above or, alternatively, in the expander producing the 'xload',
make sure not to allow a hardreg as destination while you can still
create pseudos?

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index dabf4c0fc5a..f897113c885 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -746,9 +746,7 @@
           else
             {
               rtx reg_22 = gen_rtx_REG (<MODE>mode, REG_22);
-            if (reg_overlap_mentioned_p (dest2, reg_22)
-                || reg_overlap_mentioned_p (dest2, all_regs_rtx[REG_21]))
-              dest2 = gen_reg_rtx (<MODE>mode);
+            dest2 = gen_reg_rtx (<MODE>mode);

               emit_insn (gen_xload<mode>_A (dest2, src));
             }

seems to fix it.  I'm not sure what reg_overlap_mentioned_p should achieve in
a define-expand.

Richard.

I tried that, but it is still producing wrong code (test case aborts).

If that's so you need to look at where the middle-end comes up with
that hard register as target but still continues to expand other "stuff"
without emitting the hardreg use immediately (the call for which the
hardreg is the argument?).

It's definitely the fault of either the middle-end or the target and has
nothing to do with TER.  I do not have time to track this down further
though.

Found it: It's the emit_move_insn (acc_a, operands[1]) and the like in
avr-dimode.md.  acc_a is a hard reg and operands[1] is general_operand
and thus may be a MEM.  I'll prepare a patch to use an intermediate pseudo.

Johann


The problem is that the middle-end comes up with a hard
register as destination.  Just using a pseudo register at that
place does not help because you have to copy that pseudo (dest2)
back to the hard register (dest) two lines below at avr.md:757.

Then, in the next such call of gen_movqi, that hard register
is live, and will be clobbered.  Copying back and forth pseudos
and hard regs does not help because the hard reg result
from the previous gen_movqi is still supposed to live across
the next gen_movqi *and* the 2nd gen_movqi is told to put
a result in that exact same hard register.

That cannot work.
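
The argument can be illustrated with a small C model.  This is a sketch
under assumptions, not GCC code: `regs`, `xload_to_pseudo`,
`copy_back_per_byte` and `load_via_pseudos` are hypothetical names, the
clobber set (r21, r22, r30/r31) is taken from the xloadqi_A patterns, and
"pseudo" simply means a value that does not live in the modelled hard
register file.  The first function copies each byte back to its hard reg
right after its load, as described above.  The second one keeps all eight
bytes in pseudos until the last load is done, which is one reading of the
"intermediate pseudo" idea mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NREGS = 32, FIRST_DEST = 18, NBYTES = 8, GARBAGE = 0xEE };

/* Toy register file standing in for the AVR hard registers r0..r31.  */
static uint8_t regs[NREGS];

/* Model of a byte load whose result goes to a pseudo: the libcall still
   clobbers r21, r22 and r30/r31, but the value itself is returned.  */
static uint8_t xload_to_pseudo (uint8_t value)
{
  regs[21] = regs[22] = regs[30] = regs[31] = GARBAGE;
  return value;
}

/* Count destination bytes in r18..r25 that differ from SRC.  */
static int count_mismatches (const uint8_t src[NBYTES])
{
  int n = 0;
  for (int i = 0; i < NBYTES; i++)
    if (regs[FIRST_DEST + i] != src[i])
      n++;
  return n;
}

/* Pseudo per byte, copied back to the hard reg immediately: the hard
   reg is live across the next load and gets clobbered there.  */
int copy_back_per_byte (const uint8_t src[NBYTES])
{
  memset (regs, 0, sizeof regs);
  for (int i = 0; i < NBYTES; i++)
    {
      uint8_t pseudo = xload_to_pseudo (src[i]);
      regs[FIRST_DEST + i] = pseudo;
    }
  return count_mismatches (src);
}

/* All bytes stay in pseudos until every load has been emitted; only
   then are the hard regs written, so no libcall follows the copies.  */
int load_via_pseudos (const uint8_t src[NBYTES])
{
  uint8_t pseudos[NBYTES];

  memset (regs, 0, sizeof regs);
  for (int i = 0; i < NBYTES; i++)
    pseudos[i] = xload_to_pseudo (src[i]);
  for (int i = 0; i < NBYTES; i++)
    regs[FIRST_DEST + i] = pseudos[i];
  return count_mismatches (src);
}
```

In this model, `copy_back_per_byte` still loses r21 and r22 for source
bytes 0x11 ... 0x88, while `load_via_pseudos` leaves all eight
destination bytes intact.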

Exactly.

For hard regs that are inputs, we could save / restore them across
the xload insns like for PR63633, but that doesn't work for
output operands that are hard regs.

RTL expansion is not supposed to produce defs or uses of hard registers
that are otherwise free to use.  IIRC we disabled TER across hardreg
assignments also because of libcalls (PR70184 which looks similar to
your issue).  It seems that in your case we're expanding the arguments
of a libcall to a libcall which would be the same issue.

I'll note the libcalls only appear very late - after RTL expansion I still see

(insn 44 43 53 2 (set (reg:DI 18 r18)
         (plus:DI (reg:DI 18 r18)
             (reg:DI 10 r10))) "t.c":12:13 2653 {adddi3_insn}
      (nil))

for example but those seem to have hardreg constraints already?

I do wonder how say softfp targets avoid this when doing, say
(a + b) + (d + e)?

Richard.

Johann


Richard.

Moreover, even with TER, the code is no more efficient than
without it, so it's not clear what the point is of propagating
hard regs into expander operands.  Later passes like fwprop1 and
combine can do that, too.

Requiring libcalls in a mov insn is quite special indeed,
and there is no way to tell that to TER.  TER itself does not
optimize code involving libcalls, so it knows they are fragile.

So - if the wrongness is already apparent in the RTL expansion
pass dump can you quote the respective pieces and explain why?

It expands a 64-bit move from __memx address-space to registers
R18...R25.  This is broken into 8 QI moves to these regs, but
the movqi requires a libcall in some situations, which passes its
arguments in R21...R25.  Hence the libcalls clobber some of the
destination regs.

It would already help if TER did not propagate hard regs into
expander operands.

Johann

As an alternative, the option could be disabled permanently in
avr.cc::avr_option_override().

Johann

--

AVR: middle-end/87376 - Use -fno-tree-ter per default.

Temporary expression replacement might replace expressions across
library calls, for example with move insns from address-space __memx
like in PR87376.  -ftree-ter offers no hook through which the backend
could avoid only the problematic replacements, thus disable it altogether.

           PR middle-end/87376
gcc/
           * common/config/avr/avr-common.cc (avr_option_optimization_table)
           <OPT_ftree_ter>: Set to 0.
gcc/testsuite/
           * gcc.target/avr/torture/pr87376-memx.c: New test.
