On Wed, Jul 3, 2024 at 9:26 PM Georg-Johann Lay <a...@gjlay.de> wrote:
Am 02.07.24 um 15:48 schrieb Richard Biener:
On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay <a...@gjlay.de> wrote:
Hi Jeff,
This is a patch to get correct code out of 64-bit
loads from address-space __memx.
The AVR address-spaces may require that move insns issue
calls to library support functions, a fact that -ftree-ter
doesn't account for. tree-ssa-ter.cc then replaces an
expression across such a library call, resulting in wrong code.
This patch disables that pass per default on avr, as there is no
more fine grained way to avoid malicious optimizations.
The pass can still be re-enabled by means of explicit -ftree-ter.
Ok to apply?
I think this requires more details on what goes wrong - I assume
it's not stmt reordering that effectively happens but recursive
expand_expr on SSA defs when those invoke libcalls? In that
case this would point to a deeper issue.
The difference is that with TER, we get a hard reg in .expand
for a movdi from 24-bit address-space __memx.
Such moves require library calls, which in turn require
specific hard registers. As avr backend has no movdi, the
moddi gets expanded as 8 * movqi, and that does not work
when the target registers are hard regs, as some of them
are clobbered by the libcalls.
So I see
(insn 18 17 19 2 (parallel [
(set (reg:QI 22 r22 [+4 ])
(mem/u/c:QI (reg/f:PSI 55) [1 aa+4 S1 A8 AS7]))
(clobber (reg:QI 22 r22))
(clobber (reg:QI 21 r21))
(clobber (reg:HI 30 r30))
]) "t.c":12:13 38 {xloadqi_A}
(nil))
(insn 19 18 20 2 (set (reg:PSI 56)
(reg/f:PSI 47)) "t.c":12:13 112 {*movpsi_split}
(nil))
(insn 20 19 21 2 (parallel [
(set (reg/f:PSI 57)
(plus:PSI (reg/f:PSI 47)
(const_int 5 [0x5])))
(clobber (scratch:QI))
]) "t.c":12:13 205 {addpsi3}
(nil))
(insn 21 20 22 2 (parallel [
(set (reg:QI 23 r23 [+5 ])
(mem/u/c:QI (reg/f:PSI 57) [1 aa+5 S1 A8 AS7]))
(clobber (reg:QI 22 r22))
(clobber (reg:QI 21 r21))
(clobber (reg:HI 30 r30))
]) "t.c":12:13 38 {xloadqi_A}
(nil))
for example - insn 21 clobbers r22 which is also the destination of insn 18.
With -fno-tree-ter those oddly get _no_ intermediate reg but we have
(insn 9 8 10 2 (parallel [
(set (subreg:QI (reg:DI 43 [ aa.0_1 ]) 1)
(mem/u/c:QI (reg/f:PSI 48) [1 aa+1 S1 A8 AS7]))
(clobber (reg:QI 22 r22))
(clobber (reg:QI 21 r21))
(clobber (reg:HI 30 r30))
]) "t.c":12:13 38 {xloadqi_A}
(nil))
but since on GIMPLE we have DImode loads I don't see how TER comes into
play here - TER should favor the second code generation, not the first ...
(or TER shouldn't play into here at all).
with -fno-tree-ter we come via
#0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x7ffff716c9a8,
tmode=E_DImode, modifier=EXPAND_NORMAL, alt_rtl=0x7fffffffcff8,
inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
#1 0x000000000109fe63 in store_expr (exp=<var_decl 0x7ffff7162000 aa>,
target=0x7ffff716c9a8, call_param_p=0, nontemporal=false, reverse=false)
at /space/rguenther/src/gcc/gcc/expr.cc:6740
#2 0x000000000109e626 in expand_assignment (to=<ssa_name 0x7ffff713f678 1>,
from=<var_decl 0x7ffff7162000 aa>, nontemporal=false)
at /space/rguenther/src/gcc/gcc/expr.cc:6461
while with TER we instead have
#0 expand_expr_real (exp=<var_decl 0x7ffff7162000 aa>, target=0x0,
tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:9433
#1 0x00000000010b279f in expand_expr_real_gassign (g=0x7ffff71613c0,
target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11100
#2 0x00000000010b3294 in expand_expr_real_1 (exp=<ssa_name 0x7ffff713f678 1>,
target=0x0, tmode=E_VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0,
inner_reference_p=false) at /space/rguenther/src/gcc/gcc/expr.cc:11278
the difference is -fno-tree-ter has a target (reg:DI 43 [...]) but with TER we
are not passing a target or a mode.
I think the issue is that the expansion at some point doesn't expect
the result to end up in
a hard register. Maybe define_expand are not supposed to do that but maybe
expansion needs to fix up.
A first thought was
diff --git a/gcc/expr.cc b/gcc/expr.cc
index ffbac513692..1509acad02e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -11111,6 +11111,12 @@ expand_expr_real_gassign (gassign *g, rtx
target, machine_mode tmode,
gcc_unreachable ();
}
set_curr_insn_location (saved_loc);
+ if (!target && REG_P (r) && REGNO (r) < FIRST_PSEUDO_REGISTER)
+ {
+ rtx tem = gen_reg_rtx (GET_MODE (r));
+ emit_move_insn (tem, r);
+ r = tem;
+ }
if (REG_P (r) && !REG_EXPR (r))
set_reg_attrs_for_decl_rtl (lhs, r);
return r;
but of course that's not the place to fix - this sees (mem/u/c:DI
(reg/f:PSI 47) [1 aa+0 S8 A8 AS7])
as result and things go wrong somewhere in the chain of expanding
things from the return, possibly at the point of expanding the plus and
there possibly when building subregs of the DImode mem.
You'd have to trace that down but the fix in the end is to do sth like the above
or alternatively, in the expander producing the 'xload' make sure to not allow
a hardreg as destination when you can still create pseudos?