https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123086
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2026-01-02
--- Comment #2 from Jeffrey A. Law <law at gcc dot gnu.org> ---
No, it's not scheduling at all.
The relevant GIMPLE from the .expand dump:
;; basic block 7, loop depth 3
;; pred: 7
;; 6
# iter_62 = PHI <iter_35(7), 0(6)>
# zx_64 = PHI <zxS_31(7), 0.0(6)>
# zy_65 = PHI <zy_32(7), 0.0(6)>
# zxS_66 = PHI <zxS_33(7), 0.0(6)>
# zyS_67 = PHI <zyS_34(7), 0.0(6)>
_8 = zxS_66 - zyS_67;
zx_10 = zx_64;
zxS_31 = _8 + cx_28;
_9 = zx_10 * 2.0e+0;
zy_32 = .FMA (_9, zy_65, cy_29);
zxS_33 = zxS_31 * zxS_31;
zyS_34 = zy_32 * zy_32;
iter_35 = iter_62 + 1;
_11 = zxS_33 + zyS_34;
_42 = maxIter_30(D) > iter_35;
_41 = _11 <= 4.0e+0;
_39 = _41 & _42;
if (_39 != 0)
goto <bb 7>; [89.30%]
else
goto <bb 8>; [10.70%]
Note the copy from zx_64 to zx_10. That doesn't exist in the .optimized dump.
So it would likely be useful to explore where that copy came from. When we
expand to RTL we get:
;; Generating RTL for gimple basic block 7
;; _8 = zxS_66 - zyS_67;
(insn 47 46 0 (set (reg:SF 141 [ _8 ])
(minus:SF (reg/v:SF 150 [ zxS ])
(reg/v:SF 151 [ zyS ]))) "j.c":13:14 -1
(nil))
;; zx_10 = zx_64;
(insn 48 47 0 (set (reg/v:SF 143 [ zx ])
(reg/v:SF 148 [ zxS ])) -1
(nil))
;; zxS_31 = _8 + cx_28;
(insn 49 48 0 (set (reg/v:SF 148 [ zxS ])
(plus:SF (reg:SF 141 [ _8 ])
(reg/v:SF 146 [ cx ]))) "j.c":13:8 -1
(nil))
;; zy_32 = .FMA (_9, zy_65, cy_29);
(insn 50 49 51 (set (reg:SF 181 [ _9 ])
(plus:SF (reg/v:SF 143 [ zx ])
(reg/v:SF 143 [ zx ]))) "j.c":14:11 -1
(nil))
That's just the relevant snippet. There's really not much the RTL optimizers
can do with that. We can't propagate away the copy in insn 48 because the
source of the copy, (reg 148), is overwritten in insn 49 while (reg 143) is
still needed by insn 50. The copy survives unchanged through register
allocation.
Other scheduling models do move things around. In particular, insn 49 bubbles
down past the other uses of (reg 143) in insn 50. That in turn allows IRA to
tie (reg 143) and (reg 148), and the copy goes away. Those models push insn 49
down to get it further away from the initial subtraction, which in the generic
model has a 5c latency.
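To make the ordering argument concrete, here is a rough C sketch of the two
instruction orders. It is a hypothetical reconstruction from the GIMPLE above,
not the reporter's j.c: the function names and the zx_old temporary are
invented for illustration. In the first order zx is overwritten before its old
value is consumed, so a copy is required. In the second the update is sunk
below the last use of the old value, so no copy is needed.

```c
/* Hypothetical reconstruction; names are illustrative assumptions. */

/* Order seen with the spacemit model: zx is updated first, so the old
   value must be kept alive in a separate register (the extra move). */
static int iters_with_copy(float cx, float cy, int maxIter)
{
    float zx = 0.0f, zy = 0.0f, zxS = 0.0f, zyS = 0.0f;
    int iter = 0;
    while (iter < maxIter && zxS + zyS <= 4.0f) {
        float zx_old = zx;              /* like insn 48: the copy */
        zx = zxS - zyS + cx;            /* like insn 49: clobbers zx */
        zy = 2.0f * zx_old * zy + cy;   /* like insn 50: uses old zx */
        zxS = zx * zx;
        zyS = zy * zy;
        iter++;
    }
    return iter;
}

/* Order seen with other models: the update of zx sinks below the last
   use of the old value, so both can share one register. */
static int iters_without_copy(float cx, float cy, int maxIter)
{
    float zx = 0.0f, zy = 0.0f, zxS = 0.0f, zyS = 0.0f;
    int iter = 0;
    while (iter < maxIter && zxS + zyS <= 4.0f) {
        zy = 2.0f * zx * zy + cy;       /* last use of old zx */
        zx = zxS - zyS + cx;            /* update after the use */
        zxS = zx * zx;
        zyS = zy * zy;
        iter++;
    }
    return iter;
}
```

Both functions compute the same values. Only the position of the zx update
relative to the last use of the old zx differs, and that is the decision the
scheduler's latency model ends up driving.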
The spacemit model shows a 4c latency for the initial subtraction which matches
LLVM and matches our best understanding of how that design behaves. If I hack
up the spacemit model to show a 5c latency, then the extraneous move goes away,
but that's likely to cause regressions elsewhere.
Overall, things are behaving sensibly in the scheduler. The place to look to
improve things is the introduction of that copy immediately before RTL
expansion. I don't expect changing the spacemit scheduling model so that it
avoids this copy to be the right thing to do; it would likely be generally
unprofitable.