https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116719
Bug ID: 116719
Summary: [SH] missed folding of fp factor constant
Product: gcc
Version: 14.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: olegendo at gcc dot gnu.org
Target Milestone: ---
The following code, compiled on sh-elf with -std=c11 -O2 -ml -m4-single
-ffast-math -mfsrra -mfsca
void fisincosr(float angle, float *s, float *c) {
    float __r = angle / 10430.37835f;
    *s = __builtin_sinf(__r);
    *c = __builtin_cosf(__r);
}
void fisincos(float angle, float *s, float *c) {
    fisincosr(angle * 182.04444443f, s, c);
}
produces:
_fisincosr:
        ftrc fr5,fpul
        fsca fpul,dr2
        fmov.s fr2,@r4
        rts
        fmov.s fr3,@r5
_fisincos:
        mova .L4,r0
        fmov.s @r0+,fr1
        fmul fr1,fr5
        fmov.s @r0+,fr1
        fmul fr1,fr5
        ftrc fr5,fpul
        fsca fpul,dr2
        fmov.s fr2,@r4
        rts
        fmov.s fr3,@r5
.L4:
        .long 1016003126
        .long 1176697219
In the second function, 'fisincos', it fails to combine the factor 182.04444443
with the other factor 10430.37835, which is emitted by the 'sincossf3' pattern.
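As a quick check of the arithmetic, the two .long words in the constant pool can be decoded as single-precision values. The sketch below is only an illustration: the helper names bits_to_float and folded_factor are hypothetical and not part of the test case or of GCC, and it assumes that float is a 32-bit IEEE 754 type on the host. The first word decodes to roughly 1.7453e-2, which appears to be 182.04444443 / 10430.37835 already folded under -ffast-math; the second is the 10430.378 scale factor. Their product is roughly 182.0444, so a single fmul would be enough.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit word as a single-precision float.
   Assumes the host float is 32-bit IEEE 754 (an assumption of this sketch). */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Multiply the two constant-pool entries from .L4 to get the single
   factor that the two fmul insns in _fisincos apply in sequence. */
static float folded_factor(void)
{
    float first  = bits_to_float(1016003126u); /* about 1.7453e-2 */
    float second = bits_to_float(1176697219u); /* about 10430.378 */
    return first * second;                     /* about 182.0444  */
}
```

Printing folded_factor() with a format such as "%.9g" on a host that meets the assumption above should give a value close to 182.0444.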
At some point, the combine pass tries to fold the two SF constants into one and
emit a single multiplication insn:
Trying 8, 10, 9 -> 11:
    8: {r168:SF=1.74532942473888397216796875e-2;use fpscr0:SI;clobber scratch;}
   10: {r174:SF=1.0430378350470453e+4;use fpscr0:SI;clobber scratch;}
    9: {r162:SF=r181:SF*r168:SF;clobber fpscr1:SI;use fpscr0:SI;}
      REG_DEAD r181:SF
      REG_UNUSED fpscr1:SI
      REG_DEAD r168:SF
   11: {r171:SF=r162:SF*r174:SF;clobber fpscr1:SI;use fpscr0:SI;}
      REG_DEAD r174:SF
      REG_DEAD r162:SF
      REG_UNUSED fpscr1:SI
      REG_EQUAL r162:SF*1.0430378350470453e+4
Failed to match this instruction:
(parallel [
        (set (reg:SF 171)
            (mult:SF (reg:SF 181)
                (const_double:SF 1.820444488525390625e+2 [0x0.b60b61p+8])))
        (clobber (reg:SI 155 fpscr1))
        (use (reg:SI 154 fpscr0))
    ])
... but it fails because SH doesn't have any fp insns that can take constants
as operands, only registers. It would require splitting the insn and emitting a
constant load, just like the expanders do. But that's not done during combine.
Such patterns could be added, but then it'd be good to hoist the constant loads
out of loops. See also PR 54089 and its attachment
https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543, which addresses the
problem of constants being formed later during RTL optimization.
Another option could be to extend the combine pass to do this automatically,
but that would probably have wider repercussions, including on other targets.