On Thu, Jul 20, 2023 at 11:16 AM Maciej W. Rozycki <ma...@embecosm.com> wrote: > > On Thu, 20 Jul 2023, Richard Biener wrote: > > > > Thanks for making this improvement. I've checked MIPS results and code > > > produced now is as follows: > > > > > > daddiu $sp,$sp,-64 > > > sd $5,24($sp) > > > sd $7,40($sp) > > > ldc1 $f0,24($sp) > > > ldc1 $f1,40($sp) > > > sd $4,16($sp) > > > sd $6,32($sp) > > > ldc1 $f2,32($sp) > > > add.ps $f1,$f0,$f1 > > > ldc1 $f0,16($sp) > > > add.ps $f0,$f0,$f2 > > > sdc1 $f1,56($sp) > > > ld $3,56($sp) > > > sdc1 $f0,48($sp) > > > ld $2,48($sp) > > > jr $31 > > > daddiu $sp,$sp,64 > > > > > > which does do vector stuff now, although it's still considerably worse > > > than my handwritten example: > > > > > > > > dmtc1 $4,$f0 > > > > > dmtc1 $5,$f1 > > > > > dmtc1 $6,$f2 > > > > > dmtc1 $7,$f3 > > > > > add.ps $f0,$f0,$f1 > > > > > add.ps $f2,$f2,$f3 > > > > > dmfc1 $2,$f0 > > > > > jr $31 > > > > > dmfc1 $3,$f2 > > > > > > Or I'd say it's pretty terrible, but given the current situation with the > > > MIPS backend I'm going to leave it to the new maintainer to sort out. > > > > Yeah, I also wondered what is wrong ... I suspect it's the usual issue > > of parameter passing causing spilling ... > > There's no such requirement in the psABI and I fail to see a plausible > justification. And direct GPR<->FPR move patterns are available in the > backend for the V2SF mode. Also there's no delay slot requirement even > for these move instructions for MIPS64r1+ ISA levels, which have this > paired-single FP format defined. It seems to me a plain bug (or missed > optimisation if you prefer).
Definitely. OTOH parameter/return passing for V4SFmode while appearantly being done in registers the backend(?) assigns BLKmode to the V4SFmode arguments so they get immediately spilled in the code moving the incoming hardregisters to pseudos (or stack as in this case). It comes down to the issue that Jiufu Guo is eventually addressing with adding SRA-style heuristics to the code chosing the layout of that storage. Interestingly for the return value we get TImode. Note we don't seem to be able to optimize (insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 78 $frame) (const_int 24 [0x18])) [1 a+8 S8 A64]) (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit} (expr_list:REG_DEAD (reg:DI 5 $5) (nil))) ... (insn 40 7 41 2 (set (reg:V2SF 205 [ a+8 ]) (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame) (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387 {*movv2sf} (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame) (const_int 24 [0x18])) [1 a+8 S8 A64]) (nil))) for some reason. Maybe we are afraid of the hardreg use in the store, maybe it is because the store is in the prologue (before NOTE_INSN_FUNCTION_BEG). Also postreload isn't able to fix this: (insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 29 $sp) (const_int 24 [0x18])) [1 a+8 S8 A64]) (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit} (nil)) ... (insn 40 7 41 2 (set (reg:V2SF 32 $f0 [orig:205 a+8 ] [205]) (mem/c:V2SF (plus:DI (reg/f:DI 29 $sp) (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387 {*movv2sf} (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame) (const_int 24 [0x18])) [1 a+8 S8 A64]) (nil))) so something is amiss in the backend as well if you say there should be direct moves available. Richard. > > Maciej