On Sun, 7 Oct 2012, Richard Sandiford wrote: > > So I think this can't really be selected automatically for all cores, > > some human-supplied knowledge about the MD unit used is required -- that > > obviously affects other operations too, e.g. some multiplications > > involving a constant that may be cheaper to do either directly or with a > > sequence of additions depending on the MD unit present (unless optimising > > for size, of course). > > Yeah, as far as this optimisation goes, I think your original suggestion > of using the DFA to work out the best sequence is still right. If the DFA > says that multiplications take a handful of cycles when they actually take > 32 then we're not going to optimise multiplication very well whatever happens. > > In practice, code destined for non-4kc cores with iterative multipliers > would probably tune well with -mtune=4kp.
Hmm, maybe, although I find requiring people to tune for an obsolete MIPS32 rev. 1 processor to get the right MDU performance figures for modern processors a bit odd (the 4Kp DFA undoubtedly lacks reservations for newer instructions introduced with the MIPS32r2 architecture update). I've thought of -miterative-mdu or suchlike a knob that would override the default cost of multiplication/division as appropriate (i.e. to 32/64 plus any extra operation-specific constant as required), perhaps by forcing the 4Kp insn reservation (extended to 64 bits as required) over the usual one; of course there would be nothing to override the -mtune=4Kp arrangement with. Nothing urgent though, just an idea to ponder. > The final patch ended up being much more complicated than I'd have liked, > but at least it should be easier to add other automatically-derived tuning > costs in future. Thanks for taking my concerns into account. One nit below. > Index: gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c > =================================================================== > --- /dev/null 2012-10-06 09:26:21.400998949 +0100 > +++ gcc/testsuite/gcc.target/mips/mips32-dsp-accinit-2.c 2012-10-06 > 22:24:24.857713316 +0100 > @@ -0,0 +1,22 @@ > +/* { dg-options "-mdspr2 -mgp32 -mtune=4kp" } */ > +/* References to RESULT within the loop need to have a higher frequency than > + references to RESULT outside the loop, otherwise there is no reason > + to prefer multiply/accumulator registers over GPRs. */ > +/* { dg-skip-if "requires register frequencies" { *-*-* } { "-O0" "-Os" } { > "" } } */ > + > +/* Check that the zero-initialization of the accumulator feeding into > + the madd is done by means of a mult instruction instead of mthi/mtlo. */ This comment looks reversed to me as in this case we do NOT want a MULT to be used... > + > +NOMIPS16 long long f (int n, int *v, int m) > +{ > + long long result = 0; > + int i; > + > + for (i = 0; i < n; i++) > + result = __builtin_mips_madd (result, v[i], m); > + return result; > +} > + > +/* { dg-final { scan-assembler-not "mult\t\[^\n\]*\\\$0" } } */ > +/* { dg-final { scan-assembler "\tmthi\t" } } */ > +/* { dg-final { scan-assembler "\tmtlo\t" } } */ ... and in fact you do check that we didn't get one. Maciej