https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937
--- Comment #4 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #3)
> (In reply to mike.robins from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > You need to adjust RTX costing accordingly which likely means adding a new
> > > subtarget tuning.
> >
> > Hi Richard
> > Are you saying that this would have to be added at the GCC source level
> > somehow. I.e that there is no existing -mtune... or -f... to achieve this?
> > Mike
>
> Generally yes.  I don't know the arm backend enough to tell whether there
> exists an ARM variant with the multiplier behaving in this way (I suppose an
> in-order, non-pipelined m0 core might behave this way ...)

It appears that other compilers default to the fast multiplier implementation
and provide a "small-multiply" option to tune for the smaller-silicon, slower
version.

See
https://community.nxp.com/t5/LPCXpresso-IDE-FAQs/Use-of-Cortex-M0-M0-multiply-instructions-on-LPC43xx-and/m-p/461571
and the -mtune section in https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.

Is it possible that GCC somehow defaults to the small/slow multiplier, whereas
it should default to the large/fast one unless the small/slow version is
specified?
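
To make the question concrete, here is a minimal sketch of the kind of code where the multiplier cost model matters. It is not taken from the original report; the function names are hypothetical. The first function leaves the choice to the compiler, and the second spells out the shift-and-add form that a cost model assuming a slow multiplier might prefer. Which sequence GCC actually emits depends on the version and options, so no particular output is claimed here.

```c
#include <stdint.h>

/* Hypothetical example: multiply by a small constant.
   With a fast multiplier, a single multiply instruction would be
   the expected choice; with a slow one, shifts and adds may win. */
uint32_t mul_by_10(uint32_t x)
{
    return x * 10u;
}

/* The same value written explicitly as shifts and adds:
   x * 10 == x * 8 + x * 2 (modulo 2^32 for unsigned arithmetic). */
uint32_t mul_by_10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

One way to compare would be to compile this with something like `arm-none-eabi-gcc -Os -mthumb -mcpu=cortex-m0 -S`, once as is and once with `-mtune=cortex-m0.small-multiply` added, and look at whether `mul_by_10` uses a multiply instruction in each case.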