[Bug target/99937] Optimization needed for ARM with single cycle multiplier

mike.robins at talktalk dot net via Gcc-bugs Wed, 07 Apr 2021 09:16:14 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937


--- Comment #4 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #3)
> (In reply to mike.robins from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > You need to adjust RTX costing accordingly which likely means adding a new
> > > subtarget tuning.
> > 
> > Hi Richard
> > Are you saying that this would have to be added at the GCC source level
> > somehow, i.e. that there is no existing -mtune... or -f... to achieve this?
> > Mike
> 
> Generally yes.  I don't know the arm backend enough to tell whether there
> exists an ARM variant with the multiplier behaving in this way (I suppose an
> in-order, non-pipelined m0 core might behave this way ...)

It appears that other compilers default to the fast multiplier implementation
and provide a "small-multiply" option to tune for the smaller-silicon, slower
variant.
See
https://community.nxp.com/t5/LPCXpresso-IDE-FAQs/Use-of-Cortex-M0-M0-multiply-instructions-on-LPC43xx-and/m-p/461571
and the -mtune section in https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.
Is it possible that GCC defaults to tuning for the small/slow multiplier,
whereas it should assume the large/fast one unless a small-multiply variant is
explicitly selected?
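
A minimal sketch for comparing the two tunings, assuming the difference shows
up on integer multiplication (by a constant or by a variable). The function
names and the file name mul.c are hypothetical. The -mtune values are the ones
listed in the ARM-Options page linked above. What the compiler emits for each
tuning is not claimed here and would need to be checked in the generated
assembly.

```c
#include <stdint.h>

/*
 * Hypothetical test case, not taken from the bug report.
 *
 * Compile twice and compare the assembly for each function:
 *
 *   arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m0plus -S mul.c
 *   arm-none-eabi-gcc -O2 -mthumb -mcpu=cortex-m0plus \
 *       -mtune=cortex-m0plus.small-multiply -S mul.c
 *
 * On a core with a single-cycle multiplier one would hope for a
 * multiply instruction rather than a longer shift/add sequence.
 */

/* Multiplication by a compile-time constant. */
uint32_t scale_by_10(uint32_t x)
{
    return x * 10u;
}

/* Multiplication by a run-time value. */
uint32_t scale(uint32_t x, uint32_t y)
{
    return x * y;
}
```

If both command lines produce the same code for scale_by_10, that would
suggest the default tuning already assumes the same multiplier cost as the
small-multiply variant.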
