> Can we backport the patch (at least the generic part) to the
> GCC11/GCC12/GCC13 release branches?
Yes, the periodic testers have picked up the change and, as far as I can tell,
there are no surprises.
Thanks,
Honza
> > > >
> > > > /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight
> > > > 51
On Thu, Dec 14, 2023 at 12:03 AM Jan Hubicka wrote:
>
> > > The difference is that Cores understand the fact that fmadd does not need
> > > all three parameters to start computation, while Zen cores don't.
> > >
> > > Since this seems to be a noticeable win on Zen and not a loss on Core, it seems like
>
> > The difference is that Cores understand the fact that fmadd does not need
> > all three parameters to start computation, while Zen cores don't.
> >
> > Since this seems to be a noticeable win on Zen and not a loss on Core, it seems like
> > a good default for generic.
> >
> > I plan to commit the pa
On Tue, Dec 12, 2023 at 10:38 PM Jan Hubicka wrote:
>
> Hi,
> this patch disables use of FMA in matrix multiplication loops for generic (for
> x86-64-v3) and zen4. I tested this on zen4 and Xeon Gold 6212U.
>
> For Intel this is neutral both on the matrix multiplication microbenchmark
> (att
On Tue, 12 Dec 2023, Richard Biener wrote:
> On Tue, Dec 12, 2023 at 3:38 PM Jan Hubicka wrote:
> >
> > Hi,
> > this patch disables use of FMA in matrix multiplication loops for generic
> > (for x86-64-v3) and zen4. I tested this on zen4 and Xeon Gold 6212U.
> >
> > For Intel this is
>
> This came up in a separate thread as well, but when doing reassoc of a
> chain with multiple dependent FMAs.
>
> I can't understand how this uarch detail can affect performance when,
> as in the testcase, the longest input latency is on the multiplication
> from a memory load.
> Do we actuall
On Tue, Dec 12, 2023 at 3:38 PM Jan Hubicka wrote:
>
> Hi,
> this patch disables use of FMA in matrix multiplication loops for generic (for
> x86-64-v3) and zen4. I tested this on zen4 and Xeon Gold 6212U.
>
> For Intel this is neutral both on the matrix multiplication microbenchmark
> (atta