Re: [PATCH 1/2] allow contraction to synthetic single-element vector FMA

Richard Biener Mon, 30 Jun 2025 04:30:28 -0700

On Mon, Jun 30, 2025 at 9:54 AM Richard Biener
<[email protected]> wrote:
>
> On Fri, Jun 27, 2025 at 3:45 PM Alexander Monakov <[email protected]> wrote:
> >
> >
> > On Fri, 27 Jun 2025, Richard Biener wrote:
> >
> > > > I am testing partial fixes for these issues.
> > >
> > > Can you check again after r16-1731-g08bdb6b4a32f1f?
> >
> > Certainly.  There are no more fmaddsub-related testsuite failures, and only
> > three tests need adjustment on x86.
> >
> > * gcc.target/i386/pr116979.c
> >
> > Here we check that FMA is used for complex multiplication. This is a case
> > where use of FMA is detrimental for accuracy, but I guess we can choose
> > to preserve the status quo. I can look into teaching
> > expand_complex_multiplication to emit FMAs.
>
> Yeah, I guess that's within -ffp-contract=on constraints.  When you say it's
> detrimental for accuracy, is that a general statement for complex multiply
> lowering?  If so we possibly do not want FMAs?
>
> > * gcc.target/i386/vect-pr82426.c
> >
> > Here we check that 2x2 matrix multiplication ends up using a specific mix
> > of plain mul/adds and FMAs, and SLP is having difficulties. Generally
> > we are doing a poorer job than LLVM here, either with or without
> > early FMAs. Do you want a separate bugreport for this?
>
> Yeah, it seems to be a lack of operand swizzling support for ternary
> commutative ops and for swapping ops of calls.  So yes, please open
> a separate bugreport for this.  I'll see what I can do there.


I have fixed this and filed the two remaining unhandled corner cases
as PR120885 and PR120886, but I think both are unrelated to the
task at hand.

So there is no need to open a bugreport anymore.

> > * gcc.target/i386/intrinsics_opt-1.c
> >
> > Here we check that SSE addss and mulss intrinsics are combined into an
> > FMA. Clearly a case where unrestricted contraction needs to be enabled
> > explicitly via -ffp-contract=fast or such (but also, I'd say it is a
> > questionable transform considering the source had intrinsics).
>
> Yep, I'd expect there are some cases where we would need to add
> -ffp-contract=fast when we switch the default.

It also seems we'd lose coverage for -ffp-contract=fast in some cases.
I'm unsure what to do here besides looking at the fallout of -ffp-contract=off
and duplicating all testcases to run with both -ffp-contract=on and
-ffp-contract=fast?
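
For reference, a minimal sketch of the source-level difference between the
two modes, going by the invoke.texi wording quoted below (contraction within
one expression under =on, but not across different statements).  The function
names are made up for illustration and do not come from an existing testcase.

```c
/* Hypothetical illustration, not an existing testcase.  */

/* The multiply and the add are part of one expression, so both
   -ffp-contract=on and -ffp-contract=fast may form an FMA here.  */
double
one_expr (double a, double b, double c)
{
  return a * b + c;
}

/* The product is assigned to T first, so the add belongs to a different
   statement: -ffp-contract=on has to keep the rounded product, while
   -ffp-contract=fast may still fuse the two operations.  */
double
two_stmts (double a, double b, double c)
{
  double t = a * b;
  return t + c;
}
```

A test that expects an FMA from the second form would presumably only pass
with -ffp-contract=fast, so that is the kind of test that needs the flag
spelled out once the default changes.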

Not the first time I wish we had something like

/* { dg-torture { { -ffp-contract=fast } { -ffp-contract=on } } } */

and possibly dg-additional-torture to amend what's done with dg-torture.exp;
dg-torture there might produce a cross-product of both torture sets.
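
Lacking such a directive, the duplication can be sketched with the usual
wrapper-file idiom: a second test includes the first and only changes the
options.  The file names and the scanned pattern are assumptions for
illustration (an x86 test using -mfma), and as far as I know the dg-
directives have to be repeated in the wrapper because only the main test
file is scanned for them.

```c
/* Hypothetical file fp-contract-1.c: checks the =on behaviour.  */
/* { dg-do compile } */
/* { dg-options "-O2 -mfma -ffp-contract=on" } */
/* { dg-final { scan-assembler "vfmadd" } } */

double
muladd (double a, double b, double c)
{
  return a * b + c;
}

#if 0
/* Hypothetical file fp-contract-2.c: same source under =fast.  It would be
   a separate file; it is shown inside "#if 0" only so that this sketch
   stays a single compilable unit.  */
/* { dg-do compile } */
/* { dg-options "-O2 -mfma -ffp-contract=fast" } */
/* { dg-final { scan-assembler "vfmadd" } } */
#include "fp-contract-1.c"
#endif
```

This doubles the number of files per test, which is the overhead a
dg-torture style directive would avoid.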

Richard.

>
> Thanks,
> Richard.
>
> > --- 8< ---
> >
> > From acf83683b03653e3317a30a549cecd3bde49a325 Mon Sep 17 00:00:00 2001
> > From: Alexander Monakov <[email protected]>
> > Date: Mon, 12 May 2025 23:23:42 +0300
> > Subject: [PATCH] c-family: switch away from -ffp-contract=fast
> >
> > Unrestricted FMA contraction, combined with optimizations that create
> > copies of expressions, is causing hard-to-debug issues such as PR106902.
> > Since we implement conformant contraction now, switch C and C++ from
> > -ffp-contract=fast to either =off (under -std=c[++]NN like before, also
> > for C++ now), or =on (under -std=gnu[++]NN).  Keep -ffp-contract=fast
> > when -funsafe-math-optimizations (or -ffast-math, -Ofast) is active.
> >
> > In other words,
> >
> > - -ffast-math: no change, unrestricted contraction like before;
> > - standards compliant mode for C: no change, no contraction;
> > - ditto, for C++: align with C (no contraction);
> > - otherwise, switch C and C++ from -ffp-contract=fast to =on.
> >
> > gcc/c-family/ChangeLog:
> >
> >         * c-opts.cc (c_common_post_options): Adjust handling of
> >         flag_fp_contract_mode.
> >
> > gcc/ChangeLog:
> >
> >         * doc/invoke.texi (-ffp-contract): Describe new defaults.
> >         (-funsafe-math-optimizations): Add -ffp-contract=fast.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/pr116979.c: Add -ffp-contract=fast to flags.
> >         * gcc.target/i386/vect-pr82426.c: Ditto.
> >         * gcc.target/i386/intrinsics_opt-1.c: Ditto.
> > ---
> >  gcc/c-family/c-opts.cc                           | 11 ++---------
> >  gcc/doc/invoke.texi                              | 10 ++++++----
> >  gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr116979.c         |  2 +-
> >  gcc/testsuite/gcc.target/i386/vect-pr82426.c     |  2 +-
> >  5 files changed, 11 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> > index 3ee9444cbe..40aba4dcc1 100644
> > --- a/gcc/c-family/c-opts.cc
> > +++ b/gcc/c-family/c-opts.cc
> > @@ -877,15 +877,8 @@ c_common_post_options (const char **pfilename)
> >      flag_excess_precision = (flag_iso ? EXCESS_PRECISION_STANDARD
> >                              : EXCESS_PRECISION_FAST);
> >
> > -  /* ISO C restricts floating-point expression contraction to within
> > -     source-language expressions (-ffp-contract=on, currently an alias
> > -     for -ffp-contract=off).  */
> > -  if (flag_iso
> > -      && !c_dialect_cxx ()
> > -      && (OPTION_SET_P (flag_fp_contract_mode)
> > -         == (enum fp_contract_mode) 0)
> > -      && flag_unsafe_math_optimizations == 0)
> > -    flag_fp_contract_mode = FP_CONTRACT_OFF;
> > +  if (!flag_unsafe_math_optimizations && !OPTION_SET_P (flag_fp_contract_mode))
> > +    flag_fp_contract_mode = flag_iso ? FP_CONTRACT_OFF : FP_CONTRACT_ON;
> >
> >    /* C language modes before C99 enable -fpermissive by default, but
> >       only if -pedantic-errors is not specified.  Also treat
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index b83818337c..b03d2f16be 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -13111,16 +13111,17 @@ Disabled by default.
> >  @opindex ffp-contract
> >  @item -ffp-contract=@var{style}
> >  @option{-ffp-contract=off} disables floating-point expression contraction.
> > +This is the default for C and C++ in a standards compliant mode
> > +(@option{-std=c11}, @option{-std=c++11} or similar).
> >  @option{-ffp-contract=fast} enables floating-point expression contraction
> >  such as forming of fused multiply-add operations if the target has
> >  native support for them.
> > +This is the default for languages other than C and C++.
> >  @option{-ffp-contract=on} enables floating-point expression contraction
> >  if allowed by the language standard.  This is implemented for C and C++,
> >  where it enables contraction within one expression, but not across
> >  different statements.
> > -
> > -The default is @option{-ffp-contract=off} for C in a standards compliant mode
> > -(@option{-std=c11} or similar), @option{-ffp-contract=fast} otherwise.
> > +This is the default when compiling GNU dialects of C or C++.
> >
> >  @opindex fomit-frame-pointer
> >  @item -fomit-frame-pointer
> > @@ -15614,7 +15615,8 @@ or ISO rules/specifications for math functions. It may, however,
> >  yield faster code for programs that do not require the guarantees
> >  of these specifications.
> >  Enables @option{-fno-signed-zeros}, @option{-fno-trapping-math},
> > -@option{-fassociative-math} and @option{-freciprocal-math}.
> > +@option{-fassociative-math}, @option{-freciprocal-math} and
> > +@option{-ffp-contract=fast}.
> >
> >  The default is @option{-fno-unsafe-math-optimizations}.
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c b/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
> > index a75bf4e9ca..edd8a7d964 100644
> > --- a/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/intrinsics_opt-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mfma" } */
> > +/* { dg-options "-O2 -mfma -ffp-contract=fast" } */
> >
> >  #include <emmintrin.h>
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr116979.c b/gcc/testsuite/gcc.target/i386/pr116979.c
> > index 0d2a958af4..fa2022b8a9 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr116979.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr116979.c
> > @@ -1,6 +1,6 @@
> >  /* PR target/116979 */
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited" } */
> > +/* { dg-options "-O2 -mfma -fvect-cost-model=unlimited -ffp-contract=fast" } */
> >  /* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)pd" } } */
> >  /* { dg-final { scan-assembler "vfmaddsub(?:132|213|231)ps" { target { ! ia32 } } } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/vect-pr82426.c b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > index 03b10adff9..8ce8fe78a9 100644
> > --- a/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > +++ b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> > @@ -1,6 +1,6 @@
> >  /* i?86 does not have V2SF, x32 does though.  */
> >  /* { dg-do compile { target { ! ia32 } } } */
> > -/* { dg-options "-O3 -mavx -mfma" } */
> > +/* { dg-options "-O3 -mavx -mfma -ffp-contract=fast" } */
> >
> >  struct Matrix
> >  {
> > --
> > 2.49.0
> >
