On Tue, May 10, 2022 at 07:27:30AM -0500, Segher Boessenkool wrote:
> > IMHO, it's something we want to fix as well, based on the reasons:
> >   1) bif names have the corresponding mnemonics, users would expect 1-1 
> > mapping here.
> >   2) clang emits xs{min,max}dp all the time, with cpu type power7/8/9/10.
> >   3) according to uarch info, xs{min,max}cdp use the same units and have 
> > the same latency,
> >      no benefits to replace with xs{min,max}cdp.
> 
> I never understood any of this.  Mike?  Why do we do those "c" things
> at all, ever?

On power7, we only had x{s,v}{min,max}{sp,dp}.  But those aren't useful for
optimizing a normal (a > b) ? a : b without -ffast-math.  Power9 added the
'c' and 'j' versions of the insns.  GCC never generates the 'j' version.
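
A minimal sketch of why the plain min/max forms cannot stand in for the
conditional without -ffast-math.  It assumes IEEE 754 doubles and uses ISO C
fmax as a stand-in for a "return the non-NaN operand" style of maximum (my
understanding is that xsmaxdp behaves along those lines, but check the ISA
before relying on that).  The names ternary_max and check_max_semantics are
made up for illustration.  Build it without -ffast-math.

```c
#include <assert.h>
#include <math.h>

/* The plain C conditional: when the comparison is false (including the
   unordered case where either operand is a NaN), the result is b.  */
static double ternary_max (double a, double b)
{
  return (a > b) ? a : b;
}

/* Self-check of the cases where the conditional and fmax disagree.  */
static void check_max_semantics (void)
{
  /* NaN in the second operand: the conditional propagates it ...  */
  assert (isnan (ternary_max (1.0, NAN)));
  /* ... while ISO C fmax treats the NaN as missing data.  */
  assert (fmax (1.0, NAN) == 1.0);
  /* +0.0 > -0.0 is false, so the conditional returns b, i.e. -0.0.  */
  assert (signbit (ternary_max (0.0, -0.0)));
}
```

With -ffast-math the compiler may assume no NaNs and ignore the sign of zero,
so these distinctions stop mattering and either instruction form is fair game.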

Basically for ?: we generate:

    *   Code = power8, no -ffast-math:    Generate compare, move;
    *   Code = power8, -ffast-math:       Generate xsmaxdp/xsmindp;
    *   Code = power9, no -ffast-math:    Generate xsmaxcdp/xsmincdp; (and)
    *   Code = power9, -ffast-math:       Generate xsmaxcdp/xsmincdp.

For the __builtin_fmax and __builtin_fmin functions:

    *   Code = power8, no -ffast-math:    Generate call to fmax/fmin;
    *   Code = power8, -ffast-math:       Generate xsmaxdp/xsmindp;
    *   Code = power9, no -ffast-math:    Generate call to fmax/fmin; (and)
    *   Code = power9, -ffast-math:       Generate xsmaxcdp/xsmincdp.

For IEEE 128-bit floating point, we only have xs{min,max}cqp.  We do not have
the version without 'c' nor do we have the 'j' version.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
