On Tue, May 10, 2022 at 07:27:30AM -0500, Segher Boessenkool wrote:
> > IMHO, it's something we want to fix as well, based on the reasons:
> >   1) bif names have the corresponding mnemonics, users would expect 1-1
> >      mapping here.
> >   2) clang emits xs{min,max}dp all the time, with cpu type power7/8/9/10.
> >   3) according to uarch info, xs{min,max}cdp use the same units and have
> >      the same latency,
> >      no benefits to replace with xs{min,max}cdp.
>
> I never understood any of this.  Mike?  Why do we do those "c" things
> at all, ever?

On power7, we only had x{s,v}{min,max}{sp,dp}.  But those aren't useful for
optimizing a normal (a > b) ? a : b without using -ffast-math.  Power9 added
the 'c' and 'j' versions of the insns.  GCC never generates the 'j' version.

Basically, for ?: we generate:

    * Code = power8, no -ffast-math: Generate compare, move;
    * Code = power8, -ffast-math:    Generate xsmaxdp/xsmindp;
    * Code = power9, no -ffast-math: Generate xsmaxcdp/xsmincdp; (and)
    * Code = power9, -ffast-math:    Generate xsmaxcdp/xsmincdp.

For the __builtin_fmax and __builtin_fmin functions:

    * Code = power8, no -ffast-math: Generate call to fmax/fmin;
    * Code = power8, -ffast-math:    Generate xsmaxdp/xsmindp;
    * Code = power9, no -ffast-math: Generate call to fmax/fmin; (and)
    * Code = power9, -ffast-math:    Generate xsmaxcdp/xsmincdp.

For IEEE 128-bit floating point, we only have xs{min,max}cqp.  We do not have
the version without 'c', nor do we have the 'j' version.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
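
P.S.  For readers following along, here is a minimal C sketch of why the two
source forms in the lists above are not interchangeable without -ffast-math.
This is my own illustration, not code from GCC: the names max_ternary and
max_ignoring_nan are hypothetical, and max_ignoring_nan only models the
C-library fmax convention (return the non-NaN operand when exactly one
operand is a NaN) rather than calling fmax itself.

```c
#include <math.h>   /* NAN macro only; no libm functions are called */

/* The plain C form.  Any comparison involving a NaN is false, so the
   result depends on which operand is the NaN: the expression falls
   through to 'b' whenever 'a > b' does not hold.  */
static double max_ternary(double a, double b)
{
    return (a > b) ? a : b;
}

/* Hypothetical model of the fmax convention: if exactly one operand
   is a NaN, return the other one.  (x != x) is true only for a NaN.  */
static double max_ignoring_nan(double a, double b)
{
    if (a != a)
        return b;   /* a is NaN: return b, whatever it is */
    if (b != b)
        return a;   /* b is NaN: return a */
    return (a > b) ? a : b;
}
```

With these definitions, max_ternary(1.0, NAN) yields a NaN while
max_ternary(NAN, 1.0) yields 1.0, whereas max_ignoring_nan yields 1.0 in both
cases.  An instruction can only replace the ?: form under default flags if it
reproduces that operand-order dependence; under -ffast-math the compiler may
assume NaNs do not occur, so the distinction goes away.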