[Bug libgcc/66382] POWER8 Vector optimized implementation of __float128 (IEEE754 128-bit Binary Floating Point)

meissner at gcc dot gnu.org Mon, 28 Mar 2016 15:23:30 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66382


--- Comment #7 from Michael Meissner <meissner at gcc dot gnu.org> ---
I do not think it is worthwhile to expand the IEEE 128-bit software emulation
routines inline at the point of call on ISA 2.07 (power8), because a lot of
processing goes on in the emulation library.

Now if people were motivated, we could replace the soft-fp functions with tuned
versions written for the Power8 ISA.

However, I don't think it is going to be an easy task.  I'll list the things
that come to mind first.  They might be doable, but it is a time/effort
calculation of whether the return on investment is worth it.

The first issue is the next ISA (3.0) has support in it already for doing IEEE
128-bit floating point in hardware, including supporting the various rounding
modes, etc. If you configure and build gcc with an assembler that supports the
ISA 3.0 instruction set, it will add in IFUNC support so that when a program is
run on ISA 3.0 hardware, it will automatically use a version of __addkf3 that
uses the xsaddqp instruction instead of doing the emulation. So this effort
would only be for the current generation of hardware.
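
As a rough illustration of the dispatch idea only (this is not the libgcc
code), here is a minimal portable C sketch that picks an implementation once
and then calls through a function pointer.  Every name in it is hypothetical,
double stands in for __float128, and the capability probe is a stub that
always reports "no hardware support".  A real IFUNC resolver is run by the
dynamic loader rather than on first call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature; double is only a stand-in for __float128. */
typedef double (*add_fn)(double, double);

/* Stand-in for the soft-fp emulation path. */
double add_emulated(double a, double b) { return a + b; }

/* Stand-in for a path that would use a hardware instruction such as
   xsaddqp on ISA 3.0. */
double add_hardware(double a, double b) { return a + b; }

/* Hypothetical capability probe; assumed to be a stub here.  A real
   resolver would ask the platform what the CPU supports. */
bool cpu_has_ieee128_hw(void) { return false; }

/* Choose the implementation based on the probe. */
add_fn resolve_add(void)
{
    return cpu_has_ieee128_hw() ? add_hardware : add_emulated;
}

/* Cached choice, filled in on the first call. */
static add_fn add_impl = NULL;

double dispatch_add(double a, double b)
{
    if (add_impl == NULL)
        add_impl = resolve_add();
    return add_impl(a, b);
}
```

Callers only ever see dispatch_add, so the same binary can run on either
generation of hardware.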

The next issue is that right now you cannot do 64-bit scalar integer
arithmetic in the VSX unit. At present, the compiler does not allow DImode
into the Altivec registers, yet the only support for 64-bit integer arithmetic
uses an Altivec encoding and only uses Altivec registers (vaddudm, etc.).
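
To make this concrete, here is a small sketch using the GCC generic vector
extension (vector_size), which is one way to ask for 64-bit integer adds on
vector registers from C.  The function names are hypothetical.  Which
instruction the compiler actually emits depends on the target and flags; on
power8 one would expect something like vaddudm for the vector add, but that is
an assumption to check in the generated assembly.

```c
/* Two 64-bit integer lanes in one 128-bit vector (GCC vector extension). */
typedef long long v2di __attribute__((vector_size(16)));

/* Lane-wise 64-bit add; each lane wraps independently. */
v2di add_v2di(v2di a, v2di b)
{
    return a + b;
}

/* Hypothetical helper: do a scalar 64-bit add by placing the operands in
   lane 0 and ignoring lane 1. */
long long add_di_via_vector(long long x, long long y)
{
    v2di a = { x, 0 };
    v2di b = { y, 0 };
    v2di sum = add_v2di(a, b);
    return sum[0];
}
```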

I am working on patches to allow DImode variables in Altivec registers (and
later SImode, HImode, QImode). My first run with the patch shows 1 benchmark 2%
faster (perlbench) and one 2% slower (omnetpp), but I feel it needs a lot more
tuning.  At the moment, that work is lower priority, as I am trying to make
__float128 _Complex work as my highest priority (obviously other people could
take up this work).

After allowing DImode into Altivec registers, and perhaps doing 64-bit
arithmetic via the 64-bit integer vector instructions, another issue is that
the cycle time of the vector unit is 1/2 that of the GPR unit, so it will need
a lot of tuning.

I don't think the ISA 2.07 instruction set is general enough to do inserts and
extracts of 128-bit values that you would need for packing and unpacking the
IEEE 128-bit floating point values. ISA 3.0 has all of this support, including
specialized instructions to extract/set the exponent or mantissa, but then it
also has the hardware support for IEEE 128-bit floating point.
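
For reference, this is roughly what the unpacking and packing steps look like
when done with 64-bit integer operations.  It is a sketch, not the soft-fp
code: the struct and function names are hypothetical, and the value is assumed
to be already split into a high and a low 64-bit half.  The field widths are
those of IEEE 754 binary128: 1 sign bit, 15 exponent bits, 112 fraction bits.

```c
#include <stdint.h>

/* Raw binary128 image as two 64-bit halves (hypothetical layout helper). */
typedef struct {
    uint64_t hi;   /* sign, exponent, top 48 fraction bits */
    uint64_t lo;   /* low 64 fraction bits */
} binary128_bits;

/* Unpacked fields (hypothetical struct). */
typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 15 bits, biased by 16383 */
    uint64_t frac_hi;    /* top 48 bits of the 112-bit fraction */
    uint64_t frac_lo;    /* low 64 bits of the fraction */
} binary128_fields;

#define FRAC_HI_MASK 0x0000FFFFFFFFFFFFULL

/* Extract sign, exponent, and fraction from the raw image. */
binary128_fields unpack_binary128(binary128_bits v)
{
    binary128_fields f;
    f.sign     = (unsigned)(v.hi >> 63);
    f.exponent = (unsigned)((v.hi >> 48) & 0x7FFF);
    f.frac_hi  = v.hi & FRAC_HI_MASK;
    f.frac_lo  = v.lo;
    return f;
}

/* Reassemble the raw image from its fields. */
binary128_bits pack_binary128(binary128_fields f)
{
    binary128_bits v;
    v.hi = ((uint64_t)(f.sign & 1u) << 63)
         | ((uint64_t)(f.exponent & 0x7FFFu) << 48)
         | (f.frac_hi & FRAC_HI_MASK);
    v.lo = f.frac_lo;
    return v;
}
```

In GPRs these are plain shifts and masks; the difficulty described above is
doing the equivalent on a value held in a vector register.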

Loads and stores are also problematic in ISA 2.07, given there is no d-form
addressing for Altivec registers.  So if you spill a DImode value in an Altivec
register, you need to load up the offset in a GPR to do the memory operation.
