https://gcc.gnu.org/g:50f8fac5b6e9972bf70764d2e5d0871d38dc518a

commit 50f8fac5b6e9972bf70764d2e5d0871d38dc518a
Author: Michael Meissner <[email protected]>
Date:   Thu Nov 13 11:22:37 2025 -0500

    Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.ibm | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/gcc/ChangeLog.ibm b/gcc/ChangeLog.ibm
index f0462ec745f4..f74c6a15bd1c 100644
--- a/gcc/ChangeLog.ibm
+++ b/gcc/ChangeLog.ibm
@@ -1,3 +1,72 @@
+==================== Branch ibm/gcc-16-future-float16, patch #209 ====================
+
+Optimize __bfloat16 scalar code.
+
+Optimize __bfloat16 binary operations.  Unlike _Float16, where we
+have instructions to convert between HFmode and SFmode as scalar
+values, BFmode only has vector conversions.  Thus, to do:
+
+       __bfloat16 a, b, c;
+
+       a = b + c;
+
+GCC without this patch generates the following code:
+
+       lxsihzx 0,4,2           // load __bfloat16 value b
+       lxsihzx 12,5,2          // load __bfloat16 value c
+       xxsldwi 0,0,0,1         // shift b into bits 16..31
+       xxsldwi 12,12,12,1      // shift c into bits 16..31
+       xvcvbf16spn 0,0         // vector convert b into V4SFmode
+       xvcvbf16spn 12,12       // vector convert c into V4SFmode
+       xscvspdpn 0,0           // convert b into SFmode scalar
+       xscvspdpn 12,12         // convert c into SFmode scalar
+       fadds 0,0,12            // add b+c
+       xscvdpspn 0,0           // convert b+c into SFmode memory format
+       xvcvspbf16 0,0          // convert b+c into BFmode memory format
+       stxsihx 0,3,2           // store b+c
+
+With the combiner patterns defined in this patch, the generated code
+becomes:
+
+
+       lxsihzx 12,4,2          // load __bfloat16 value b
+       lxsihzx 0,5,2           // load __bfloat16 value c
+       xxspltw 12,12,1         // splat b into bits 16..31 of each word
+       xxspltw 0,0,1           // splat c into bits 16..31 of each word
+       xvcvbf16spn 12,12       // vector convert b into V4SFmode
+       xvcvbf16spn 0,0         // vector convert c into V4SFmode
+       xvaddsp 0,0,12          // vector b+c in V4SFmode
+       xvcvspbf16 0,0          // convert b+c into BFmode memory format
+       stxsihx 0,3,2           // store b+c
+
+We cannot just define insns like 'addbf3' to keep the operation as
+BFmode because GCC will not generate these patterns unless the user
+uses -Ofast.  Without -Ofast, it will always convert BFmode into
+SFmode.
+
+2025-11-13  Michael Meissner  <[email protected]>
+
+gcc/
+
+       * config/rs6000/float16.cc (bfloat16_operation_as_v4sf): New function to
+       optimize __bfloat16 scalar operations.
+       * config/rs6000/float16.md (bfloat16_binary_op_internal1): New
+       __bfloat16 scalar combiner insns.
+       (bfloat16_binary_op_internal2): Likewise.
+       (bfloat16_fma_internal1): Likewise.
+       (bfloat16_fma_internal2): Likewise.
+       (bfloat16_fms_internal1): Likewise.
+       (bfloat16_fms_internal2): Likewise.
+       (bfloat16_nfma_internal1): Likewise.
+       (bfloat16_nfma_internal2): Likewise.
+       (bfloat16_nfms_internal3): Likewise.
+       * config/rs6000/predicates.md (fp16_reg_or_constant_operand): New
+       predicate.
+       (bfloat16_v4sf_operand): Likewise.
+       (bfloat16_bf_operand): Likewise.
+       * config/rs6000/rs6000-protos.h (bfloat16_operation_as_v4sf): New
+       declaration.
+
 ==================== Branch ibm/gcc-16-future-float16, patch #208 ====================
 
 Add --with-powerpc-float16 and --with-powerpc-float16-disable-warning.

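For readers unfamiliar with the widen/operate/narrow sequence described in the ChangeLog entry above, the following is a minimal portable C sketch of what `a = b + c` means for `__bfloat16` when the arithmetic is done in single precision. It is not code from the patch. The names `bf16_bits`, `bf16_to_float`, `float_to_bf16`, and `bf16_add` are hypothetical, and round-to-nearest-even on the narrowing step is an assumption made for illustration, not a statement about how `xvcvspbf16` rounds.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a __bfloat16 value: the raw 16-bit pattern,
   which is the upper half of an IEEE 754 binary32 value.  */
typedef uint16_t bf16_bits;

/* Widen: place the 16 bits in the upper half of a 32-bit word and
   reinterpret that word as a float.  */
static float bf16_to_float(bf16_bits h)
{
    uint32_t w = (uint32_t)h << 16;
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}

/* Narrow: keep the upper 16 bits of the float.  Round-to-nearest-even
   is assumed here for illustration.  */
static bf16_bits float_to_bf16(float f)
{
    uint32_t w;
    memcpy(&w, &f, sizeof w);

    /* NaN: truncate and force a mantissa bit so the result stays a NaN.  */
    if ((w & 0x7FFFFFFFu) > 0x7F800000u)
        return (bf16_bits)((w >> 16) | 0x0040u);

    /* Add 0x7FFF plus the lowest kept bit, then truncate.  */
    uint32_t rounding = 0x7FFFu + ((w >> 16) & 1u);
    return (bf16_bits)((w + rounding) >> 16);
}

/* a = b + c: widen both operands, add in single precision, narrow.  */
static bf16_bits bf16_add(bf16_bits b, bf16_bits c)
{
    return float_to_bf16(bf16_to_float(b) + bf16_to_float(c));
}
```

As a usage example, 1.0 has the bit pattern 0x3F80 and 2.0 has 0x4000, so `bf16_add(0x3F80, 0x4000)` yields 0x4040, the pattern for 3.0. The sketch operates on one value at a time, whereas the generated code above performs the same steps in vector registers.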