https://gcc.gnu.org/g:67d85eb443d8b82f5c4389004343580fa8c7b58e
commit 67d85eb443d8b82f5c4389004343580fa8c7b58e
Author: Michael Meissner <[email protected]>
Date:   Thu Nov 13 11:19:54 2025 -0500

    Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.float | 45 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog.float b/gcc/ChangeLog.float
index e54022648ca2..d1725dfdc1d2 100644
--- a/gcc/ChangeLog.float
+++ b/gcc/ChangeLog.float
@@ -2,6 +2,48 @@
 
 Optimize __bfloat16 scalar code.
 
+Optimize __bfloat16 binary operations. Unlike _Float16 where we
+have instructions to convert between HFmode and SFmode as scalar
+values, with BFmode, we only have vector conversions. Thus to do:
+
+	__bfloat16 a, b, c;
+
+	a = b + c;
+
+the GCC compiler generates the following code:
+
+	lxsihzx 0,4,2           // load __bfloat16 value b
+	lxsihzx 12,5,2          // load __bfloat16 value c
+	xxsldwi 0,0,0,1         // shift b into bits 16..31
+	xxsldwi 12,12,12,1      // shift c into bits 16..31
+	xvcvbf16spn 0,0         // vector convert b into V4SFmode
+	xvcvbf16spn 12,12       // vector convert c into V4SFmode
+	xscvspdpn 0,0           // convert b into SFmode scalar
+	xscvspdpn 12,12         // convert c into SFmode scalar
+	fadds 0,0,12            // add b+c
+	xscvdpspn 0,0           // convert b+c into SFmode memory format
+	xvcvspbf16 0,0          // convert b+c into BFmode memory format
+	stxsihx 0,3,2           // store b+c
+
+Using the following combiner patterns that are defined in this patch, the code
+generated would be:
+
+
+	lxsihzx 12,4,2          // load __bfloat16 value b
+	lxsihzx 0,5,2           // load __bfloat16 value c
+	xxspltw 12,12,1         // shift b into bits 16..31
+	xxspltw 0,0,1           // shift c into bits 16..31
+	xvcvbf16spn 12,12       // vector convert b into V4SFmode
+	xvcvbf16spn 0,0         // vector convert c into V4SFmode
+	xvaddsp 0,0,12          // vector b+c in V4SFmode
+	xvcvspbf16 0,0          // convert b+c into BFmode memory format
+	stxsihx 0,3,2           // store b+c
+
+We cannot just define insns like 'addbf3' to keep the operation as
+BFmode because GCC will not generate these patterns unless the user
+uses -Ofast. Without -Ofast, it will always convert BFmode into
+SFmode.
+
 2025-11-13 Michael Meissner <[email protected]>
 
 gcc/
@@ -18,15 +60,12 @@ gcc/
 	(bfloat16_nfma_internal1): Likewise.
 	(bfloat16_nfma_internal2): Likewise.
 	(bfloat16_nfms_internal3): Likewise.
-	(__bfloat16 peephole): Likewise.
 	* config/rs6000/predicates.md (fp16_reg_or_constant_operand): New
 	predicate.
 	(bfloat16_v4sf_operand): Likewise.
 	(bfloat16_bf_operand): Likewise.
 	* config/rs6000/rs6000-protos.h (bfloat16_operation_as_v4sf): New
 	declaration.
-	* config/rs6000/rs6000.opt (-mbfloat16-combine): New option.
-	(-mbfloat16-peephole): Likewise.
 
 ==================== Branch work226-float, patch #208 ====================
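
For readers who want to see what the widen/add/narrow sequence for
"a = b + c" computes, here is a minimal portable C sketch. It is not
code from this patch. The helper names (bf16_to_float, float_to_bf16,
bf16_add) are hypothetical, the values are modelled as raw uint16_t
bit patterns instead of the __bfloat16 type, and the narrowing step
assumes round-to-nearest-even, which the commit text does not state
for xvcvspbf16.

```c
#include <stdint.h>
#include <string.h>

/* Widen: a bfloat16 value is the upper 16 bits of an IEEE binary32
   value, so widening is a 16-bit left shift of the bit pattern.  */
static float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow: keep the upper 16 bits.  Assumption: round to nearest,
   ties to even; NaNs are returned as quiet NaNs.  */
static uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)      /* NaN input */
        return (uint16_t)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);       /* round to even */
    return (uint16_t)(bits >> 16);
}

/* Model of "a = b + c": widen both operands, add in single
   precision, then narrow the sum back to bfloat16.  */
static uint16_t bf16_add(uint16_t b, uint16_t c)
{
    return float_to_bf16(bf16_to_float(b) + bf16_to_float(c));
}
```

With these helpers, bf16_add(0x3F80, 0x4000) models 1.0 + 2.0 and
yields 0x4040 (3.0). Both instruction sequences quoted in the
ChangeLog text follow this same widen, add, narrow shape; they differ
in whether the add is done as a scalar (fadds) or on the V4SFmode
vector (xvaddsp).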
