Hi All,

The fallback expansion of __builtin_bswap16 on pre-Power10 used a sequence of multiple rlwinm/or instructions:

	mr r9,r3
	rlwinm r3,r9,24,24,31
	rlwinm r10,r9,8,16,23
	or r3,r3,r10
	rlwinm r3,r3,0,0xffff

This was functionally correct but suboptimal. Rewrite the splitter to use a rotate-insert idiom, producing:

	mr r9,r3
	slwi r3,r9,8
	rlwimi r3,r9,24,24,31
	rlwinm r3,r3,0,0xffff

This sequence is one instruction shorter and maps directly onto the rlwimi instruction.

The following patch has been bootstrapped on powerpc64le-linux.

2025-09-09  Kishan Parmar  <kis...@linux.ibm.com>

gcc/
	PR target/121076
	* config/rs6000/rs6000.md (bswaphi2_reg): Replace multi-instruction
	rotate-mask/rotate-mask/or sequence with shift/rotate-mask-insert
	idiom, reducing insn count for bswap16 on pre-Power10 targets.
---
 gcc/config/rs6000/rs6000.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 04a6c0f7461..7c48cb900b6 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2676,21 +2676,18 @@
    xxbrh %x0,%x1"
   "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], HImode)"
   [(set (match_dup 3)
-	(and:SI (lshiftrt:SI (match_dup 4)
-			     (const_int 8))
-		(const_int 255)))
-   (set (match_dup 2)
-	(and:SI (ashift:SI (match_dup 4)
-			   (const_int 8))
-		(const_int 65280)))		;; 0xff00
+	(ashift:SI (match_dup 4)
+		   (const_int 8)))
    (set (match_dup 3)
-	(ior:SI (match_dup 3)
-		(match_dup 2)))]
+	(ior:SI (and:SI (match_dup 3)
+			(const_int -256))
+		(and:SI (lshiftrt:SI (match_dup 4) (const_int 8))
+			(const_int 255))))]		;; 0x00ff
 {
   operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0);
   operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
 }
-  [(set_attr "length" "*,12,*")
+  [(set_attr "length" "*,8,*")
    (set_attr "type" "shift,*,vecperm")
    (set_attr "isa" "p10,*,p9v")])
 
-- 
2.47.3
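
For anyone who wants to sanity-check the equivalence of the two sequences without Power hardware, below is a rough C model of the instructions involved. It is only an illustrative sketch and not part of the patch: the helper names (rotl32, mask32, rlwinm, rlwimi, bswap16_old, bswap16_new, sequences_agree) are made up here, mask32 assumes a non-wrapping mask (mb <= me), and the registers are modelled as 32-bit words, which should be adequate since only the low halfword of the result is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by sh bits. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* Mask with bits mb..me set, in PowerPC numbering (bit 0 is the MSB).
   Assumption: only the non-wrapping case mb <= me is modelled. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    assert(mb <= me && me <= 31);
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
}

/* rlwinm: rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* rlwimi: rotate left, then insert under the mask into ra. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs,
                       unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = mask32(mb, me);
    return (ra & ~m) | (rotl32(rs, sh) & m);
}

/* Old sequence: rlwinm / rlwinm / or / rlwinm (zero-extend). */
uint32_t bswap16_old(uint32_t r9)
{
    uint32_t r3  = rlwinm(r9, 24, 24, 31);   /* high byte -> low byte */
    uint32_t r10 = rlwinm(r9, 8, 16, 23);    /* low byte -> high byte */
    r3 |= r10;
    return r3 & 0xFFFFu;                     /* rlwinm r3,r3,0,0xffff */
}

/* New sequence: slwi / rlwimi / rlwinm (zero-extend). */
uint32_t bswap16_new(uint32_t r9)
{
    uint32_t r3 = r9 << 8;                   /* slwi r3,r9,8 */
    r3 = rlwimi(r3, r9, 24, 24, 31);         /* insert high byte into low byte */
    return r3 & 0xFFFFu;                     /* rlwinm r3,r3,0,0xffff */
}

/* Returns 1 if both sequences match a plain byte swap for every 16-bit
   input, including when the upper half of the register holds garbage. */
int sequences_agree(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        uint32_t expect = ((v & 0xFFu) << 8) | (v >> 8);
        uint32_t dirty  = 0xABCD0000u | v;
        if (bswap16_old(v) != expect || bswap16_new(v) != expect)
            return 0;
        if (bswap16_old(dirty) != expect || bswap16_new(dirty) != expect)
            return 0;
    }
    return 1;
}
```

Calling sequences_agree() walks all 65536 halfword values, so it is cheap enough to run as a one-off check.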