Hi All,

The fallback expansion of __builtin_bswap16 on pre-Power10 used a
sequence of multiple rlwinm/or instructions:

  mr      r9,r3
  rlwinm  r3,r9,24,24,31
  rlwinm  r10,r9,8,16,23
  or      r3,r3,r10
  rlwinm  r3,r3,0,0xffff

This was functionally correct but suboptimal: the splitter emitted three
instructions (length 12) where two suffice.

Rewrite the splitter to use a rotate-insert idiom, producing:

  mr      r9,r3
  slwi    r3,r9,8
  rlwimi  r3,r9,24,24,31
  rlwinm  r3,r3,0,0xffff

This sequence is one instruction shorter and maps directly to the rlwimi
instruction.
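
For reviewers who want to sanity-check the equivalence, here is a rough C
model of the two sequences. It is only a sketch and is not part of the patch:
rotl32, bswap16_old, bswap16_new and bswap16_models_agree are hypothetical
helper names, and the garbage pattern placed in the upper halfword is an
arbitrary choice meant to mimic an SImode subreg of an HImode value.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit rotate left, as done by rlwinm/rlwimi. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Model of the old fallback: rlwinm, rlwinm, or, then the 16-bit mask. */
static uint16_t bswap16_old(uint32_t x)
{
    uint32_t lo = rotl32(x, 24) & 0x000000ffu;   /* rlwinm r3,r9,24,24,31 */
    uint32_t hi = rotl32(x, 8)  & 0x0000ff00u;   /* rlwinm r10,r9,8,16,23 */
    return (uint16_t)((lo | hi) & 0xffffu);      /* or; rlwinm ...,0xffff */
}

/* Model of the new fallback: slwi, rlwimi, then the 16-bit mask. */
static uint16_t bswap16_new(uint32_t x)
{
    uint32_t r = x << 8;                         /* slwi r3,r9,8 */
    /* rlwimi r3,r9,24,24,31: insert the rotated low byte, keep the rest. */
    r = (r & ~0x000000ffu) | (rotl32(x, 24) & 0x000000ffu);
    return (uint16_t)(r & 0xffffu);              /* rlwinm ...,0xffff */
}

/* Compare both models with __builtin_bswap16 for every 16-bit input.
   Returns 1 if they all agree, 0 otherwise. */
static int bswap16_models_agree(void)
{
    for (uint32_t v = 0; v <= 0xffffu; v++) {
        uint32_t x = v | 0xabcd0000u;            /* arbitrary upper bits */
        uint16_t want = __builtin_bswap16((uint16_t)v);
        if (bswap16_old(x) != want || bswap16_new(x) != want)
            return 0;
    }
    return 1;
}
```

Calling bswap16_models_agree () from a small test driver should be enough to
confirm that the upper halfword of the input does not leak into the result
in either sequence.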

The following patch has been bootstrapped on powerpc64le-linux.

2025-09-09  Kishan Parmar  <kis...@linux.ibm.com>

gcc/
        PR target/121076
        * config/rs6000/rs6000.md (bswaphi2_reg): Replace multi-instruction
        rotate-mask/rotate-mask/or sequence with shift/rotate-mask-insert
        idiom, reducing insn count for bswap16 on pre-Power10 targets.
---
 gcc/config/rs6000/rs6000.md | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 04a6c0f7461..7c48cb900b6 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2676,21 +2676,18 @@
    xxbrh %x0,%x1"
   "reload_completed && !TARGET_POWER10 && int_reg_operand (operands[0], HImode)"
   [(set (match_dup 3)
-       (and:SI (lshiftrt:SI (match_dup 4)
-                            (const_int 8))
-               (const_int 255)))
-   (set (match_dup 2)
-       (and:SI (ashift:SI (match_dup 4)
-                          (const_int 8))
-               (const_int 65280)))             ;; 0xff00
+       (ashift:SI (match_dup 4)
+                  (const_int 8)))
    (set (match_dup 3)
-       (ior:SI (match_dup 3)
-               (match_dup 2)))]
+       (ior:SI (and:SI (match_dup 3)
+                       (const_int -256))
+               (and:SI (lshiftrt:SI (match_dup 4) (const_int 8))
+                       (const_int 255))))]         ;; 0x00ff
 {
   operands[3] = simplify_gen_subreg (SImode, operands[0], HImode, 0);
   operands[4] = simplify_gen_subreg (SImode, operands[1], HImode, 0);
 }
-  [(set_attr "length" "*,12,*")
+  [(set_attr "length" "*,8,*")
    (set_attr "type" "shift,*,vecperm")
    (set_attr "isa" "p10,*,p9v")])
 
-- 
2.47.3
