> On 7 Nov 2025, at 10:23, Andre Vieira <[email protected]> wrote:
>
> Expands the use of eor3 where we'd otherwise use two vector eor's.
>
> Bootstrapped and regression tested on aarch64-none-linux-gnu.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md (*eor3q<mode>4): New insn to be used by
> combine after reload to optimize any grouping of eor's that are using
> FP registers for scalar modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/eor3-opt.c: New test.
>
> <eor3.patch>
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 0d5b02a739fa74724d6dc8b658638d55b8db6890..3bf668e25b58a463f1d35387b1c6af7cc04e3a16 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -9201,15 +9201,28 @@
;; sha3
-(define_insn "eor3q<mode>4"
- [(set (match_operand:VDQ_I 0 "register_operand" "=w")
- (xor:VDQ_I
- (xor:VDQ_I
- (match_operand:VDQ_I 2 "register_operand" "w")
- (match_operand:VDQ_I 3 "register_operand" "w"))
- (match_operand:VDQ_I 1 "register_operand" "w")))]
+(define_insn_and_split "eor3q<mode>4"
+ [(set (match_operand:VSDQ_I 0 "register_operand")
+ (xor:VSDQ_I
+ (xor:VSDQ_I
+ (match_operand:VSDQ_I 2 "register_operand")
+ (match_operand:VSDQ_I 3 "register_operand"))
+ (match_operand:VSDQ_I 1 "register_operand")))]
"TARGET_SHA3"
- "eor3\\t%0.16b, %1.16b, %2.16b, %3.16b"
+ {@ [ cons: =0 , %1 , 2 , 3 ]
+ [ w , w , w , w ] eor3\t%0.16b, %1.16b, %2.16b, %3.16b
+ [ r , r , r , r ] #
+ }
+ "&& reload_completed && GP_REGNUM_P (REGNO (operands[0]))"
Should the "=r,r,r,r" alternative only be allowed for 64-bit modes?
It may be cleaner to split this pattern so that the define_insn_and_split
applies only to the VD_I modes.
+ [(const_int 0)]
+ {
+ machine_mode xor_mode = <MODE>mode == DImode ? DImode : SImode;
As a consequence of the above, this path can also be taken for modes that are
neither 32-bit nor 64-bit.
+ emit_move_insn (operands[0],
+ gen_rtx_XOR (xor_mode, operands[1], operands[2]));
That doesn't look like it would generate valid RTL. I think the XOR needs to
have the same mode as its operands.
So operands[1] and operands[2] need to be wrapped in a subreg of an appropriate
mode to keep things consistent.
Thanks,
Kyrill
+ emit_move_insn (operands[0],
+ gen_rtx_XOR (xor_mode, operands[0], operands[3]));
+ DONE;
+ }
[(set_attr "type" "crypto_sha3")]
)
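
For reference, a minimal C sketch of the equivalence the split above relies on: a three-way XOR computed in one step gives the same result as two XORs that reuse the destination, which is the order the two emit_move_insn calls use. The function names are made up for illustration and are not taken from eor3-opt.c, which isn't shown in this mail. Whether GCC actually picks eor3 or two eor's for code like this depends on the target flags and on register allocation.

```c
#include <stdint.h>

/* Three-way XOR as a single expression; this is the shape the
   eor3 pattern is meant to match (hypothetical example).  */
uint64_t
xor3 (uint64_t a, uint64_t b, uint64_t c)
{
  return a ^ b ^ c;
}

/* The same computation in two steps, mirroring the split:
   dest = op1 ^ op2, then dest = dest ^ op3.  */
uint64_t
xor3_split (uint64_t op1, uint64_t op2, uint64_t op3)
{
  uint64_t dest = op1 ^ op2;
  dest = dest ^ op3;
  return dest;
}
```

Both functions should agree for any inputs, since XOR is associative and commutative.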