This patch contains two SUBREG-related optimization enabling tweaks to the x86 backend.
The first change, to ix86_expand_vector_extract, cures the strange -march=cascadelake related non-determinism that affected my new test cases last week. Extracting a QImode or HImode element from an SSE vector performs a zero-extension to SImode, which is currently represented as: (set (subreg:SI (reg:QI target)) (zero_extend:SI (...)) Unfortunately, the semantics of this RTL doesn't quite match what was intended. A set of a paradoxical subreg allows the high-bits to take an arbitrary value (hence the non-determinism). A more correct representation should be: (set (reg:SI temp) (zero_extend:SI (...)) (set (reg:QI target) (subreg:QI (reg:SI temp)) Optionally with the SUBREG rtx annotated as SUBREG_PROMOTED_VAR_P to indicate that value is already zero-extended in the SUBREG_REG. The second change is very similar, which is why I've included it in this patch, where currently the early RTL optimizers can produce: (set (reg:V?? hardreg) (subreg ...)) where this instruction may require a spill/reload from memory when the modes aren't tieable. Alas the presence of the hard register prevents combine/gcse etc. optimizing this away, or reusing the result which would increase the lifetime of the hard register before reload. The solution is to treat vector hard registers the same way as the x86 backend handles scalar hard registers, and only allow sets from pseudos before register allocation, which is achieved by checking ix86_hardreg_mov_ok. Hence the above instruction is expanded and maintained as: (set (reg:V?? pseudo) (subreg ...)) (set (reg:V?? hardreg) (reg:V?? pseudo)) which allows the RTL optimizers freedom to optimize the SUBREG. This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" and "make -k check" with no new failures. In theory, my recent "obvious" regexp fix to accommodate -march=cascadelake is no longer required, but there's no harm leaving the testsuite as it is. Ok for mainline? 2021-10-11 Roger Sayle <ro...@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-expand.c (ix86_expand_vector_move): Use a pseudo intermediate when moving a SUBREG into a hard register, by checking ix86_hardreg_mov_ok. (ix86_expand_vector_extract): Store zero-extended SImode intermediate in a pseudo, then set target using a SUBREG_PROMOTED annotated subreg. * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating complex (SUBREG) sets of (vector) hard registers before reload, by checking ix86_hardreg_mov_ok. Thanks in advance, Roger --
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 4780b99..44404bd 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -617,8 +617,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[]) /* Make operand1 a register if it isn't already. */ if (can_create_pseudo_p () - && !register_operand (op0, mode) - && !register_operand (op1, mode)) + && (!ix86_hardreg_mov_ok (op0, op1) + || (!register_operand (op0, mode) + && !register_operand (op1, mode)))) { rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); emit_move_insn (tmp, op1); @@ -16005,11 +16006,15 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt) /* Let the rtl optimizers know about the zero extension performed. */ if (inner_mode == QImode || inner_mode == HImode) { + rtx reg = gen_reg_rtx (SImode); tmp = gen_rtx_ZERO_EXTEND (SImode, tmp); - target = gen_lowpart (SImode, target); + emit_move_insn (reg, tmp); + tmp = gen_lowpart (inner_mode, reg); + SUBREG_PROMOTED_VAR_P (tmp) = 1; + SUBREG_PROMOTED_SET (tmp, 1); } - emit_insn (gen_rtx_SET (target, tmp)); + emit_move_insn (target, tmp); } else { diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4559b0c..e43f597 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1270,7 +1270,8 @@ " C,<sseconstm1>,vm,v"))] "TARGET_SSE && (register_operand (operands[0], <MODE>mode) - || register_operand (operands[1], <MODE>mode))" + || register_operand (operands[1], <MODE>mode)) + && ix86_hardreg_mov_ok (operands[0], operands[1])" { switch (get_attr_type (insn)) {