"Kewen.Lin" <li...@linux.ibm.com> writes: > Hi Richard, > > Thanks for the comments! > > on 2024/8/12 16:55, Richard Sandiford wrote: >> "Kewen.Lin" <li...@linux.ibm.com> writes: >>> Hi, >>> >>> Commit r15-2084 exposes one ICE in LRA. Firstly, before >>> r15-2084 KFmode has 126 bit precision while V1TImode has 128 >>> bit precision, so the subreg (subreg:V1TI (reg:KF 131) 0) is >>> paradoxical_subreg_p, which stops some passes from doing >>> some optimization. After r15-2084, KFmode has the same mode >>> precision as V1TImode, passes are able to optimize more, but >>> it causes this ICE in LRA as described below: >>> >>> For insn 106 (set (mem:V1TI ...) (subreg:V1TI (reg:KF 133) 0)), >>> which matches pattern >>> >>> (define_insn "*vsx_le_perm_store_<mode>" >>> [(set (match_operand:VSX_LE_128 0 "memory_operand" "=Z,Q") >>> (match_operand:VSX_LE_128 1 "vsx_register_operand" "+wa,r"))] >> >> Is it well-formed to have "+" on an operand that is only semantically >> an input? df and most passes would consider it to be a pure input >> and so would disregard any write that is intended to take place. >> >> E.g. if we have: >> >> (set (reg:M R2) (reg:M R1)) ;; R1 dead >> (set (reg:M R3) (reg:M R2)) >> >> then it would be valid to change that to: >> >> (set (reg:M R2) (reg:M R1)) >> (set (reg:M R3) (reg:M R1)) >> >> without considering the "+" on the input to the first instruction. > > Good question, I guess the "+" is to match the fact that operand[1] > can be both read and written by this insn, operand[1] has to be > re-used as the dest register of vector rotate 64bit (doubleword swap). > but it'll get swapped back so it's safe if the register is still live > (like the above example). For this case that writing to but later > restoring, I'm not sure if we can take it as "no write" (strip "+").
Yeah. AIUI, there is no way of satisfying the constraints in a way that makes operand 0 overlap operand 1, and: > > ;; The post-reload split requires that we re-permute the source > ;; register in case it is still live. > (define_split > [(set (match_operand:VSX_LE_128 0 "memory_operand") > (match_operand:VSX_LE_128 1 "vsx_register_operand"))] > "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed && !TARGET_P9_VECTOR > && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" > [(const_int 0)] > { > rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); > rs6000_emit_le_vsx_permute (operands[0], operands[1], <MODE>mode); > rs6000_emit_le_vsx_permute (operands[1], operands[1], <MODE>mode); would not work correctly for that kind of overlap in any case. So it looks on the face of it like the pattern would be (more) correct without the "+". Thanks, Richard > DONE; > }) > > CC the author Mike as well. > >> >> But if we do stick with the patch... >> >>> "!BYTES_BIG_ENDIAN && TARGET_VSX && !TARGET_P9_VECTOR >>> && !altivec_indexed_or_indirect_operand (operands[0], <MODE>mode)" >>> "@ >>> # >>> #" >>> [(set_attr "type" "vecstore,store") >>> (set_attr "length" "12,8") >>> (set_attr "isa" "<VSisa>,*")]) >>> >>> LRA makes equivalence substitution on r133 with const double >>> (const_double:KF 0.0), selects alternative 0 and fixes up >>> operand 1 for constraint "wa", because operand 1 is OP_INOUT, >>> so it considers assigning back to it as well, that is: >>> >>> lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg); >>> >>> But because old has been changed to const_double in equivalence >>> substitution, the move is actually assigning to const_double, >>> which is invalid and cause ICE. >>> >>> Considering reg:KF 133 is equivalent with (const_double:KF 0.0) >>> even though this operand is OP_INOUT, IMHO there should not be >>> any following uses of reg:KF 133, otherwise it doesn't have the >>> chance to be equivalent to (const_double:KF 0.0). From this >>> perspective, I think we can guard the lra_emit_move with >>> nonimmediate_operand to exclude such case. >>> >>> Does it sound reasonable? >>> >>> btw, I also tried with disallowing equivalence substitution with >>> CONSTANT_P value if the corresponding operand is OP_INOUT or >>> OP_OUT, it can also fix this issue, but with more thinking it >>> seems not necessary to stop such substitution if we can handle it >>> later as above. >>> >>> Bootstrapped and regtested on x86_64-redhat-linux and >>> powerpc64{,le}-linux-gnu. >>> >>> BR, >>> Kewen >>> ----- >>> PR rtl-optimization/116170 >>> >>> gcc/ChangeLog: >>> >>> * lra-constraints.cc (curr_insn_transform): Don't emit move back to >>> old operand if it's nonimmediate_operand. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/powerpc/pr116170.c: New test. >>> --- >>> gcc/lra-constraints.cc | 3 ++- >>> gcc/testsuite/gcc.target/powerpc/pr116170.c | 18 ++++++++++++++++++ >>> 2 files changed, 20 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr116170.c >>> >>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc >>> index 92b343fa99a..024c85c87d9 100644 >>> --- a/gcc/lra-constraints.cc >>> +++ b/gcc/lra-constraints.cc >>> @@ -4742,7 +4742,8 @@ curr_insn_transform (bool check_only_p) >>> } >>> *loc = new_reg; >>> if (type != OP_IN >>> - && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX) >>> + && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX >>> + && nonimmediate_operand (old, GET_MODE (old))) >> >> ...IMO it would be better to use CONSTANT_P instead. It's sometimes >> useful to accept constants in specific instructions only, meaning that >> they aren't general enough to be immediate_operands. > > OK, will do. I used "CONSTANT_P" in the 1st draft, but considered it's > guarding a move, by checking many existing mov optab predicates (they > only accept general operands) so felt nonimmediate_operand seems to be a > better fit here. > > BR, > Kewen