On Wed, Sep 11, 2024 at 08:57:23PM +0200, Ilya Leoshkevich wrote: > On Wed, 2024-09-11 at 16:44 +0200, Stefan Schulze Frielinghaus wrote: > > On Wed, Sep 11, 2024 at 01:59:48PM +0200, Ilya Leoshkevich wrote: > > > On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus > > > wrote: > > > > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote: > > > > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus > > > > > wrote: > > > > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich > > > > > > wrote: > > > > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze > > > > > > > Frielinghaus > > > > > > > wrote: > > > > > > > > Currently subregs originating from *tf_to_fprx2_0 and > > > > > > > > *tf_to_fprx2_1 > > > > > > > > survive register allocation. This in turn leads to wrong > > > > > > > > register > > > > > > > > renaming. Keeping the current approach would mean we > > > > > > > > need > > > > > > > > two > > > > > > > > insns > > > > > > > > for > > > > > > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. > > > > > > > > Something > > > > > > > > along > > > > > > > > the > > > > > > > > lines > > > > > > > > > > > > > > > > (define_insn "*tf_to_fprx2_0" > > > > > > > > [(set (subreg:DF (match_operand:FPRX2 0 > > > > > > > > "nonimmediate_operand" > > > > > > > > "=f") 0) > > > > > > > > (unspec:DF [(match_operand:TF 1 "general_operand" > > > > > > > > "v")] > > > > > > > > UNSPEC_TF_TO_FPRX2_0))] > > > > > > > > "TARGET_VXE" > > > > > > > > "#") > > > > > > > > > > > > > > > > (define_insn "*tf_to_fprx2_0" > > > > > > > > [(set (match_operand:DF 0 "nonimmediate_operand" "=f") > > > > > > > > (unspec:DF [(match_operand:TF 1 "general_operand" > > > > > > > > "v")] > > > > > > > > UNSPEC_TF_TO_FPRX2_0))] > > > > > > > > "TARGET_VXE" > > > > > > > > "vpdi\t%v0,%v1,%v0,1 > > > > > > > > [(set_attr "op_type" "VRR")]) > > > > > > > > > > > > > > > > and similar for *tf_to_fprx2_1. Note, pre register > > > > > > > > allocation > > > > > > > > operand 0 > > > > > > > > has mode FPRX2 and afterwards DF once subregs have been > > > > > > > > eliminated. > > > > > > > > > > > > > > > > Since we always copy a whole vector register into a > > > > > > > > floating- > > > > > > > > point > > > > > > > > register pair, another way to fix this is to merge > > > > > > > > *tf_to_fprx2_0 > > > > > > > > and > > > > > > > > *tf_to_fprx2_1 into a single insn which means we don't > > > > > > > > have > > > > > > > > to > > > > > > > > use > > > > > > > > subregs at all. The downside of this is that the > > > > > > > > assembler > > > > > > > > template > > > > > > > > contains two instructions, now. The upside is that we > > > > > > > > don't > > > > > > > > have > > > > > > > > to > > > > > > > > come up with some artificial insn before RA which might > > > > > > > > be > > > > > > > > more > > > > > > > > readable/maintainable. That is implemented by this > > > > > > > > patch. > > > > > > > > > > > > > > > > In commit r11-4872-ge627cda5686592, the output operand > > > > > > > > specifier > > > > > > > > %V > > > > > > > > was > > > > > > > > introduced which is used in tf_to_fprx2 only, now. I > > > > > > > > didn't > > > > > > > > come > > > > > > > > up > > > > > > > > with its counterpart like %F for floating-point > > > > > > > > registers. > > > > > > > > Instead I > > > > > > > > printed the register pair in the output function > > > > > > > > directly. > > > > > > > > This > > > > > > > > spares > > > > > > > > us a new and "rare" format specifier for a single insn. > > > > > > > > I > > > > > > > > don't > > > > > > > > have > > > > > > > > a > > > > > > > > strong opinion which option to choose, however, we should > > > > > > > > either > > > > > > > > add > > > > > > > > %F > > > > > > > > in order to mimic the same behaviour as %V or getting rid > > > > > > > > of > > > > > > > > %V > > > > > > > > and > > > > > > > > inline the logic in the output function. I lean towards > > > > > > > > the > > > > > > > > latter. > > > > > > > > Any preferences? > > > > > > > > --- > > > > > > > > gcc/config/s390/s390.md | 2 + > > > > > > > > gcc/config/s390/vector.md | 66 > > > > > > > > +++++++++++- > > > > > > > > ---- > > > > > > > > ---- > > > > > > > > -- > > > > > > > > gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 > > > > > > > > +++++++++ > > > > > > > > 3 files changed, 60 insertions(+), 34 deletions(-) > > > > > > > > create mode 100644 > > > > > > > > gcc/testsuite/gcc.target/s390/pr115860- > > > > > > > > 1.c > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > + char buf[64]; > > > > > > > > + switch (which_alternative) > > > > > > > > + { > > > > > > > > + case 0: > > > > > > > > + if (REGNO (operands[0]) == REGNO (operands[1])) > > > > > > > > + return "vpdi\t%V0,%v1,%V0,5"; > > > > > > > > + else > > > > > > > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5"; > > > > > > > > + case 1: > > > > > > > > + { > > > > > > > > + const char *reg_pair = reg_names[REGNO > > > > > > > > (operands[0]) > > > > > > > > + > > > > > > > > 1]; > > > > > > > > + snprintf (buf, sizeof (buf), > > > > > > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1", > > > > > > > > reg_pair); > > > > > > > > > > > > > > I wonder if there is a corner case where 8+ does not fit > > > > > > > into > > > > > > > short > > > > > > > displacement? > > > > > > > > > > > > That is covered by constraint AR, i.e., for short > > > > > > displacement, > > > > > > and > > > > > > AT > > > > > > for long displacement. > > > > > > > > > > Don't they cover only %1, and not 8+%1? Can't there be a > > > > > situation > > > > > where %1 barely fits and 8+%1 doesn't fit? A quick glance shows > > > > > that > > > > > the code doesn't leave any allowance for this: > > > > > > > > > > "AR" > > > > > s390_mem_constraint("AR") > > > > > s390_check_qrst_address('R') > > > > > s390_short_displacement() > > > > > INTVAL (disp) >= 0 && INTVAL (disp) < 4096 > > > > > > > > Isn't this covered by > > > > > > > > int > > > > s390_mem_constraint (const char *str, rtx op) > > > > { > > > > char c = str[0]; > > > > > > > > switch (c) > > > > { > > > > case 'A': > > > > /* Check for offsettable variants of memory constraints. > > > > */ > > > > if (!MEM_P (op) || MEM_VOLATILE_P (op)) > > > > return 0; > > > > if ((reload_completed || reload_in_progress) > > > > ? !offsettable_memref_p (op) : > > > > !offsettable_nonstrict_memref_p (op)) > > > > return 0; > > > > > > > > where > > > > > > > > /* Return true if OP is a memory reference whose address contains > > > > no side effects and remains valid after the addition of a > > > > positive > > > > integer less than the size of the object being referenced. > > > > > > > > We assume that the original address is valid and do not check > > > > it. > > > > > > > > This uses strict_memory_address_p as a subroutine, so > > > > don't use it before reload. */ > > > > > > > > bool > > > > offsettable_memref_p (rtx op) > > > > { > > > > return ((MEM_P (op)) > > > > && offsettable_address_addr_space_p (1, GET_MODE (op), > > > > XEXP > > > > (op, 0), > > > > MEM_ADDR_SPACE > > > > (op))); > > > > } > > > > > > > > guarantees that whatever we add to the address such that we stay > > > > inside the > > > > object, the address is valid. Or do I miss something? > > > > > > How does it know that address+addend should be not simply stay > > > valid, > > > but also fit into short displacement? > > > > Right, so offsettable_memref_p only ensures that any resulting > > address is a > > valid general address. So we have to manually check for short > > displacement. > > Maybe something along the lines: > > > > diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc > > index 7aea776da2f..e61cda8352a 100644 > > --- a/gcc/config/s390/s390.cc > > +++ b/gcc/config/s390/s390.cc > > @@ -3714,6 +3714,18 @@ s390_mem_constraint (const char *str, rtx op) > > if ((reload_completed || reload_in_progress) > > ? !offsettable_memref_p (op) : > > !offsettable_nonstrict_memref_p (op)) > > return 0; > > + /* offsettable_memref_p ensures only that any positive offset > > added to > > + the address forms a valid general address. For Q and R > > constraints we > > + also have to verify that the resulting displacement after > > adding any > > + positive offset less than the size of the object being > > referenced is > > + still valid. */ > > + if (str[1] == 'Q' || str[1] == 'R') > > + { > > + int o = GET_MODE_SIZE (GET_MODE (op)) - 1; > > + rtx tmp = adjust_address (op, QImode, o); > > + if (!s390_check_qrst_address (str[1], XEXP (tmp, 0), true)) > > + return 0; > > + } > > return s390_check_qrst_address (str[1], XEXP (op, 0), true); > > case 'B': > > /* Check for non-literal-pool variants of memory constraints. > > */ > > > > My reading of the constraints A[RQST] is that those are only used for > > operands > > with non-block mode. Thus, I didn't check for block mode. Maybe an > > assert > > would be worthwhile. > > This looks reasonable to me. I guess this deserves to be a separate > patch?
Yea I think so, too, since this fixes the constraints AR and AQ which is independent of this patch. I will prepare one shortly. > > In any case, with this the code looks good to me. Thanks for the review! Cheers, Stefan