On Wed, Sep 11, 2024 at 08:57:23PM +0200, Ilya Leoshkevich wrote:
> On Wed, 2024-09-11 at 16:44 +0200, Stefan Schulze Frielinghaus wrote:
> > On Wed, Sep 11, 2024 at 01:59:48PM +0200, Ilya Leoshkevich wrote:
> > > On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus
> > > wrote:
> > > > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > > > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus
> > > > > wrote:
> > > > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich
> > > > > > wrote:
> > > > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze
> > > > > > > Frielinghaus
> > > > > > > wrote:
> > > > > > > > Currently subregs originating from *tf_to_fprx2_0 and
> > > > > > > > *tf_to_fprx2_1
> > > > > > > > survive register allocation. This in turn leads to wrong
> > > > > > > > register
> > > > > > > > renaming. Keeping the current approach would mean we
> > > > > > > > need
> > > > > > > > two
> > > > > > > > insns
> > > > > > > > for
> > > > > > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.
> > > > > > > > Something
> > > > > > > > along
> > > > > > > > the
> > > > > > > > lines
> > > > > > > >
> > > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > > > [(set (subreg:DF (match_operand:FPRX2 0
> > > > > > > > "nonimmediate_operand"
> > > > > > > > "=f") 0)
> > > > > > > > (unspec:DF [(match_operand:TF 1 "general_operand"
> > > > > > > > "v")]
> > > > > > > > UNSPEC_TF_TO_FPRX2_0))]
> > > > > > > > "TARGET_VXE"
> > > > > > > > "#")
> > > > > > > >
> > > > > > > > (define_insn "*tf_to_fprx2_0"
> > > > > > > > [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > > > > > > > (unspec:DF [(match_operand:TF 1 "general_operand"
> > > > > > > > "v")]
> > > > > > > > UNSPEC_TF_TO_FPRX2_0))]
> > > > > > > > "TARGET_VXE"
> > > > > > > > "vpdi\t%v0,%v1,%v0,1
> > > > > > > > [(set_attr "op_type" "VRR")])
> > > > > > > >
> > > > > > > > and similar for *tf_to_fprx2_1. Note, pre register
> > > > > > > > allocation
> > > > > > > > operand 0
> > > > > > > > has mode FPRX2 and afterwards DF once subregs have been
> > > > > > > > eliminated.
> > > > > > > >
> > > > > > > > Since we always copy a whole vector register into a
> > > > > > > > floating-
> > > > > > > > point
> > > > > > > > register pair, another way to fix this is to merge
> > > > > > > > *tf_to_fprx2_0
> > > > > > > > and
> > > > > > > > *tf_to_fprx2_1 into a single insn which means we don't
> > > > > > > > have
> > > > > > > > to
> > > > > > > > use
> > > > > > > > subregs at all. The downside of this is that the
> > > > > > > > assembler
> > > > > > > > template
> > > > > > > > contains two instructions, now. The upside is that we
> > > > > > > > don't
> > > > > > > > have
> > > > > > > > to
> > > > > > > > come up with some artificial insn before RA which might
> > > > > > > > be
> > > > > > > > more
> > > > > > > > readable/maintainable. That is implemented by this
> > > > > > > > patch.
> > > > > > > >
> > > > > > > > In commit r11-4872-ge627cda5686592, the output operand
> > > > > > > > specifier
> > > > > > > > %V
> > > > > > > > was
> > > > > > > > introduced which is used in tf_to_fprx2 only, now. I
> > > > > > > > didn't
> > > > > > > > come
> > > > > > > > up
> > > > > > > > with its counterpart like %F for floating-point
> > > > > > > > registers.
> > > > > > > > Instead I
> > > > > > > > printed the register pair in the output function
> > > > > > > > directly.
> > > > > > > > This
> > > > > > > > spares
> > > > > > > > us a new and "rare" format specifier for a single insn.
> > > > > > > > I
> > > > > > > > don't
> > > > > > > > have
> > > > > > > > a
> > > > > > > > strong opinion which option to choose, however, we should
> > > > > > > > either
> > > > > > > > add
> > > > > > > > %F
> > > > > > > > in order to mimic the same behaviour as %V or getting rid
> > > > > > > > of
> > > > > > > > %V
> > > > > > > > and
> > > > > > > > inline the logic in the output function. I lean towards
> > > > > > > > the
> > > > > > > > latter.
> > > > > > > > Any preferences?
> > > > > > > > ---
> > > > > > > > gcc/config/s390/s390.md | 2 +
> > > > > > > > gcc/config/s390/vector.md | 66
> > > > > > > > +++++++++++-
> > > > > > > > ----
> > > > > > > > ----
> > > > > > > > --
> > > > > > > > gcc/testsuite/gcc.target/s390/pr115860-1.c | 26
> > > > > > > > +++++++++
> > > > > > > > 3 files changed, 60 insertions(+), 34 deletions(-)
> > > > > > > > create mode 100644
> > > > > > > > gcc/testsuite/gcc.target/s390/pr115860-
> > > > > > > > 1.c
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > > + char buf[64];
> > > > > > > > + switch (which_alternative)
> > > > > > > > + {
> > > > > > > > + case 0:
> > > > > > > > + if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > > > > > + return "vpdi\t%V0,%v1,%V0,5";
> > > > > > > > + else
> > > > > > > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > > > > > > + case 1:
> > > > > > > > + {
> > > > > > > > + const char *reg_pair = reg_names[REGNO
> > > > > > > > (operands[0])
> > > > > > > > +
> > > > > > > > 1];
> > > > > > > > + snprintf (buf, sizeof (buf),
> > > > > > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > > > > > > reg_pair);
> > > > > > >
> > > > > > > I wonder if there is a corner case where 8+ does not fit
> > > > > > > into
> > > > > > > short
> > > > > > > displacement?
> > > > > >
> > > > > > That is covered by constraint AR, i.e., for short
> > > > > > displacement,
> > > > > > and
> > > > > > AT
> > > > > > for long displacement.
> > > > >
> > > > > Don't they cover only %1, and not 8+%1? Can't there be a
> > > > > situation
> > > > > where %1 barely fits and 8+%1 doesn't fit? A quick glance shows
> > > > > that
> > > > > the code doesn't leave any allowance for this:
> > > > >
> > > > > "AR"
> > > > > s390_mem_constraint("AR")
> > > > > s390_check_qrst_address('R')
> > > > > s390_short_displacement()
> > > > > INTVAL (disp) >= 0 && INTVAL (disp) < 4096
> > > >
> > > > Isn't this covered by
> > > >
> > > > int
> > > > s390_mem_constraint (const char *str, rtx op)
> > > > {
> > > > char c = str[0];
> > > >
> > > > switch (c)
> > > > {
> > > > case 'A':
> > > > /* Check for offsettable variants of memory constraints.
> > > > */
> > > > if (!MEM_P (op) || MEM_VOLATILE_P (op))
> > > > return 0;
> > > > if ((reload_completed || reload_in_progress)
> > > > ? !offsettable_memref_p (op) :
> > > > !offsettable_nonstrict_memref_p (op))
> > > > return 0;
> > > >
> > > > where
> > > >
> > > > /* Return true if OP is a memory reference whose address contains
> > > > no side effects and remains valid after the addition of a
> > > > positive
> > > > integer less than the size of the object being referenced.
> > > >
> > > > We assume that the original address is valid and do not check
> > > > it.
> > > >
> > > > This uses strict_memory_address_p as a subroutine, so
> > > > don't use it before reload. */
> > > >
> > > > bool
> > > > offsettable_memref_p (rtx op)
> > > > {
> > > > return ((MEM_P (op))
> > > > && offsettable_address_addr_space_p (1, GET_MODE (op),
> > > > XEXP
> > > > (op, 0),
> > > > MEM_ADDR_SPACE
> > > > (op)));
> > > > }
> > > >
> > > > guarantees that whatever we add to the address such that we stay
> > > > inside the
> > > > object, the address is valid. Or do I miss something?
> > >
> > > How does it know that address+addend should be not simply stay
> > > valid,
> > > but also fit into short displacement?
> >
> > Right, so offsettable_memref_p only ensures that any resulting
> > address is a
> > valid general address. So we have to manually check for short
> > displacement.
> > Maybe something along the lines:
> >
> > diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> > index 7aea776da2f..e61cda8352a 100644
> > --- a/gcc/config/s390/s390.cc
> > +++ b/gcc/config/s390/s390.cc
> > @@ -3714,6 +3714,18 @@ s390_mem_constraint (const char *str, rtx op)
> > if ((reload_completed || reload_in_progress)
> > ? !offsettable_memref_p (op) :
> > !offsettable_nonstrict_memref_p (op))
> > return 0;
> > + /* offsettable_memref_p ensures only that any positive offset
> > added to
> > + the address forms a valid general address. For Q and R
> > constraints we
> > + also have to verify that the resulting displacement after
> > adding any
> > + positive offset less than the size of the object being
> > referenced is
> > + still valid. */
> > + if (str[1] == 'Q' || str[1] == 'R')
> > + {
> > + int o = GET_MODE_SIZE (GET_MODE (op)) - 1;
> > + rtx tmp = adjust_address (op, QImode, o);
> > + if (!s390_check_qrst_address (str[1], XEXP (tmp, 0), true))
> > + return 0;
> > + }
> > return s390_check_qrst_address (str[1], XEXP (op, 0), true);
> > case 'B':
> > /* Check for non-literal-pool variants of memory constraints.
> > */
> >
> > My reading of the constraints A[RQST] is that those are only used for
> > operands
> > with non-block mode. Thus, I didn't check for block mode. Maybe an
> > assert
> > would be worthwhile.
>
> This looks reasonable to me. I guess this deserves to be a separate
> patch?
Yea I think so, too, since this fixes the constraints AR and AQ which is
independent of this patch. I will prepare one shortly.
>
> In any case, with this the code looks good to me.
Thanks for the review!
Cheers,
Stefan