On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus wrote: > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote: > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus > > wrote: > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote: > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus > > > > wrote: > > > > > Currently subregs originating from *tf_to_fprx2_0 and > > > > > *tf_to_fprx2_1 > > > > > survive register allocation. This in turn leads to wrong > > > > > register > > > > > renaming. Keeping the current approach would mean we need > > > > > two > > > > > insns > > > > > for > > > > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something > > > > > along > > > > > the > > > > > lines > > > > > > > > > > (define_insn "*tf_to_fprx2_0" > > > > > [(set (subreg:DF (match_operand:FPRX2 0 > > > > > "nonimmediate_operand" > > > > > "=f") 0) > > > > > (unspec:DF [(match_operand:TF 1 "general_operand" > > > > > "v")] > > > > > UNSPEC_TF_TO_FPRX2_0))] > > > > > "TARGET_VXE" > > > > > "#") > > > > > > > > > > (define_insn "*tf_to_fprx2_0" > > > > > [(set (match_operand:DF 0 "nonimmediate_operand" "=f") > > > > > (unspec:DF [(match_operand:TF 1 "general_operand" > > > > > "v")] > > > > > UNSPEC_TF_TO_FPRX2_0))] > > > > > "TARGET_VXE" > > > > > "vpdi\t%v0,%v1,%v0,1 > > > > > [(set_attr "op_type" "VRR")]) > > > > > > > > > > and similar for *tf_to_fprx2_1. Note, pre register > > > > > allocation > > > > > operand 0 > > > > > has mode FPRX2 and afterwards DF once subregs have been > > > > > eliminated. > > > > > > > > > > Since we always copy a whole vector register into a floating- > > > > > point > > > > > register pair, another way to fix this is to merge > > > > > *tf_to_fprx2_0 > > > > > and > > > > > *tf_to_fprx2_1 into a single insn which means we don't have > > > > > to > > > > > use > > > > > subregs at all. The downside of this is that the assembler > > > > > template > > > > > contains two instructions, now. The upside is that we don't > > > > > have > > > > > to > > > > > come up with some artificial insn before RA which might be > > > > > more > > > > > readable/maintainable. That is implemented by this patch. > > > > > > > > > > In commit r11-4872-ge627cda5686592, the output operand > > > > > specifier > > > > > %V > > > > > was > > > > > introduced which is used in tf_to_fprx2 only, now. I didn't > > > > > come > > > > > up > > > > > with its counterpart like %F for floating-point registers. > > > > > Instead I > > > > > printed the register pair in the output function directly. > > > > > This > > > > > spares > > > > > us a new and "rare" format specifier for a single insn. I > > > > > don't > > > > > have > > > > > a > > > > > strong opinion which option to choose, however, we should > > > > > either > > > > > add > > > > > %F > > > > > in order to mimic the same behaviour as %V or getting rid of > > > > > %V > > > > > and > > > > > inline the logic in the output function. I lean towards the > > > > > latter. > > > > > Any preferences? > > > > > --- > > > > > gcc/config/s390/s390.md | 2 + > > > > > gcc/config/s390/vector.md | 66 +++++++++++- > > > > > ---- > > > > > ---- > > > > > -- > > > > > gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +++++++++ > > > > > 3 files changed, 60 insertions(+), 34 deletions(-) > > > > > create mode 100644 gcc/testsuite/gcc.target/s390/pr115860- > > > > > 1.c > > > > > > > > [...] > > > > > > > > > + char buf[64]; > > > > > + switch (which_alternative) > > > > > + { > > > > > + case 0: > > > > > + if (REGNO (operands[0]) == REGNO (operands[1])) > > > > > + return "vpdi\t%V0,%v1,%V0,5"; > > > > > + else > > > > > + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5"; > > > > > + case 1: > > > > > + { > > > > > + const char *reg_pair = reg_names[REGNO (operands[0]) > > > > > + > > > > > 1]; > > > > > + snprintf (buf, sizeof (buf), > > > > > "ld\t%%f0,%%1;ld\t%%%s,8+%%1", > > > > > reg_pair); > > > > > > > > I wonder if there is a corner case where 8+ does not fit into > > > > short > > > > displacement? > > > > > > That is covered by constraint AR, i.e., for short displacement, > > > and > > > AT > > > for long displacement. > > > > Don't they cover only %1, and not 8+%1? Can't there be a situation > > where %1 barely fits and 8+%1 doesn't fit? A quick glance shows > > that > > the code doesn't leave any allowance for this: > > > > "AR" > > s390_mem_constraint("AR") > > s390_check_qrst_address('R') > > s390_short_displacement() > > INTVAL (disp) >= 0 && INTVAL (disp) < 4096 > > Isn't this covered by > > int > s390_mem_constraint (const char *str, rtx op) > { > char c = str[0]; > > switch (c) > { > case 'A': > /* Check for offsettable variants of memory constraints. */ > if (!MEM_P (op) || MEM_VOLATILE_P (op)) > return 0; > if ((reload_completed || reload_in_progress) > ? !offsettable_memref_p (op) : > !offsettable_nonstrict_memref_p (op)) > return 0; > > where > > /* Return true if OP is a memory reference whose address contains > no side effects and remains valid after the addition of a positive > integer less than the size of the object being referenced. > > We assume that the original address is valid and do not check it. > > This uses strict_memory_address_p as a subroutine, so > don't use it before reload. */ > > bool > offsettable_memref_p (rtx op) > { > return ((MEM_P (op)) > && offsettable_address_addr_space_p (1, GET_MODE (op), XEXP > (op, 0), > MEM_ADDR_SPACE (op))); > } > > guarantees that whatever we add to the address such that we stay > inside the > object, the address is valid. Or do I miss something?
How does it know that address+addend should be not simply stay valid, but also fit into short displacement?