On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> survive register allocation. This in turn leads to wrong register
> renaming. Keeping the current approach would mean we need two insns
> for
> *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along the
> lines
>
> (define_insn "*tf_to_fprx2_0"
> [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand"
> "=f") 0)
> (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> UNSPEC_TF_TO_FPRX2_0))]
> "TARGET_VXE"
> "#")
>
> (define_insn "*tf_to_fprx2_0"
> [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> UNSPEC_TF_TO_FPRX2_0))]
> "TARGET_VXE"
> "vpdi\t%v0,%v1,%v0,1
> [(set_attr "op_type" "VRR")])
>
> and similar for *tf_to_fprx2_1. Note, pre register allocation
> operand 0
> has mode FPRX2 and afterwards DF once subregs have been eliminated.
>
> Since we always copy a whole vector register into a floating-point
> register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> *tf_to_fprx2_1 into a single insn which means we don't have to use
> subregs at all. The downside of this is that the assembler template
> contains two instructions, now. The upside is that we don't have to
> come up with some artificial insn before RA which might be more
> readable/maintainable. That is implemented by this patch.
>
> In commit r11-4872-ge627cda5686592, the output operand specifier %V
> was
> introduced which is used in tf_to_fprx2 only, now. I didn't come up
> with its counterpart like %F for floating-point registers. Instead I
> printed the register pair in the output function directly. This
> spares
> us a new and "rare" format specifier for a single insn. I don't have
> a
> strong opinion which option to choose, however, we should either add
> %F
> in order to mimic the same behaviour as %V or getting rid of %V and
> inline the logic in the output function. I lean towards the latter.
> Any preferences?
> ---
> gcc/config/s390/s390.md | 2 +
> gcc/config/s390/vector.md | 66 +++++++++++---------
> --
> gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +++++++++
> 3 files changed, 60 insertions(+), 34 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
[...]
> + char buf[64];
> + switch (which_alternative)
> + {
> + case 0:
> + if (REGNO (operands[0]) == REGNO (operands[1]))
> + return "vpdi\t%V0,%v1,%V0,5";
> + else
> + return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> + case 1:
> + {
> + const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> + snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> reg_pair);
I wonder if there is a corner case where 8+ does not fit into short
displacement?
[...]