On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
> On 03.11.20 22:45, Ilya Leoshkevich wrote:
> > On z14+, there are instructions for working with 128-bit floats (long
> > doubles) in vector registers. It's beneficial to use them instead of
> > instructions that operate on floating point register pairs, because
> > this allows storing 4 times more data in registers at a time,
> > relieving register pressure. The performance of the new instructions
> > is almost the same.
> >
> > Implement by storing TFmode values in vector registers on z14+. Since
> > not all operations are available with the new instructions, keep the
> > old ones using the new FPRX2 mode, and convert between it and TFmode
> > when necessary (these are called "forwarder" expanders below). Change
> > the existing TFmode expanders to call either new- or old-style ones
> > depending on whether we are on z14+ or older machines ("dispatcher"
> > expanders).
> >
> > gcc/ChangeLog:
> >
> > 2020-11-03 Ilya Leoshkevich <[email protected]>
> >
> > * config/s390/s390-modes.def (FPRX2): New mode.
> > * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> > * config/s390/s390.c (s390_fma_allowed_p): Likewise.
> > (s390_build_signbit_mask): Support 128-bit masks.
> > (print_operand): Support printing the second word of a TFmode
> > operand as vector register.
> > (constant_modes): Add FPRX2mode.
> > (s390_class_max_nregs): Return 1 for TFmode on z14+.
> > (s390_is_fpr128): New function.
> > (s390_is_vr128): Likewise.
> > (s390_can_change_mode_class): Use s390_is_fpr128 and
> > s390_is_vr128 in order to determine whether mode refers to a FPR
> > pair or to a VR.
> > * config/s390/s390.h (EXPAND_MOVTF): New macro.
> > (EXPAND_TF): Likewise.
> > * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF alias.
> > (ALL): Add FPRX2.
> > (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> > (FP): Likewise.
> > (FP_ANYTF): New mode iterator.
> > (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> > (TD_TF): Likewise.
> > (xde): Add FPRX2.
> > (nBFP): Likewise.
> > (nDFP): Likewise.
> > (DSF): Likewise.
> > (DFDI): Likewise.
> > (SFSI): Likewise.
> > (DF): Likewise.
> > (SF): Likewise.
> > (fT0): Likewise.
> > (bt): Likewise.
> > (_d): Likewise.
> > (HALF_TMODE): Likewise.
> > (tf_fpr): New mode_attr.
> > (type): New mode_attr.
> > (*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
> > (*cmp<mode>_ccs_0_fastmath): Likewise.
> > (*cmptf_ccs): New pattern for wfcxb.
> > (*cmptf_ccsfps): New pattern for wfkxb.
> > (mov<mode>): Rename to mov<mode><tf_fpr>.
> > (signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
> > (isinf<mode>2): Rename to isinf<mode>2<tf_fpr>.
> > (*TDC_insn_<mode>): Use type instead of mode with fsimp.
> > (fixuns_trunc<FP:mode><GPR:mode>2): Rename to
> > fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
> > (fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
> > (floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
> > instead of mode with itof.
> > (floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
> > instead of mode with itof.
> > (*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
> > itof.
> > (floatuns<GPR:mode><FP:mode>2): Rename to
> > floatuns<GPR:mode><FP:mode>2<tf_fpr>.
> > (trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type instead
> > of mode with fsimp.
> > (extend<DSF:mode><BFP:mode>2): Rename to
> > extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
> > (<FPINT:fpint_name><BFP:mode>2): Rename to
> > <FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
> > mode with fsimp.
> > (rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
> > type instead of mode with fsimp.
> > (<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
> > fsimp.
> > (rint<DFP:mode>2): Likewise.
> > (trunc<BFP:mode><DFP_ALL:mode>2): Rename to
> > trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> > (trunc<DFP_ALL:mode><BFP:mode>2): Rename to
> > trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> > (extend<BFP:mode><DFP_ALL:mode>2): Rename to
> > extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> > (extend<DFP_ALL:mode><BFP:mode>2): Rename to
> > extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> > (add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
> > mode with fsimp.
> > (*add<mode>3_cc): Use type instead of mode with fsimp.
> > (*add<mode>3_cconly): Likewise.
> > (sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
> > mode with fsimp.
> > (*sub<mode>3_cc): Use type instead of mode with fsimp.
> > (*sub<mode>3_cconly): Likewise.
> > (mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
> > mode with fsimp.
> > (fma<mode>4): Restrict using s390_fma_allowed_p.
> > (fms<mode>4): Restrict using s390_fma_allowed_p.
> > (div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
> > mode with fdiv.
> > (neg<mode>2): Rename to neg<mode>2<tf_fpr>.
> > (*neg<mode>2_cc): Use type instead of mode with fsimp.
> > (*neg<mode>2_cconly): Likewise.
> > (*neg<mode>2_nocc): Likewise.
> > (*neg<mode>2): Likewise.
> > (abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
> > mode with fdiv.
> > (*abs<mode>2_cc): Use type instead of mode with fsimp.
> > (*abs<mode>2_cconly): Likewise.
> > (*abs<mode>2_nocc): Likewise.
> > (*abs<mode>2): Likewise.
> > (*negabs<mode>2_cc): Likewise.
> > (*negabs<mode>2_cconly): Likewise.
> > (*negabs<mode>2_nocc): Likewise.
> > (*negabs<mode>2): Likewise.
> > (sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
> > of mode with fsqrt.
> > (cbranch<mode>4): Use FP_ANYTF instead of FP.
> > (copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
> > instead of mode with fsimp.
> > * config/s390/s390.opt (flag_vx_long_double_fma): New
> > undocumented option.
> > * config/s390/vector.md (V_HW): Add TF for z14+.
> > (V_HW2): Likewise.
> > (VFT): Likewise.
> > (VF_HW): Likewise.
> > (V_128): Likewise.
> > (tf_vr): New mode_attr.
> > (tointvec): Add TF.
> > (mov<mode>): Rename to mov<mode><tf_vr>.
> > (movtf): New dispatcher.
> > (*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
> > z13-.
> > (*vec_tf_to_v1tf_vr): New pattern for z14+.
> > (*fprx2_to_tf): Likewise.
> > (*mov_tf_to_fprx2_0): Likewise.
> > (*mov_tf_to_fprx2_1): Likewise.
> > (add<mode>3): Rename to add<mode>3<tf_vr>.
> > (addtf3): New dispatcher.
> > (sub<mode>3): Rename to sub<mode>3<tf_vr>.
> > (subtf3): New dispatcher.
> > (mul<mode>3): Rename to mul<mode>3<tf_vr>.
> > (multf3): New dispatcher.
> > (div<mode>3): Rename to div<mode>3<tf_vr>.
> > (divtf3): New dispatcher.
> > (sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
> > (sqrttf2): New dispatcher.
> > (fma<mode>4): Restrict using s390_fma_allowed_p.
> > (fms<mode>4): Likewise.
> > (neg_fma<mode>4): Likewise.
> > (neg_fms<mode>4): Likewise.
> > (neg<mode>2): Rename to neg<mode>2<tf_vr>.
> > (negtf2): New dispatcher.
> > (abs<mode>2): Rename to abs<mode>2<tf_vr>.
> > (abstf2): New dispatcher.
> > (float<mode>tf2_vr): New forwarder.
> > (float<mode>tf2): New dispatcher.
> > (floatuns<mode>tf2_vr): New forwarder.
> > (floatuns<mode>tf2): New dispatcher.
> > (fix_trunctf<mode>2_vr): New forwarder.
> > (fix_trunctf<mode>2): New dispatcher.
> > (fixuns_trunctf<mode>2_vr): New forwarder.
> > (fixuns_trunctf<mode>2): New dispatcher.
> > (<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
> > (<FPINT:fpint_name>tf2): New forwarder.
> > (rint<mode>2<tf_vr>): New pattern.
> > (rinttf2): New forwarder.
> > (*trunctfdf2_vr): New pattern.
> > (trunctfdf2_vr): New forwarder.
> > (trunctfdf2): New dispatcher.
> > (trunctfsf2_vr): New forwarder.
> > (trunctfsf2): New dispatcher.
> > (extenddftf2_vr): New pattern.
> > (extenddftf2): New dispatcher.
> > (extendsftf2_vr): New forwarder.
> > (extendsftf2): New dispatcher.
> > (signbittf2_vr): New forwarder.
> > (signbittf2): New dispatcher.
> > (isinftf2_vr): New forwarder.
> > (isinftf2): New dispatcher.
> > * config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
> > instead of VECF_HW, add missing constraint, add vw support.
> > (vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
> > (*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
> > VECF_HW, add vw support.
> > (vftci<mode>_intcc): Use VF_HW instead of VECF_HW.
>
> ...
>
> > +; VX: TFmode in VR: use wfcxb
> > +(define_insn "*cmptf_ccs"
> > + [(set (reg CC_REGNUM)
> > + (compare (match_operand:TF 0 "register_operand" "v")
> > + (match_operand:TF 1 "general_operand" "v")))]
>
> Is it really beneficial to allow general_operands here? Everything
> except registers needs to be reloaded anyway. In my experience it is
> helpful to emit the extra moves as early as possible to let the other
> optimizers work with them.
The rtxes recognized by this pattern are initially generated by the
generic cbranch expander, which allows general_operands and thus
doesn't immediately reload. If we don't allow general_operands here,
rtxes generated by cbranch will be unrecognizable.
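
For illustration, here is a simplified sketch of the kind of expander I
mean. It is written from memory and is not a verbatim copy of s390.md;
the predicates, the condition and the preparation statement are
assumptions made for the example:

    ; Simplified sketch, not the exact s390.md text.
    (define_expand "cbranch<mode>4"
      [(set (pc)
            (if_then_else
              (match_operator 0 "comparison_operator"
                [(match_operand:FP_ANYTF 1 "register_operand" "")
                 ; Operand 2 is not restricted to registers here.
                 (match_operand:FP_ANYTF 2 "general_operand" "")])
              (label_ref (match_operand 3 "" ""))
              (pc)))]
      "TARGET_HARD_FLOAT"
      "s390_emit_jump (operands[3],
         s390_emit_compare (GET_CODE (operands[0]),
                            operands[1], operands[2]));
       DONE;")

With an expander of this shape, the second compare operand can still be
a MEM or a constant when the compare insn is emitted. An insn pattern
whose predicate is register_operand would not match such an rtx, while
general_operand with a "v" constraint matches it and leaves the move
into a vector register to the register allocator.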