Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible

Ilya Leoshkevich via Gcc-patches Wed, 04 Nov 2020 14:13:21 -0800

On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
> On 03.11.20 22:45, Ilya Leoshkevich wrote:
> > On z14+, there are instructions for working with 128-bit floats (long
> > doubles) in vector registers.  It's beneficial to use them instead of
> > instructions that operate on floating point register pairs, because it
> > allows storing 4 times more data in registers at a time, relieving
> > register pressure.  The performance of new instructions is almost the
> > same.
> > 
> > Implement by storing TFmode values in vector registers on z14+.  Since
> > not all operations are available with the new instructions, keep the old
> > ones using the new FPRX2 mode, and convert between it and TFmode when
> > necessary (this is called "forwarder" expanders below).  Change the
> > existing TFmode expanders to call either new- or old-style ones
> > depending on whether we are on z14+ or older machines ("dispatcher"
> > expanders).
> > 
> > gcc/ChangeLog:
> > 
> > 2020-11-03  Ilya Leoshkevich  <[email protected]>
> > 
> >     * config/s390/s390-modes.def (FPRX2): New mode.
> >     * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> >     * config/s390/s390.c (s390_fma_allowed_p): Likewise.
> >     (s390_build_signbit_mask): Support 128-bit masks.
> >     (print_operand): Support printing the second word of a TFmode
> >     operand as vector register.
> >     (constant_modes): Add FPRX2mode.
> >     (s390_class_max_nregs): Return 1 for TFmode on z14+.
> >     (s390_is_fpr128): New function.
> >     (s390_is_vr128): Likewise.
> >     (s390_can_change_mode_class): Use s390_is_fpr128 and
> >     s390_is_vr128 in order to determine whether mode refers to a FPR
> >     pair or to a VR.
> >     * config/s390/s390.h (EXPAND_MOVTF): New macro.
> >     (EXPAND_TF): Likewise.
> >     * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
> >     alias.
> >     (ALL): Add FPRX2.
> >     (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> >     (FP): Likewise.
> >     (FP_ANYTF): New mode iterator.
> >     (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> >     (TD_TF): Likewise.
> >     (xde): Add FPRX2.
> >     (nBFP): Likewise.
> >     (nDFP): Likewise.
> >     (DSF): Likewise.
> >     (DFDI): Likewise.
> >     (SFSI): Likewise.
> >     (DF): Likewise.
> >     (SF): Likewise.
> >     (fT0): Likewise.
> >     (bt): Likewise.
> >     (_d): Likewise.
> >     (HALF_TMODE): Likewise.
> >     (tf_fpr): New mode_attr.
> >     (type): New mode_attr.
> >     (*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
> >     (*cmp<mode>_ccs_0_fastmath): Likewise.
> >     (*cmptf_ccs): New pattern for wfcxb.
> >     (*cmptf_ccsfps): New pattern for wfkxb.
> >     (mov<mode>): Rename to mov<mode><tf_fpr>.
> >     (signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
> >     (isinf<mode>2): Rename to isinf<mode>2<tf_fpr>.
> >     (*TDC_insn_<mode>): Use type instead of mode with fsimp.
> >     (fixuns_trunc<FP:mode><GPR:mode>2): Rename to
> >     fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
> >     (fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
> >     (floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
> >     instead of mode with itof.
> >     (floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
> >     instead of mode with itof.
> >     (*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
> >     itof.
> >     (floatuns<GPR:mode><FP:mode>2): Rename to
> >     floatuns<GPR:mode><FP:mode>2<tf_fpr>.
> >     (trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type instead
> >     of mode with fsimp.
> >     (extend<DSF:mode><BFP:mode>2): Rename to
> >     extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
> >     (<FPINT:fpint_name><BFP:mode>2): Rename to
> >     <FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
> >     mode with fsimp.
> >     (rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
> >     type instead of mode with fsimp.
> >     (<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
> >     fsimp.
> >     (rint<DFP:mode>2): Likewise.
> >     (trunc<BFP:mode><DFP_ALL:mode>2): Rename to
> >     trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> >     (trunc<DFP_ALL:mode><BFP:mode>2): Rename to
> >     trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> >     (extend<BFP:mode><DFP_ALL:mode>2): Rename to
> >     extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> >     (extend<DFP_ALL:mode><BFP:mode>2): Rename to
> >     extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> >     (add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
> >     mode with fsimp.
> >     (*add<mode>3_cc): Use type instead of mode with fsimp.
> >     (*add<mode>3_cconly): Likewise.
> >     (sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
> >     mode with fsimp.
> >     (*sub<mode>3_cc): Use type instead of mode with fsimp.
> >     (*sub<mode>3_cconly): Likewise.
> >     (mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
> >     mode with fsimp.
> >     (fma<mode>4): Restrict using s390_fma_allowed_p.
> >     (fms<mode>4): Restrict using s390_fma_allowed_p.
> >     (div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
> >     mode with fdiv.
> >     (neg<mode>2): Rename to neg<mode>2<tf_fpr>.
> >     (*neg<mode>2_cc): Use type instead of mode with fsimp.
> >     (*neg<mode>2_cconly): Likewise.
> >     (*neg<mode>2_nocc): Likewise.
> >     (*neg<mode>2): Likewise.
> >     (abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
> >     mode with fdiv.
> >     (*abs<mode>2_cc): Use type instead of mode with fsimp.
> >     (*abs<mode>2_cconly): Likewise.
> >     (*abs<mode>2_nocc): Likewise.
> >     (*abs<mode>2): Likewise.
> >     (*negabs<mode>2_cc): Likewise.
> >     (*negabs<mode>2_cconly): Likewise.
> >     (*negabs<mode>2_nocc): Likewise.
> >     (*negabs<mode>2): Likewise.
> >     (sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
> >     of mode with fsqrt.
> >     (cbranch<mode>4): Use FP_ANYTF instead of FP.
> >     (copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
> >     instead of mode with fsimp.
> >     * config/s390/s390.opt (flag_vx_long_double_fma): New
> >     undocumented option.
> >     * config/s390/vector.md (V_HW): Add TF for z14+.
> >     (V_HW2): Likewise.
> >     (VFT): Likewise.
> >     (VF_HW): Likewise.
> >     (V_128): Likewise.
> >     (tf_vr): New mode_attr.
> >     (tointvec): Add TF.
> >     (mov<mode>): Rename to mov<mode><tf_vr>.
> >     (movetf): New dispatcher.
> >     (*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
> >     z13-.
> >     (*vec_tf_to_v1tf_vr): New pattern for z14+.
> >     (*fprx2_to_tf): Likewise.
> >     (*mov_tf_to_fprx2_0): Likewise.
> >     (*mov_tf_to_fprx2_1): Likewise.
> >     (add<mode>3): Rename to add<mode>3<tf_vr>.
> >     (addtf3): New dispatcher.
> >     (sub<mode>3): Rename to sub<mode>3<tf_vr>.
> >     (subtf3): New dispatcher.
> >     (mul<mode>3): Rename to mul<mode>3<tf_vr>.
> >     (multf3): New dispatcher.
> >     (div<mode>3): Rename to div<mode>3<tf_vr>.
> >     (divtf3): New dispatcher.
> >     (sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
> >     (sqrttf2): New dispatcher.
> >     (fma<mode>4): Restrict using s390_fma_allowed_p.
> >     (fms<mode>4): Likewise.
> >     (neg_fma<mode>4): Likewise.
> >     (neg_fms<mode>4): Likewise.
> >     (neg<mode>2): Rename to neg<mode>2<tf_vr>.
> >     (negtf2): New dispatcher.
> >     (abs<mode>2): Rename to abs<mode>2<tf_vr>.
> >     (abstf2): New dispatcher.
> >     (float<mode>tf2_vr): New forwarder.
> >     (float<mode>tf2): New dispatcher.
> >     (floatuns<mode>tf2_vr): New forwarder.
> >     (floatuns<mode>tf2): New dispatcher.
> >     (fix_trunctf<mode>2_vr): New forwarder.
> >     (fix_trunctf<mode>2): New dispatcher.
> >     (fixuns_trunctf<mode>2_vr): New forwarder.
> >     (fixuns_trunctf<mode>2): New dispatcher.
> >     (<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
> >     (<FPINT:fpint_name>tf2): New forwarder.
> >     (rint<mode>2<tf_vr>): New pattern.
> >     (rinttf2): New forwarder.
> >     (*trunctfdf2_vr): New pattern.
> >     (trunctfdf2_vr): New forwarder.
> >     (trunctfdf2): New dispatcher.
> >     (trunctfsf2_vr): New forwarder.
> >     (trunctfsf2): New dispatcher.
> >     (extenddftf2_vr): New pattern.
> >     (extenddftf2): New dispatcher.
> >     (extendsftf2_vr): New forwarder.
> >     (extendsftf2): New dispatcher.
> >     (signbittf2_vr): New forwarder.
> >     (signbittf2): New dispatchers.
> >     (isinftf2_vr): New forwarder.
> >     (isinftf2): New dispatcher.
> >     * config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
> >     instead of VECF_HW, add missing constraint, add vw support.
> >     (vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
> >     (*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
> >     VECF_HW, add vw support.
> >     (vftci<mode>_intcc): Use VF_HW instead of VECF_HW.
> 
> ...
> 
> > +; VX: TFmode in VR: use wfcxb
> > +(define_insn "*cmptf_ccs"
> > +  [(set (reg CC_REGNUM)
> > +   (compare (match_operand:TF 0 "register_operand" "v")
> > +            (match_operand:TF 1 "general_operand"  "v")))]
> 
> Is it really beneficial to allow general_operands here? Everything
> except registers needs to be reloaded anyway.  In my experience it is
> helpful to emit the extra moves as early as possible to let the other
> optimizers work with them.


The rtxes recognized by this pattern are initially generated by the
generic cbranch expander, which allows general_operands and thus
doesn't immediately reload.  If we don't allow general_operands here,
rtxes generated by cbranch will be unrecognizable.
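
To make this concrete, here is a minimal C sketch of source code that
branches on a long double comparison.  The function name is made up for
illustration and is not part of the patch or the testsuite.  Assuming
-march=z14 and this patch applied, I would expect the branch to go
through the cbranch expander and the comparison to be matched by one of
the new *cmptf_* patterns; whether the second operand is still a memory
reference at that point depends on the earlier passes.

```c
/* Hypothetical helper, not part of the patch: branches on a TFmode
   (long double) comparison whose second operand is read from memory.  */
int
below_limit (long double x, const long double *limit)
{
  /* The conditional branch on a long double comparison is what the
     cbranch expander has to handle.  */
  if (x < *limit)
    return 1;
  return 0;
}
```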
