On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
> On 03.11.20 22:45, Ilya Leoshkevich wrote:
> > On z14+, there are instructions for working with 128-bit floats (long
> > doubles) in vector registers. It's beneficial to use them instead of
> > instructions that operate on floating point register pairs, because it
> > allows to store 4 times more data in registers at a time, relieving
> > register pressure. The performance of new instructions is almost the
> > same.
> >
> > Implement by storing TFmode values in vector registers on z14+. Since
> > not all operations are available with the new instructions, keep the
> > old ones using the new FPRX2 mode, and convert between it and TFmode
> > when necessary (this is called "forwarder" expanders below). Change the
> > existing TFmode expanders to call either new- or old-style ones
> > depending on whether we are on z14+ or older machines ("dispatcher"
> > expanders).
> >
> > gcc/ChangeLog:
> >
> > 2020-11-03  Ilya Leoshkevich  <i...@linux.ibm.com>
> >
> >   * config/s390/s390-modes.def (FPRX2): New mode.
> >   * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> >   * config/s390/s390.c (s390_fma_allowed_p): Likewise.
> >   (s390_build_signbit_mask): Support 128-bit masks.
> >   (print_operand): Support printing the second word of a TFmode
> >   operand as vector register.
> >   (constant_modes): Add FPRX2mode.
> >   (s390_class_max_nregs): Return 1 for TFmode on z14+.
> >   (s390_is_fpr128): New function.
> >   (s390_is_vr128): Likewise.
> >   (s390_can_change_mode_class): Use s390_is_fpr128 and
> >   s390_is_vr128 in order to determine whether mode refers to a FPR
> >   pair or to a VR.
> >   * config/s390/s390.h (EXPAND_MOVTF): New macro.
> >   (EXPAND_TF): Likewise.
> >   * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
> >   alias.
> >   (ALL): Add FPRX2.
> >   (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> >   (FP): Likewise.
> >   (FP_ANYTF): New mode iterator.
> >   (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> >   (TD_TF): Likewise.
> >   (xde): Add FPRX2.
> >   (nBFP): Likewise.
> >   (nDFP): Likewise.
> >   (DSF): Likewise.
> >   (DFDI): Likewise.
> >   (SFSI): Likewise.
> >   (DF): Likewise.
> >   (SF): Likewise.
> >   (fT0): Likewise.
> >   (bt): Likewise.
> >   (_d): Likewise.
> >   (HALF_TMODE): Likewise.
> >   (tf_fpr): New mode_attr.
> >   (type): New mode_attr.
> >   (*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
> >   (*cmp<mode>_ccs_0_fastmath): Likewise.
> >   (*cmptf_ccs): New pattern for wfcxb.
> >   (*cmptf_ccsfps): New pattern for wfkxb.
> >   (mov<mode>): Rename to mov<mode><tf_fpr>.
> >   (signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
> >   (isinf<mode>2): Rename to isinf<mode>2<tf_fpr>.
> >   (*TDC_insn_<mode>): Use type instead of mode with fsimp.
> >   (fixuns_trunc<FP:mode><GPR:mode>2): Rename to
> >   fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
> >   (fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
> >   (floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
> >   instead of mode with itof.
> >   (floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
> >   instead of mode with itof.
> >   (*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
> >   itof.
> >   (floatuns<GPR:mode><FP:mode>2): Rename to
> >   floatuns<GPR:mode><FP:mode>2<tf_fpr>.
> >   (trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type instead
> >   of mode with fsimp.
> >   (extend<DSF:mode><BFP:mode>2): Rename to
> >   extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
> >   (<FPINT:fpint_name><BFP:mode>2): Rename to
> >   <FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
> >   mode with fsimp.
> >   (rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
> >   type instead of mode with fsimp.
> >   (<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
> >   fsimp.
> >   (rint<DFP:mode>2): Likewise.
> >   (trunc<BFP:mode><DFP_ALL:mode>2): Rename to
> >   trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> >   (trunc<DFP_ALL:mode><BFP:mode>2): Rename to
> >   trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> >   (extend<BFP:mode><DFP_ALL:mode>2): Rename to
> >   extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
> >   (extend<DFP_ALL:mode><BFP:mode>2): Rename to
> >   extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
> >   (add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
> >   mode with fsimp.
> >   (*add<mode>3_cc): Use type instead of mode with fsimp.
> >   (*add<mode>3_cconly): Likewise.
> >   (sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
> >   mode with fsimp.
> >   (*sub<mode>3_cc): Use type instead of mode with fsimp.
> >   (*sub<mode>3_cconly): Likewise.
> >   (mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
> >   mode with fsimp.
> >   (fma<mode>4): Restrict using s390_fma_allowed_p.
> >   (fms<mode>4): Restrict using s390_fma_allowed_p.
> >   (div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
> >   mode with fdiv.
> >   (neg<mode>2): Rename to neg<mode>2<tf_fpr>.
> >   (*neg<mode>2_cc): Use type instead of mode with fsimp.
> >   (*neg<mode>2_cconly): Likewise.
> >   (*neg<mode>2_nocc): Likewise.
> >   (*neg<mode>2): Likewise.
> >   (abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
> >   mode with fdiv.
> >   (*abs<mode>2_cc): Use type instead of mode with fsimp.
> >   (*abs<mode>2_cconly): Likewise.
> >   (*abs<mode>2_nocc): Likewise.
> >   (*abs<mode>2): Likewise.
> >   (*negabs<mode>2_cc): Likewise.
> >   (*negabs<mode>2_cconly): Likewise.
> >   (*negabs<mode>2_nocc): Likewise.
> >   (*negabs<mode>2): Likewise.
> >   (sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
> >   of mode with fsqrt.
> >   (cbranch<mode>4): Use FP_ANYTF instead of FP.
> >   (copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
> >   instead of mode with fsimp.
> >   * config/s390/s390.opt (flag_vx_long_double_fma): New
> >   undocumented option.
> >   * config/s390/vector.md (V_HW): Add TF for z14+.
> >   (V_HW2): Likewise.
> >   (VFT): Likewise.
> >   (VF_HW): Likewise.
> >   (V_128): Likewise.
> >   (tf_vr): New mode_attr.
> >   (tointvec): Add TF.
> >   (mov<mode>): Rename to mov<mode><tf_vr>.
> >   (movetf): New dispatcher.
> >   (*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
> >   z13-.
> >   (*vec_tf_to_v1tf_vr): New pattern for z14+.
> >   (*fprx2_to_tf): Likewise.
> >   (*mov_tf_to_fprx2_0): Likewise.
> >   (*mov_tf_to_fprx2_1): Likewise.
> >   (add<mode>3): Rename to add<mode>3<tf_vr>.
> >   (addtf3): New dispatcher.
> >   (sub<mode>3): Rename to sub<mode>3<tf_vr>.
> >   (subtf3): New dispatcher.
> >   (mul<mode>3): Rename to mul<mode>3<tf_vr>.
> >   (multf3): New dispatcher.
> >   (div<mode>3): Rename to div<mode>3<tf_vr>.
> >   (divtf3): New dispatcher.
> >   (sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
> >   (sqrttf2): New dispatcher.
> >   (fma<mode>4): Restrict using s390_fma_allowed_p.
> >   (fms<mode>4): Likewise.
> >   (neg_fma<mode>4): Likewise.
> >   (neg_fms<mode>4): Likewise.
> >   (neg<mode>2): Rename to neg<mode>2<tf_vr>.
> >   (negtf2): New dispatcher.
> >   (abs<mode>2): Rename to abs<mode>2<tf_vr>.
> >   (abstf2): New dispatcher.
> >   (float<mode>tf2_vr): New forwarder.
> >   (float<mode>tf2): New dispatcher.
> >   (floatuns<mode>tf2_vr): New forwarder.
> >   (floatuns<mode>tf2): New dispatcher.
> >   (fix_trunctf<mode>2_vr): New forwarder.
> >   (fix_trunctf<mode>2): New dispatcher.
> >   (fixuns_trunctf<mode>2_vr): New forwarder.
> >   (fixuns_trunctf<mode>2): New dispatcher.
> >   (<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
> >   (<FPINT:fpint_name>tf2): New forwarder.
> >   (rint<mode>2<tf_vr>): New pattern.
> >   (rinttf2): New forwarder.
> >   (*trunctfdf2_vr): New pattern.
> >   (trunctfdf2_vr): New forwarder.
> >   (trunctfdf2): New dispatcher.
> >   (trunctfsf2_vr): New forwarder.
> >   (trunctfsf2): New dispatcher.
> >   (extenddftf2_vr): New pattern.
> >   (extenddftf2): New dispatcher.
> >   (extendsftf2_vr): New forwarder.
> >   (extendsftf2): New dispatcher.
> >   (signbittf2_vr): New forwarder.
> >   (signbittf2): New dispatchers.
> >   (isinftf2_vr): New forwarder.
> >   (isinftf2): New dispatcher.
> >   * config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
> >   instead of VECF_HW, add missing constraint, add vw support.
> >   (vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
> >   (*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
> >   VECF_HW, and vw support.
> >   (vftci<mode>_intcc): Use VF_HW instead of VECF_HW.
>
> ...
>
> > +; VX: TFmode in VR: use wfcxb
> > +(define_insn "*cmptf_ccs"
> > +  [(set (reg CC_REGNUM)
> > +        (compare (match_operand:TF 0 "register_operand" "v")
> > +                 (match_operand:TF 1 "general_operand" "v")))]
>
> Is this really beneficial to allow general_operands here? Everything
> except registers needs to be reloaded anyway. In my experience it is
> helpful to emit the extra moves as early as possible to let the other
> optimizers work with them.

The rtxes recognized by this pattern are initially generated by the
generic cbranch expander, which accepts general_operands and therefore
does not force them into registers at expand time. If this pattern did
not allow general_operands as well, the rtxes generated by cbranch
would be unrecognizable.
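
To spell out the distinction I am relying on, here is the RTL template
quoted above once more, with the two string fields of each match_operand
annotated. The comments are only my gloss on the usual machine-description
semantics (predicate first, constraint second); they are not part of the
patch, and nothing beyond the quoted template is shown:

```
(set (reg CC_REGNUM)
     (compare
       ;; Predicate "register_operand": the insn is recognized only
       ;; if operand 0 is a register.
       ;; Constraint "v": register allocation must place it in a
       ;; vector register.
       (match_operand:TF 0 "register_operand" "v")
       ;; Predicate "general_operand": the insn is also recognized
       ;; when operand 1 is memory or a constant, which is what the
       ;; cbranch expander may produce.
       ;; Constraint "v": the operand is still reloaded into a vector
       ;; register before wfcxb is output.
       (match_operand:TF 1 "general_operand" "v")))
```

So the predicate only decides whether the rtx is recognized, while the
"v" constraint is what guarantees a VR by the time the instruction is
emitted.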