Hi all,

On s390 we only tie the floating-point modes SF and DF with each other,
and then all remaining modes with each other:

    static bool
    s390_modes_tieable_p (machine_mode mode1, machine_mode mode2)
    {
      return ((mode1 == SFmode || mode1 == DFmode)
              == (mode2 == SFmode || mode2 == DFmode));
    }

This in turn sometimes leads to higher costs, as e.g. in

    (set (reg:SI 60 [ _2 ])
        (lshiftrt:SI (subreg:SI (reg/v:SF 62 [ xD.3041 ]) 0)
            (const_int 31 [0x1f])))
    original cost = 4 (weighted: 4.000000), replacement cost = 16 (weighted: 16.000000); rejecting replacement

due to rtx_costs():

    case SUBREG:
      total = 0;
      /* If we can't tie these modes, make this expensive.  The larger
         the mode, the more expensive it is.  */
      if (!targetm.modes_tieable_p (mode, GET_MODE (SUBREG_REG (x))))
        return COSTS_N_INSNS (2 + factor);

From the internals handbook I see the strong connection between
TARGET_MODES_TIEABLE_P and TARGET_HARD_REGNO_MODE_OK, which I kinda
read as

    forall m1, m2.
      (forall r. TARGET_HARD_REGNO_MODE_OK (r, m1)
                 == TARGET_HARD_REGNO_MODE_OK (r, m2))
      => TARGET_MODES_TIEABLE_P (m1, m2)

Since on s390 a value which is stored in any GPR/FPR/VR can be
retrieved unmodified, i.e., there is no normalization or anything of
that kind happening in FPRs/VRs, we could actually lift this to
something along the lines of:

    static bool
    s390_modes_tieable_p (machine_mode mode1, machine_mode mode2)
    {
      if (GET_MODE_CLASS (mode1) == MODE_CC)
        return GET_MODE_CLASS (mode2) == MODE_CC;

      if (TARGET_VX
          && GET_MODE_SIZE (mode1) <= 8
          && GET_MODE_SIZE (mode2) <= 8)
        return true;

      return ((mode1 == SFmode || mode1 == DFmode)
              == (mode2 == SFmode || mode2 == DFmode));
    }

since any value of up to 8 bytes can be stored in and retrieved from
any GPR/FPR/VR if vector extensions are available.

However, what makes me a little bit cautious is that the same holds
for a lot of targets, and yet the various TARGET_MODES_TIEABLE_P
implementations are rather involved, which makes me wonder whether I
missed something here.

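
To see which mode pairs would actually change, the two predicates can
be modelled in a few lines of standalone C. This is only a sketch: the
enum, the size table and all function names below are made up for
illustration and are not GCC's real machine_mode machinery, and
target_vx merely stands in for TARGET_VX.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few machine modes (not GCC's real enum).  */
enum model_mode { M_CC, M_SI, M_DI, M_SF, M_DF, M_TI, M_TF, M_NUM };

/* Assumed size in bytes of each model mode.  */
static const unsigned model_size[M_NUM] = {
  [M_CC] = 4, [M_SI] = 4, [M_DI] = 8, [M_SF] = 4,
  [M_DF] = 8, [M_TI] = 16, [M_TF] = 16
};

static bool
is_sf_or_df (enum model_mode m)
{
  return m == M_SF || m == M_DF;
}

/* Models the current hook: SF/DF tie with each other, and all other
   modes tie with each other.  */
bool
current_tieable (enum model_mode m1, enum model_mode m2)
{
  return is_sf_or_df (m1) == is_sf_or_df (m2);
}

/* Models the proposed hook; target_vx stands in for TARGET_VX.  */
bool
proposed_tieable (enum model_mode m1, enum model_mode m2, bool target_vx)
{
  if (m1 == M_CC)
    return m2 == M_CC;

  if (target_vx && model_size[m1] <= 8 && model_size[m2] <= 8)
    return true;

  return current_tieable (m1, m2);
}

/* A few spot checks of the model.  */
void
model_selfcheck (void)
{
  assert (!current_tieable (M_SI, M_SF));
  assert (proposed_tieable (M_SI, M_SF, true));
  assert (!proposed_tieable (M_TI, M_SF, true));
}
```

In this model the SI/SF pair from the dump above becomes tieable once
target_vx is set, while the 16-byte modes keep the old behaviour. It
also suggests that, as written, the proposed predicate only tests mode1
for MODE_CC, so proposed_tieable (M_SI, M_CC, true) and
proposed_tieable (M_CC, M_SI, true) disagree here.
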

This is especially so since riscv does not even allow modes SF and DF
to be tied, in contrast to s390:

    Don't allow floating-point modes to be tied, since type punning of
    single-precision and double-precision is implementation defined.

Furthermore, the last sentence in the paragraph

    If TARGET_HARD_REGNO_MODE_OK (r, mode1) and
    TARGET_HARD_REGNO_MODE_OK (r, mode2) are always the same for any r,
    then TARGET_MODES_TIEABLE_P (mode1, mode2) should be true.  If they
    differ for any r, you should define this hook to return false
    unless some other mechanism ensures the accessibility of the value
    in a narrower mode.

speaks about a narrower mode, which rather increases my concerns about
the potential new implementation of s390_modes_tieable_p().

Last but not least, having a look at the usages of
TARGET_MODES_TIEABLE_P, we have e.g. in combine.cc:

    /* In general, don't install a subreg involving two
       modes not tieable.  It can worsen register
       allocation, and can even make invalid reload
       insns, since the reg inside may need to be copied
       from in the outside mode, and that may be invalid
       if it is an fp reg copied in integer mode.

Since on s390 FPRs are left aligned and GPRs are right aligned, I can
imagine that a

    (subreg:DI (reg:SF 65 [ xD.3041 ]) 0)

could lead to problems as described in the comment. During my
experiments I have seen such subregs, and they were all dealt with
correctly, but maybe I was just lucky.

Long story short, could someone shed some light on the hook
TARGET_MODES_TIEABLE_P, or does anyone see a potential problem with
tying all modes of size less than or equal to 8 bytes, as long as we
have proper mov<mode> insns?

Cheers,
Stefan