At the moment, fwprop will propagate constants and registers even if no further rtl simplifications are possible:

    if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
      flags |= PR_CAN_APPEAR;

What do you think about extending this to subregs?  The reason for
asking is that on NEON, vector loads like vld4 are represented as a load
of a single monolithic register followed by subreg extractions of each
vector:

    (set (reg:OI FULL) (...))
    (set (reg:V2SI V0) (subreg:V2SI (reg:OI FULL) 0))
    (set (reg:V2SI V1) (subreg:V2SI (reg:OI FULL) 8))
    (set (reg:V2SI V2) (subreg:V2SI (reg:OI FULL) 16))
    (set (reg:V2SI V3) (subreg:V2SI (reg:OI FULL) 24))

Nothing ever propagates these subregs, so the separate moves survive
until IRA.  This has three problems:

  - We generally want the registers allocated to V0...V3 to be the same
    as FULL, so that the four subreg moves become nops.  This often
    happens in simple examples.  But if register pressure is relatively
    high, these moves can sometimes cause IRA to spill in cases where it
    would not spill if the subregs were used directly instead of each Vi.

  - Perhaps related, register pressure becomes harder to estimate.

  - These moves can interfere with pre-reload scheduling.

In combination with the MODES_TIEABLE_P patch that I posted here:

    http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00626.html

this patch significantly improves the code generated for several libav
loops.  Unfortunately, I don't have a setup that can do meaningful
x86_64 performance measurements, but a diff of the before and after
output for libav showed many cases where the patch removed moves.

What do you think?  Alternatives include propagating in lower-subreg,
or maybe only in the second fwprop pass.

Richard


gcc/
	* fwprop.c (propagate_rtx): Also set PR_CAN_APPEAR for subregs.

Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	2011-08-26 09:58:28.829540497 +0100
+++ gcc/fwprop.c	2011-08-26 10:14:03.767707504 +0100
@@ -664,7 +664,7 @@ propagate_rtx (rtx x, enum machine_mode
     return NULL_RTX;
 
   flags = 0;
-  if (REG_P (new_rtx) || CONSTANT_P (new_rtx))
+  if (REG_P (new_rtx) || CONSTANT_P (new_rtx) || GET_CODE (new_rtx) == SUBREG)
     flags |= PR_CAN_APPEAR;
   if (!for_each_rtx (&new_rtx, varying_mem_p, NULL))
     flags |= PR_HANDLE_MEM;