https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68404

--- Comment #31 from Michael Meissner <meissner at linux dot vnet.ibm.com> ---
On Thu, Feb 11, 2016 at 02:32:13AM +0000, bernds at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68404
> 
> --- Comment #30 from Bernd Schmidt <bernds at gcc dot gnu.org> ---
> Something like this maybe? I don't know much about the machine and can't say
> whether the earlyclobber is justified, but looking at my dumps this appears to
> prevent the problematic peephole from triggering. No testing beyond that.
> 
> Index: rs6000.c
> ===================================================================
> --- rs6000.c    (revision 233217)
> +++ rs6000.c    (working copy)
> @@ -35801,6 +35801,9 @@ fusion_gpr_load_p (rtx addis_reg,       /* reg
>    if (!fusion_gpr_addis (addis_value, GET_MODE (addis_value)))
>      return false;
> 
> +  if (reg_overlap_mentioned_p (target, addis_value))
> +    return false;
> +
>    /* Allow sign/zero extension.  */
>    if (GET_CODE (mem) == ZERO_EXTEND
>        || (GET_CODE (mem) == SIGN_EXTEND && TARGET_P8_FUSION_SIGN))

In looking at it, I probably don't need the early clobber.  Let me play with
it.

To be clear, power8 has two types of fusion.  The problematic one is GPR load
fusion.  Normally, loading from large arrays or from TOC entries takes two
instructions:

        (set (reg1)
             (high ...))
        (set (reg2)
             (mem (lo_sum (reg1)
                          (...))))

or

        (set (reg1)
             (plus (reg3)
                   (const_int 65536)))
        (set (reg2)
             (mem (plus (reg1)
                        (const_int 10))))

In order to use fusion, the first SET must be adjacent to the second SET, and
the register loaded with the upper bits (reg1) must be identical to the
register loaded from memory (reg2).  This requirement means that reg1/reg2
must not be R0: in PowerPC memory addressing, a base register of R0 means that
no base register is used, so we can't do the optimization if R0 is the target.

Eventually, we hope to move this earlier, before reload, but during
development I was stymied by reload not liking asymmetric addressing (i.e.,
fused loads to GPR registers could use the extended addressing form, but
stores could not).  So I used a peephole2 to get what advantage I could from
the existing insns.

Power9 fusion is more general, in that the register that receives the high
address bits does not have to be the same as the register being loaded.  In addition,
you can now fuse GPR stores and load/store to the floating point/vector
registers.

In theory we could use the new fusion scheduling feature to do fusion, but that
is a GCC 7.x feature, not a fix for stage4 of GCC 6.x.
