Hi Hao Chen,

[ You first need to add yourself to MAINTAINERS?  And get an account to
  do that, if you do not yet have one yet :-) ]

On Mon, Nov 09, 2020 at 10:48:19AM +0800, HAO CHEN GUI wrote:
> This patch adds a new pattern in 4-insn combine. It supports the 
> following sign_extend(op: zero_extend, zero_extend) optimization. In the 
> patch, newpat is split twice. The first split becomes newi1pat and the 
> second becomes newi2pat. They replace i1, i2 and i3 if all of them can 
> be recognized.
> 
> 7: r126:SI=zero_extend([r123:DI+0x1])
> 6: r125:SI=zero_extend([r123:DI])
> 8: r127:SI=r125:SI+r126:SI
> 9: r124:DI=sign_extend(r127:SI)
> 
> are replaced by:
> 
> 7: r125:DI=zero_extend([r123:DI])
> 8: r127:DI=zero_extend([r123:DI+0x1])
> 9: r124:DI=r127:DI+r125:DI

So in the original insn 8 registers 125 and 126 died, so this isn't
changing insn 7 at all, just moving it back from 6!  That is not what
combine should do, but it also is problematic: is it checked anywhere
that we *can* move the insn like this?

> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -851,10 +851,11 @@ do_SUBST_LINK (struct insn_link **into, struct 
> insn_link *newval)
>  
>  static bool
>  combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn 
> *i3,
> -                    rtx newpat, rtx newi2pat, rtx newotherpat)
> +                    rtx newpat, rtx newi2pat, rtx newotherpat,
> +                    rtx newi1pat)

(Keep the args ordered: 3, 2, 1, other.  "newpat" means "newi3pat").

> @@ -2672,7 +2693,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>            int *new_direct_jump_p, rtx_insn *last_combined_insn)
>  {
>    /* New patterns for I3 and I2, respectively.  */
> -  rtx newpat, newi2pat = 0;
> +  rtx newpat, newi2pat = 0, newi1pat = 0;
>    rtvec newpat_vec_with_clobbers = 0;
>    int substed_i2 = 0, substed_i1 = 0, substed_i0 = 0;
>    /* Indicates need to preserve SET in I0, I1 or I2 in I3 if it is not
> @@ -2682,8 +2703,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>    int total_sets;
>    /* Nonzero if I2's or I1's body now appears in I3.  */
>    int i2_is_used = 0, i1_is_used = 0;
> -  /* INSN_CODEs for new I3, new I2, and user of condition code.  */
> +  /* INSN_CODEs for new I3, new I2, new I1 and user of condition code.  */

(Comma after I1: "new I1, and ...".)

>    int insn_code_number, i2_code_number = 0, other_code_number = 0;
> +  int i1_code_number = 0;

Well, put it together with i3, i2 then, put "other" on its own line?

> @@ -2756,7 +2778,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>         else if (BINARY_P (src) && CONSTANT_P (XEXP (src, 1)))
>           ngood++;
>         else if (GET_CODE (src) == ASHIFT || GET_CODE (src) == ASHIFTRT
> -                || GET_CODE (src) == LSHIFTRT)
> +                || GET_CODE (src) == LSHIFTRT
> +                || GET_CODE (src) == SIGN_EXTEND
> +                || GET_CODE (src) == ZERO_EXTEND)
>           nshift++;

Extends are not shifts.  Please count "nextend" separately?

> @@ -3399,6 +3423,12 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>             i1src = subst (i1src, pc_rtx, pc_rtx, 0, 0, 0);
>           }
>  
> +       if (i0)
> +         {
> +           subst_low_luid = DF_INSN_LUID (i0);
> +           i0src = subst (i0src, pc_rtx, pc_rtx, 0, 0, 0);
> +         }

This won't work?  "subst_low_uid" is overwritten right in the next
statement:

>         subst_low_luid = DF_INSN_LUID (i2);
>         i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);

> @@ -3920,6 +3950,50 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>             rtx src_op0 = XEXP (setsrc, 0);
>             rtx src_op1 = XEXP (setsrc, 1);
>  
> +           /* Double split when src of i0 and i1 are both ZERO_EXTEND.  */
> +           if (i0 && i1
> +               && GET_CODE (PATTERN (i0)) == SET
> +               && GET_CODE (PATTERN (i1)) == SET
> +               && GET_CODE (SET_SRC (PATTERN (i0))) == ZERO_EXTEND
> +               && GET_CODE (SET_SRC (PATTERN (i1))) == ZERO_EXTEND
> +               && (rtx_equal_p (XEXP (*split, 0),
> +                                XEXP (SET_SRC (PATTERN (i1)), 0))
> +                   || rtx_equal_p (XEXP (*split, 0),
> +                                   XEXP (SET_SRC (PATTERN (i0)), 0))))
> +             {
> +               newi1pat = NULL_RTX;
> +               rtx newdest, *i0_i1dest;
> +               machine_mode new_mode;
> +
> +               new_mode = GET_MODE (*split);
> +               if (rtx_equal_p (XEXP (*split, 0),
> +                                XEXP (SET_SRC (PATTERN (i1)), 0)))
> +                 i0_i1dest = &i1dest;
> +               else
> +                 i0_i1dest = &i0dest;
> +
> +               if (REGNO (i1dest) < FIRST_PSEUDO_REGISTER)
> +                 newdest = gen_rtx_REG (new_mode, REGNO (*i0_i1dest));

You can only have one mode that you use pseudos in in the whole routine.
This doesn't work, i0_i1dest can be used elsewhere still.

> +               else
> +                 {
> +                   SUBST_MODE (regno_reg_rtx[REGNO (*i0_i1dest)], new_mode);
> +                   newdest = regno_reg_rtx[REGNO (*i0_i1dest)];
> +                 }

Messing with modes in general is a big problem for combine (it still
has some open problems for this, too).

> +               newi1pat =  gen_rtx_SET (newdest, *split);
> +               SUBST (*split, newdest);
> +
> +               i1_code_number = recog_for_combine (&newi1pat, i2,
> +                                                     &new_i2_notes);
> +               if (i1_code_number < 0)
> +                 {
> +                   undo_all ();
> +                   return 0;
> +                 }
> +
> +               split = find_split_point (&newpat, i3, false);
> +             }

I doubt that you can call find_split_point twice like this...  It likely
still needs to check that the insns it ends up with for newi1 and newi2
can go, in that order, where i1 and i2 used to be.  For i2 and i3 this
is much simpler, but now we will need a lot of extra work to do these
checks.

> +         else if (reg == i0dest)
> +           {
> +             propagate_for_debug (i0, last_combined_insn, reg, i0src,
> +                                  this_basic_block);
> +             adjust_reg_mode (reg, new_mode);
> +           }

How can i0dest ever end up in the final insns?  Ouch.

I'm running an all-arch comparison with this patch, just to see what it
does, but please make it so that for

7: r126:SI=zero_extend([r123:DI+0x1])
6: r125:SI=zero_extend([r123:DI])
8: r127:SI=r125:SI+r126:SI
9: r124:DI=sign_extend(r127:SI)

8+9 is combined to just

9: r124:DI=r125:DI+r126:DI

and that's all?  (The "compact" format leaves out crucial information,
but the loads here are QImode loads.)

I don't think 4->3 combines will do very much that cannot be done by
some combination of 2->1, 3->1, 4->1, 2->2, 3->2, 4->2, but we'll see.
It will be quite expensive to do in any case (combine isn't one of the
expensive passes anymore, but let's try to keep it that way ;-) )


Segher

Reply via email to