On Tue, 16 Apr 2019, Jakub Jelinek wrote:

> Hi!
> 
> In expand_mul_overflow, we emit (depending on operand ranges etc.) more than
> one multiplication to optimize for the common case.  One of those is where
> we are guaranteed that both operands are either sign or zero extended from
> half precision types to the full precision, we can then use much shorter
> sequence.  opN_small_p is a compile time predicate whether the corresponding
> argument is known to be in that range.  If yes, we just emit the simple
> code, otherwise we emit a runtime conditional to check (for uns compare the
> high half of bits against zero, for !uns against signbit of the low part
> shifted arithmetically right).  In both cases we use SUBREG_PROMOTED_VAR_P
> on the SUBREG to propagate this information down to the rest of the
> expansion code.
> 
> Unfortunately, seems the RTL hoist pass hapilly hoists the subreg/s/v in
> this case before the conditional branch, the r259649 change in
> simplify-rtx.c optimizes more than in the past a ZERO_EXTENSION from
> subreg/s/v into just a pseudo copy and later on yet another pass optimizes
> the comparison away.
> 
> The following patch fixes that by emitting slightly different code if
> !opN_small_p - instead of adding a subreg/s/v on the original pseudo, it
> adds a new pseudo, copies original pseudo to it only in the bb guarded with
> the condition and makes subreg/s/v on that new pseudo.  That is enough to
> tell the hoisting that it shouldn't hoist it, if even that wouldn't work,
> I'd say that SUBREG_PROMOTED_VAR_P would be pretty much useless.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

If I remember correctly and read the cfgexpand.c comment

  /* For a promoted variable, X will not be used directly but wrapped in a
     SUBREG with SUBREG_PROMOTED_VAR_P set, which means that the RTL land
     will assume that its upper bits can be inferred from its lower bits.
     Therefore, if X isn't initialized on every path from the entry, then
     we must do it manually in order to fulfill the above assumption.  */

you run into a similar situation but no, I don't think your patch is
really fixing this given SUBREG_PROMOTED_VAR_P is only true depending
on a condition (and not a data-dependence).

Note that likely SUBREG_PROMOTED_VAR_P wasn't designed to communicate
zero-extend info (can't you use a REG_EQUIV note somehow?) but it has
to be information that is valid everywhere in the function unless
data dependences force its motion (thus a conditional doesn't do).

Eric should know for sure though.

(oh, and yes, I think SUBREG_PROMOTED_VAR_P is a dangerous thing,
but so is VRP info on SSA names as we've learned...)

Richard.

> 2019-04-16  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR middle-end/90095
>       * internal-fn.c (expand_mul_overflow): Don't set SUBREG_PROMOTED_VAR_P
>       on lowpart SUBREG of op0 if !op0_small_p, instead copy op0 into a new
>       pseudo and set SUBREG_PROMOTED_VAR_P on a lowpart SUBREG of that
>       pseudo.  Similarly for op1.
> 
>       * gcc.dg/pr90095.c: New test.
> 
> --- gcc/internal-fn.c.jj      2019-01-07 09:47:33.000000000 +0100
> +++ gcc/internal-fn.c 2019-04-15 17:41:50.915039293 +0200
> @@ -1756,13 +1756,35 @@ expand_mul_overflow (location_t loc, tre
>         rtx lopart0s = lopart0, lopart1s = lopart1;
>         if (GET_CODE (lopart0) == SUBREG)
>           {
> -           lopart0s = shallow_copy_rtx (lopart0);
> +           if (!op0_small_p)
> +             {
> +               /* If !op0_small_p, the SUBREG_PROMOTED_VAR_P state we
> +                  want to set is just local to the containing basic block,
> +                  guarded with the above conditional branch.  Make sure
> +                  to create a new pseudo, so that hoisting doesn't hoist
> +                  the promoted subreg before the conditional and optimize
> +                  away the conditional incorrectly.  See PR90095 for
> +                  details.  */
> +               rtx op0_copy = gen_reg_rtx (mode);
> +               emit_move_insn (op0_copy, op0);
> +               lopart0s = convert_modes (hmode, mode, op0_copy, uns);
> +               gcc_assert (GET_CODE (lopart0s) == SUBREG);
> +             }
> +           lopart0s = shallow_copy_rtx (lopart0s);
>             SUBREG_PROMOTED_VAR_P (lopart0s) = 1;
>             SUBREG_PROMOTED_SET (lopart0s, uns ? SRP_UNSIGNED : SRP_SIGNED);
>           }
>         if (GET_CODE (lopart1) == SUBREG)
>           {
> -           lopart1s = shallow_copy_rtx (lopart1);
> +           if (!op1_small_p)
> +             {
> +               /* See above PR90095 comment.  */
> +               rtx op1_copy = gen_reg_rtx (mode);
> +               emit_move_insn (op1_copy, op1);
> +               lopart1s = convert_modes (hmode, mode, op1_copy, uns);
> +               gcc_assert (GET_CODE (lopart1s) == SUBREG);
> +             }
> +           lopart1s = shallow_copy_rtx (lopart1s);
>             SUBREG_PROMOTED_VAR_P (lopart1s) = 1;
>             SUBREG_PROMOTED_SET (lopart1s, uns ? SRP_UNSIGNED : SRP_SIGNED);
>           }
> --- gcc/testsuite/gcc.dg/pr90095.c.jj 2019-04-15 17:47:21.402673528 +0200
> +++ gcc/testsuite/gcc.dg/pr90095.c    2019-04-15 17:52:38.111531221 +0200
> @@ -0,0 +1,18 @@
> +/* PR middle-end/90095 */
> +/* { dg-do run } */
> +/* { dg-options "-Os -fno-tree-bit-ccp" } */
> +
> +unsigned long long a;
> +unsigned int b;
> +
> +int
> +main ()
> +{
> +  unsigned int c = 255, d = c |= b;
> +  if (__CHAR_BIT__ != 8 || __SIZEOF_INT__ != 4 || __SIZEOF_LONG_LONG__ != 8)
> +    return 0;
> +  d = __builtin_mul_overflow (-(unsigned long long) d, (unsigned char) - c, 
> &a);
> +  if (d != 0)
> +    __builtin_abort ();
> +  return 0;
> +}
> 
>       Jakub
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Reply via email to