On Tue, 16 Apr 2019, Jakub Jelinek wrote:
> Hi!
>
> In expand_mul_overflow, we emit (depending on operand ranges etc.) more than
> one multiplication to optimize for the common case. One of those is where
> we are guaranteed that both operands are either sign or zero extended from
> half precision types to the full precision, we can then use much shorter
> sequence. opN_small_p is a compile time predicate whether the corresponding
> argument is known to be in that range. If yes, we just emit the simple
> code, otherwise we emit a runtime conditional to check (for uns compare the
> high half of bits against zero, for !uns against signbit of the low part
> shifted arithmetically right). In both cases we use SUBREG_PROMOTED_VAR_P
> on the SUBREG to propagate this information down to the rest of the
> expansion code.
>
> Unfortunately, seems the RTL hoist pass hapilly hoists the subreg/s/v in
> this case before the conditional branch, the r259649 change in
> simplify-rtx.c optimizes more than in the past a ZERO_EXTENSION from
> subreg/s/v into just a pseudo copy and later on yet another pass optimizes
> the comparison away.
>
> The following patch fixes that by emitting slightly different code if
> !opN_small_p - instead of adding a subreg/s/v on the original pseudo, it
> adds a new pseudo, copies original pseudo to it only in the bb guarded with
> the condition and makes subreg/s/v on that new pseudo. That is enough to
> tell the hoisting that it shouldn't hoist it, if even that wouldn't work,
> I'd say that SUBREG_PROMOTED_VAR_P would be pretty much useless.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
If I remember correctly and read the cfgexpand.c comment
/* For a promoted variable, X will not be used directly but wrapped in a
SUBREG with SUBREG_PROMOTED_VAR_P set, which means that the RTL land
will assume that its upper bits can be inferred from its lower bits.
Therefore, if X isn't initialized on every path from the entry, then
we must do it manually in order to fulfill the above assumption. */
you run into a similar situation but no, I don't think your patch is
really fixing this given SUBREG_PROMOTED_VAR_P is only true depending
on a condition (and not a data-dependence).
Note that likely SUBREG_PROMOTED_VAR_P wasn't designed to communicate
zero-extend info (can't you use a REG_EQUIV note somehow?) but it has
to be information that is valid everywhere in the function unless
data dependences force its motion (thus a conditional doesn't do).
Eric should know for sure though.
(oh, and yes, I think SUBREG_PROMOTED_VAR_P is a dangerous thing,
but so is VRP info on SSA names as we've learned...)
Richard.
> 2019-04-16 Jakub Jelinek <[email protected]>
>
> PR middle-end/90095
> * internal-fn.c (expand_mul_overflow): Don't set SUBREG_PROMOTED_VAR_P
> on lowpart SUBREG of op0 if !op0_small_p, instead copy op0 into a new
> pseudo and set SUBREG_PROMOTED_VAR_P on a lowpart SUBREG of that
> pseudo. Similarly for op1.
>
> * gcc.dg/pr90095.c: New test.
>
> --- gcc/internal-fn.c.jj 2019-01-07 09:47:33.000000000 +0100
> +++ gcc/internal-fn.c 2019-04-15 17:41:50.915039293 +0200
> @@ -1756,13 +1756,35 @@ expand_mul_overflow (location_t loc, tre
> rtx lopart0s = lopart0, lopart1s = lopart1;
> if (GET_CODE (lopart0) == SUBREG)
> {
> - lopart0s = shallow_copy_rtx (lopart0);
> + if (!op0_small_p)
> + {
> + /* If !op0_small_p, the SUBREG_PROMOTED_VAR_P state we
> + want to set is just local to the containing basic block,
> + guarded with the above conditional branch. Make sure
> + to create a new pseudo, so that hoisting doesn't hoist
> + the promoted subreg before the conditional and optimize
> + away the conditional incorrectly. See PR90095 for
> + details. */
> + rtx op0_copy = gen_reg_rtx (mode);
> + emit_move_insn (op0_copy, op0);
> + lopart0s = convert_modes (hmode, mode, op0_copy, uns);
> + gcc_assert (GET_CODE (lopart0s) == SUBREG);
> + }
> + lopart0s = shallow_copy_rtx (lopart0s);
> SUBREG_PROMOTED_VAR_P (lopart0s) = 1;
> SUBREG_PROMOTED_SET (lopart0s, uns ? SRP_UNSIGNED : SRP_SIGNED);
> }
> if (GET_CODE (lopart1) == SUBREG)
> {
> - lopart1s = shallow_copy_rtx (lopart1);
> + if (!op1_small_p)
> + {
> + /* See above PR90095 comment. */
> + rtx op1_copy = gen_reg_rtx (mode);
> + emit_move_insn (op1_copy, op1);
> + lopart1s = convert_modes (hmode, mode, op1_copy, uns);
> + gcc_assert (GET_CODE (lopart1s) == SUBREG);
> + }
> + lopart1s = shallow_copy_rtx (lopart1s);
> SUBREG_PROMOTED_VAR_P (lopart1s) = 1;
> SUBREG_PROMOTED_SET (lopart1s, uns ? SRP_UNSIGNED : SRP_SIGNED);
> }
> --- gcc/testsuite/gcc.dg/pr90095.c.jj 2019-04-15 17:47:21.402673528 +0200
> +++ gcc/testsuite/gcc.dg/pr90095.c 2019-04-15 17:52:38.111531221 +0200
> @@ -0,0 +1,18 @@
> +/* PR middle-end/90095 */
> +/* { dg-do run } */
> +/* { dg-options "-Os -fno-tree-bit-ccp" } */
> +
> +unsigned long long a;
> +unsigned int b;
> +
> +int
> +main ()
> +{
> + unsigned int c = 255, d = c |= b;
> + if (__CHAR_BIT__ != 8 || __SIZEOF_INT__ != 4 || __SIZEOF_LONG_LONG__ != 8)
> + return 0;
> + d = __builtin_mul_overflow (-(unsigned long long) d, (unsigned char) - c,
> &a);
> + if (d != 0)
> + __builtin_abort ();
> + return 0;
> +}
>
> Jakub
>
--
Richard Biener <[email protected]>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)