On Tue, 16 Apr 2019, Jakub Jelinek wrote: > Hi! > > In expand_mul_overflow, we emit (depending on operand ranges etc.) more than > one multiplication to optimize for the common case. One of those is where > we are guaranteed that both operands are either sign or zero extended from > half precision types to the full precision, we can then use much shorter > sequence. opN_small_p is a compile time predicate whether the corresponding > argument is known to be in that range. If yes, we just emit the simple > code, otherwise we emit a runtime conditional to check (for uns compare the > high half of bits against zero, for !uns against signbit of the low part > shifted arithmetically right). In both cases we use SUBREG_PROMOTED_VAR_P > on the SUBREG to propagate this information down to the rest of the > expansion code. > > Unfortunately, seems the RTL hoist pass hapilly hoists the subreg/s/v in > this case before the conditional branch, the r259649 change in > simplify-rtx.c optimizes more than in the past a ZERO_EXTENSION from > subreg/s/v into just a pseudo copy and later on yet another pass optimizes > the comparison away. > > The following patch fixes that by emitting slightly different code if > !opN_small_p - instead of adding a subreg/s/v on the original pseudo, it > adds a new pseudo, copies original pseudo to it only in the bb guarded with > the condition and makes subreg/s/v on that new pseudo. That is enough to > tell the hoisting that it shouldn't hoist it, if even that wouldn't work, > I'd say that SUBREG_PROMOTED_VAR_P would be pretty much useless. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
If I remember correctly and read the cfgexpand.c comment /* For a promoted variable, X will not be used directly but wrapped in a SUBREG with SUBREG_PROMOTED_VAR_P set, which means that the RTL land will assume that its upper bits can be inferred from its lower bits. Therefore, if X isn't initialized on every path from the entry, then we must do it manually in order to fulfill the above assumption. */ you run into a similar situation but no, I don't think your patch is really fixing this given SUBREG_PROMOTED_VAR_P is only true depending on a condition (and not a data-dependence). Note that likely SUBREG_PROMOTED_VAR_P wasn't designed to communicate zero-extend info (can't you use a REG_EQUIV note somehow?) but it has to be information that is valid everywhere in the function unless data dependences force its motion (thus a conditional doesn't do). Eric should know for sure though. (oh, and yes, I think SUBREG_PROMOTED_VAR_P is a dangerous thing, but so is VRP info on SSA names as we've learned...) Richard. > 2019-04-16 Jakub Jelinek <ja...@redhat.com> > > PR middle-end/90095 > * internal-fn.c (expand_mul_overflow): Don't set SUBREG_PROMOTED_VAR_P > on lowpart SUBREG of op0 if !op0_small_p, instead copy op0 into a new > pseudo and set SUBREG_PROMOTED_VAR_P on a lowpart SUBREG of that > pseudo. Similarly for op1. > > * gcc.dg/pr90095.c: New test. > > --- gcc/internal-fn.c.jj 2019-01-07 09:47:33.000000000 +0100 > +++ gcc/internal-fn.c 2019-04-15 17:41:50.915039293 +0200 > @@ -1756,13 +1756,35 @@ expand_mul_overflow (location_t loc, tre > rtx lopart0s = lopart0, lopart1s = lopart1; > if (GET_CODE (lopart0) == SUBREG) > { > - lopart0s = shallow_copy_rtx (lopart0); > + if (!op0_small_p) > + { > + /* If !op0_small_p, the SUBREG_PROMOTED_VAR_P state we > + want to set is just local to the containing basic block, > + guarded with the above conditional branch. Make sure > + to create a new pseudo, so that hoisting doesn't hoist > + the promoted subreg before the conditional and optimize > + away the conditional incorrectly. See PR90095 for > + details. */ > + rtx op0_copy = gen_reg_rtx (mode); > + emit_move_insn (op0_copy, op0); > + lopart0s = convert_modes (hmode, mode, op0_copy, uns); > + gcc_assert (GET_CODE (lopart0s) == SUBREG); > + } > + lopart0s = shallow_copy_rtx (lopart0s); > SUBREG_PROMOTED_VAR_P (lopart0s) = 1; > SUBREG_PROMOTED_SET (lopart0s, uns ? SRP_UNSIGNED : SRP_SIGNED); > } > if (GET_CODE (lopart1) == SUBREG) > { > - lopart1s = shallow_copy_rtx (lopart1); > + if (!op1_small_p) > + { > + /* See above PR90095 comment. */ > + rtx op1_copy = gen_reg_rtx (mode); > + emit_move_insn (op1_copy, op1); > + lopart1s = convert_modes (hmode, mode, op1_copy, uns); > + gcc_assert (GET_CODE (lopart1s) == SUBREG); > + } > + lopart1s = shallow_copy_rtx (lopart1s); > SUBREG_PROMOTED_VAR_P (lopart1s) = 1; > SUBREG_PROMOTED_SET (lopart1s, uns ? SRP_UNSIGNED : SRP_SIGNED); > } > --- gcc/testsuite/gcc.dg/pr90095.c.jj 2019-04-15 17:47:21.402673528 +0200 > +++ gcc/testsuite/gcc.dg/pr90095.c 2019-04-15 17:52:38.111531221 +0200 > @@ -0,0 +1,18 @@ > +/* PR middle-end/90095 */ > +/* { dg-do run } */ > +/* { dg-options "-Os -fno-tree-bit-ccp" } */ > + > +unsigned long long a; > +unsigned int b; > + > +int > +main () > +{ > + unsigned int c = 255, d = c |= b; > + if (__CHAR_BIT__ != 8 || __SIZEOF_INT__ != 4 || __SIZEOF_LONG_LONG__ != 8) > + return 0; > + d = __builtin_mul_overflow (-(unsigned long long) d, (unsigned char) - c, > &a); > + if (d != 0) > + __builtin_abort (); > + return 0; > +} > > Jakub > -- Richard Biener <rguent...@suse.de> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)