https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122871
avieira at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |avieira at gcc dot gnu.org
--- Comment #4 from avieira at gcc dot gnu.org ---
I believe this started with
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=144aee70b80de50f96a97ee64edd2f1c237c4906
which introduces a match.pd pattern that optimizes the sequence to a
multiplication. That's not necessarily 'the problem', but it's why things
changed in gcc-11.
I believe the 'problem' lies with synth_mult, which is used to determine the
'algorithm' to expand a multiplication by a constant.
In synth_mult it decides to split up the multiplication equivalent to 'a <<
33' into multiple multiplications, because it does not consider doing a shift
with a constant larger than BITS_PER_WORD.
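
To illustrate the effect of that cap, here is a minimal standalone sketch. It
is not GCC code: find_single_shift is a hypothetical helper that only models
the "is this constant a single shift below the cap?" question, and the values
32 and 64 assume a target where BITS_PER_WORD is 32 and the multiplication is
done in a 64-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC: return m if t == 2^m for some
   0 < m < maxm, otherwise -1.  maxm stands in for the cap that
   synth_mult applies to the shift counts it is willing to consider.  */
static int
find_single_shift (uint64_t t, int maxm)
{
  assert (maxm > 0 && maxm <= 64);
  for (int m = 1; m < maxm; m++)
    if (t == (UINT64_C (1) << m))
      return m;
  return -1;
}

/* The two forms below compute the same 64-bit value; the question is
   only whether the single-shift form is ever considered.  */
static uint64_t
mul_form (uint64_t a)
{
  return a * (UINT64_C (1) << 33);
}

static uint64_t
shift_form (uint64_t a)
{
  return a << 33;
}
```

With the cap at 32 (assumed word size) the helper rejects 2^33 as a single
shift, while with the cap at 64 (assumed mode size) it finds the shift by 33.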
With the small hack at the bottom of this comment we get the optimal codegen
back. I'm not saying this is the right thing to do, though. I am not at all
familiar with synth_mult, and this happens to work for 'arm' because our
backend can codegen these 'larger' shifts. I'm not sure that's true for all
backends, and I expect the cap on BITS_PER_WORD was there for a reason. Also,
I've not tested this any further than the testcase...
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index be427dca5d9..e946f95a665 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2800,7 +2800,7 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
/* Be prepared for vector modes. */
imode = as_a <scalar_int_mode> (GET_MODE_INNER (mode));
- maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (imode));
+ maxm = GET_MODE_BITSIZE (imode);
/* Restrict the bits of "t" to the multiplication's mode. */
t &= GET_MODE_MASK (imode);
@@ -2915,8 +2915,8 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
a sequence of additions, so the observed cost is given as
MIN (m * add_cost(speed, mode), shift_cost(speed, mode, m)). */
op_cost = m * add_cost (speed, mode);
- if (shift_cost (speed, mode, m) < op_cost)
- op_cost = shift_cost (speed, mode, m);
+ if (shift_cost (speed, mode, m % MAX_BITS_PER_WORD) < op_cost)
+ op_cost = shift_cost (speed, mode, m % MAX_BITS_PER_WORD);
new_limit.cost = best_cost.cost - op_cost;
new_limit.latency = best_cost.latency - op_cost;
synth_mult (alg_in, q, &new_limit, mode);