[Bug middle-end/122871] [13/14/15/16 Regression] de-optimized synthesis of long long shift and add

avieira at gcc dot gnu.org via Gcc-bugs Thu, 27 Nov 2025 02:42:33 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122871


avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org

--- Comment #4 from avieira at gcc dot gnu.org ---
I believe this started with
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=144aee70b80de50f96a97ee64edd2f1c237c4906
 

That commit introduces a match.pd pattern that optimizes the sequence to a
multiplication. That's not necessarily 'the problem', but it's why things
changed in gcc-11.

I believe the 'problem' lies with synth_mult, which is used to determine the
'algorithm' to expand a multiplication by a constant.

During synth_mult it decides to split up the multiplication equivalent to 'a <<
33' into multiple multiplications because it does not consider doing a shift
with a constant larger than BITS_PER_WORD.
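To illustrate the kind of equivalence involved, here is a minimal sketch in C. The expression and function names are hypothetical and are not the testcase from the bug; unsigned 64-bit arithmetic is used so that wraparound is well defined. It only shows that a shift-and-add on a 64-bit value computes the same result as a multiplication by a constant, which is the form synth_mult then has to expand again.

```c
#include <stdint.h>

/* Hypothetical shift-and-add form (not the testcase from the bug). */
static uint64_t shift_add(uint64_t a)
{
    return (a << 33) + a;
}

/* The same computation written as a multiplication by the constant
   2^33 + 1.  Both forms agree modulo 2^64. */
static uint64_t as_mult(uint64_t a)
{
    return a * ((UINT64_C(1) << 33) + 1);
}
```

On a 32-bit target the shift by 33 in the first form crosses a word boundary, which is the case discussed here.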

With the small hack at the bottom of this comment we get the optimal codegen
back. I'm not saying this is the right thing to do, though: I am not at all
familiar with synth_mult, and this happens to work for 'arm' because our backend
can codegen these 'larger' shifts. I'm not sure that's true for all backends, and
I expect that cap on BITS_PER_WORD was there for a reason. Also, I've not tested
this any further than the testcase...
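As a rough model of what the cap does, the sketch below mimics the maxm computation in plain C. It is a simplification and not GCC's code: the helper names are made up, and the assumption that a shift count m is only tried when m is strictly below maxm is mine.

```c
#include <stdbool.h>

/* Simplified stand-in for MIN (BITS_PER_WORD, GET_MODE_BITSIZE (imode)). */
static int max_shift(int bits_per_word, int mode_bitsize)
{
    return bits_per_word < mode_bitsize ? bits_per_word : mode_bitsize;
}

/* Assumption: a shift by m is only considered when m is below the cap. */
static bool shift_considered(int m, int maxm)
{
    return m < maxm;
}
```

With a 32-bit word and a 64-bit mode, max_shift (32, 64) is 32, so a shift by 33 is rejected in this model. Using the mode size alone gives a cap of 64 and the shift is accepted.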


diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index be427dca5d9..e946f95a665 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -2800,7 +2800,7 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
   /* Be prepared for vector modes.  */
   imode = as_a <scalar_int_mode> (GET_MODE_INNER (mode));

-  maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (imode));
+  maxm = GET_MODE_BITSIZE (imode);

   /* Restrict the bits of "t" to the multiplication's mode.  */
   t &= GET_MODE_MASK (imode);
@@ -2915,8 +2915,8 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
             a sequence of additions, so the observed cost is given as
             MIN (m * add_cost(speed, mode), shift_cost(speed, mode, m)).  */
          op_cost = m * add_cost (speed, mode);
-         if (shift_cost (speed, mode, m) < op_cost)
-           op_cost = shift_cost (speed, mode, m);
+         if (shift_cost (speed, mode, m % MAX_BITS_PER_WORD) < op_cost)
+           op_cost = shift_cost (speed, mode, m % MAX_BITS_PER_WORD);
          new_limit.cost = best_cost.cost - op_cost;
          new_limit.latency = best_cost.latency - op_cost;
          synth_mult (alg_in, q, &new_limit, mode);

[Bug middle-end/122871] [13/14/15/16 Regression] de-optimized synthesis of long long shift and add
