https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115551
Bug ID: 115551
Summary: [missed optimization] "c1 << (a + c2)" not optimized into "(c1 << c2) << a"
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: pinskia at gcc dot gnu.org
Target Milestone: ---
Created attachment 58468
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58468&action=edit
patch to show how to get a nice output – but doesn't actually use it. Not to be used.
"c1 << (a + c2)" not optimized into "(c1 << c2) << a"
Example:
int f(int ch) {
    unsigned long mask1 = ((((1UL))) << (1 + 4 * ((1) - 1))) << (ch * 4);
    unsigned long mask2 = ((((1UL))) << (1 + 4 * ((ch + 1) - 1)));
    return mask1 - mask2;
}
GCC currently converts this to:
mask1 = 2 << (ch * 4)
mask2 = 1 << (ch * 4 + 1)
* * *
Related to
https://lore.kernel.org/lkml/d7ef7a6158df4ba6687233b0e00d37796b069fb3.1718791090.git.u.kleine-koe...@baylibre.com/
Result:
* With the 2nd form, the resulting binary is ~25% smaller.
* That is a saving of nearly 500 bytes!
* * *
On ARM (Thumb-2), GCC generates the following code for mask1:
    lsls  r0, r0, #2
    movs  r3, #2
    lsl.w r0, r3, r0
and for mask2:
    lsls  r0, r0, #2
    adds  r0, #1        // additional 'adds' instruction
    movs  r3, #1
    lsl.w r0, r3, r0