https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70461
Alexander Fomin <afomin at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #38134|0                           |1
        is obsolete|                            |
--- Comment #5 from Alexander Fomin <afomin at gcc dot gnu.org> ---
Created attachment 38184
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38184&action=edit
Another reproducer
Thanks, performance is back on Core CPUs.
However, I've noticed that for a slightly different testcase compiled with
-m32 -O2, we also generate extra insns for the loop (the degradation can be seen
on some other CPUs, e.g. when specifying -march=slm).
What I see in the RTL IRA dump is (with some identical lines removed):
+-------------------------------------+------------------------+
| Before r234527                      | After r234527          |
+-------------------------------------+------------------------+
| Assigning 0 to a26r113              | Assigning 4 to a14r144 |
| Assigning 0 to a27r181              | Assigning 4 to a42r113 |
| Spilling a29r178 for a28r180        | Assigning 4 to a46r137 |
| Assigning 0 to a28r180              | Assigning 4 to a50r128 |
| Assigning 0 to a30r137              | Assigning 4 to a54r121 |
| Assigning 0 to a31r177              | Assigning 4 to a26r113 |
| Spilling a33r174 for a32r176        | Assigning 4 to a30r137 |
| Assigning 0 to a32r176              | Assigning 4 to a34r128 |
| Assigning 0 to a34r128              | Assigning 4 to a38r121 |
| Assigning 0 to a35r173              |                        |
| Spilling a37r170 for a36r172        |                        |
| Assigning 0 to a36r172              |                        |
| Assigning 0 to a38r121              |                        |
| Assigning 0 to a39r169              |                        |
| Spilling a41r166 for a40r168        |                        |
| Assigning 0 to a40r168              |                        |
| a41(r166,l1) -- (...) assign memory |                        |
| a29(r178,l1) -- (...) assign memory |                        |
| a33(r174,l1) -- (...) assign memory |                        |
| a37(r170,l1) -- (...) assign memory |                        |
+-------------------------------------+------------------------+
Looks like we don't consider spilling and memory more profitable anymore...
Could you please take a look?
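
For anyone wanting to compare the two dumps without reading them by eye, here
is a minimal sketch that tallies the three kinds of lines quoted in the table.
It is only an illustration: the regular expressions are inferred from the lines
shown in this comment, the helper name summarize is hypothetical, and the
number after "Assigning" is assumed (not confirmed here) to be a hard register
number. Real dumps contain many other lines, which the sketch ignores.

```python
import re
from collections import Counter

# Patterns inferred from the dump lines quoted in this comment (an assumption,
# not taken from the GCC sources).
ASSIGN = re.compile(r"Assigning (\d+) to a\d+r\d+")
SPILL = re.compile(r"Spilling a\d+r\d+ for a\d+r\d+")
MEMORY = re.compile(r"a\d+\(r\d+,l\d+\).*assign memory")


def summarize(lines):
    """Count assignment, spill and memory lines in an iterable of dump lines."""
    counts = Counter()
    for line in lines:
        m = ASSIGN.search(line)
        if m:
            # Keyed by the number after "Assigning" (presumably a hard regno).
            counts["assign " + m.group(1)] += 1
        elif SPILL.search(line):
            counts["spill"] += 1
        elif MEMORY.search(line):
            counts["memory"] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the "Before r234527" column.
    sample = [
        "Assigning 0 to a26r113",
        "Spilling a29r178 for a28r180",
        "Assigning 0 to a28r180",
        "a29(r178,l1) -- (...) assign memory",
    ]
    for kind, n in sorted(summarize(sample).items()):
        print(kind, n)
```

Running summarize over each full dump file (opened with open(path), where the
paths are whatever names the dumps were written under) and comparing the two
Counter results should make a disappearing "spill"/"memory" count easy to spot.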