[Bug rtl-optimization/110551] New: [11 / 12 / 13 /14 regression] Suboptimal codegen for 128 bits multiplication on x86_64

moncef.mechri at gmail dot com via Gcc-bugs Tue, 04 Jul 2023 10:26:19 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551


            Bug ID: 110551
           Summary: [11 / 12 / 13 /14 regression] Suboptimal codegen for
                    128 bits multiplication on x86_64
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: moncef.mechri at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/3hdondY6n

Codegen for the code shared above (a mixing step used in Boost.Unordered when a
non-avalanching hash function is used [1]) has regressed since GCC 11. I
believe there are two regressions:

Regression 1:

A redundant move is introduced (the constant is loaded into rcx and then copied
into rax):


        movabs  rcx, -7046029254386353131
        mov     rax, rcx


The regression seems to be present at all optimization levels above -O0
(including -Os and -Og).

Possibly a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804


Regression 2:

When using -march=haswell or newer, GCC >= 11 emits mulx. The resulting code is
one instruction longer, with no clear benefit as far as I can tell. The code
generated by GCC 10 looks optimal to me, even for Haswell and newer.


I am reporting both issues in the same bug report because they seem closely
related. Let me know if you would like me to split them into two separate reports.

[1]
https://github.com/boostorg/unordered/blob/9a7d1d336aaa73ad8e5f7c07bdb81b2e793f8d93/include/boost/unordered/detail/mulx.hpp#L111

