https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110551
Bug ID: 110551
Summary: [11/12/13/14 regression] Suboptimal codegen for 128 bits multiplication on x86_64
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: moncef.mechri at gmail dot com
Target Milestone: ---

https://godbolt.org/z/3hdondY6n

Codegen for the code shared above (a mixing step used in Boost.Unordered when a non-avalanching hash function is being used [1]) has regressed since GCC 11.

I believe there are 2 regressions:

Regression 1:

A redundant move is introduced:

    movabs rcx, -7046029254386353131
    mov rax, rcx

The regression seems to be present at all optimization levels above -O0 (including -Os and -Og).

Possibly a duplicate of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94804

Regression 2:

When using -march=haswell or newer, GCC >= 11 emits mulx. The resulting code is longer (by 1 instruction), with no clear benefit to my untrained eyes.

It looks to me like the code generated by GCC 10 is optimal, even for haswell and newer.

I am reporting both issues in the same bug report because they seem related enough. Let me know if you want me to split them into 2 bug reports instead.

[1] https://github.com/boostorg/unordered/blob/9a7d1d336aaa73ad8e5f7c07bdb81b2e793f8d93/include/boost/unordered/detail/mulx.hpp#L111
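
For readers who cannot open the Compiler Explorer link, here is a minimal sketch of the kind of function this report is about. It is an assumption based on the description (a 128-bit multiply used as a hash mixing step) and on the constant visible in the assembly, not a copy of the linked code. The names mulx64, kMul and mix are hypothetical, and unsigned __int128 is a GCC/Clang extension.

```cpp
#include <cstdint>

// Assumed shape of the mixing step: a full 64x64 -> 128-bit multiply,
// followed by XOR-folding the high 64 bits into the low 64 bits.
// unsigned __int128 is a GCC/Clang extension, not standard C++.
constexpr std::uint64_t mulx64(std::uint64_t x, std::uint64_t y)
{
    unsigned __int128 r = static_cast<unsigned __int128>(x) * y;
    return static_cast<std::uint64_t>(r) ^ static_cast<std::uint64_t>(r >> 64);
}

// The movabs immediate in the report, -7046029254386353131, is the signed
// 64-bit view of 0x9E3779B97F4A7C15.
constexpr std::uint64_t kMul = 0x9E3779B97F4A7C15ull;

// Hypothetical wrapper: multiply the input by the constant and fold.
constexpr std::uint64_t mix(std::uint64_t x)
{
    return mulx64(x, kMul);
}
```

Compiling such a function with something like g++ -O2 -S -masm=intel, once with and once without -march=haswell, should allow the two code sequences described above to be compared. mulx is a BMI2 instruction, which is why it only appears once -march=haswell or newer is selected.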