https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106220
Bug ID: 106220
Summary: x86-64 optimizer forgets about shrd peephole optimization pattern when faced with more than one in close proximity
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: already5chosen at yahoo dot com
Target Milestone: ---

I am reporting a right-shift issue, but left shift has the same problem.

In theory, gcc knows how to calculate the lower 64 bits of the right shift of a 128-bit number with a single instruction when it is provable that the shift count is in the range [0:63]. In practice, it does so only under very special conditions.

See here: https://godbolt.org/z/fhdo8xhxW

foo1to1 is good.
foo2to1 is good.
foo1to2 starts well but is broken near the end by a hyperactive vectorizer. But that's a separate issue, already reported in 105617.
foo2to2, foo2to3, foo3to4: it looks like the compiler forgot all it knew about double-word right shifts or, more likely, forgot that (x % 64) is always in the range [0:63].

I am reporting it as a target issue despite being sure that the problem is not in the x86-64 back end itself, but somewhere in the interaction between various phases of the optimizer. The same is true of 80+ percent of my reports. However, it's your call, not mine.

In practice, the impact is most visible on x86-64 because, thanks to the shrd instruction, x86-64 is potentially very good at this sort of task. On ARM64 or POWER64LE the relative slowdown is lower, because the optimal code is not as fast.

P.S. 82261 sounds similar, but I am not sure it is related.
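
For readers who cannot open the godbolt link, here is a minimal sketch of the operation under discussion: the lower 64 bits of a 128-bit right shift whose count is reduced with % 64. The function names shr128_lo64 and shr128_lo64_portable are hypothetical and are not taken from the linked test case. The first variant assumes the GCC/Clang unsigned __int128 extension. Whether a given gcc version lowers either variant to a single shrd has not been verified here.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report's test case).
 * Builds the 128-bit value hi:lo, shifts it right by (count % 64),
 * and returns the low 64 bits. Relies on the unsigned __int128
 * extension available in GCC and Clang on 64-bit targets. */
uint64_t shr128_lo64(uint64_t lo, uint64_t hi, unsigned count)
{
    unsigned __int128 x = ((unsigned __int128)hi << 64) | lo;
    return (uint64_t)(x >> (count % 64)); /* count is provably in [0:63] */
}

/* Same result without __int128. A shift count of 0 is handled
 * separately because hi << 64 would be undefined behavior in C. */
uint64_t shr128_lo64_portable(uint64_t lo, uint64_t hi, unsigned count)
{
    unsigned c = count % 64;
    if (c == 0)
        return lo;
    return (lo >> c) | (hi << (64 - c));
}
```

On x86-64, shrd takes the two 64-bit halves and a count and produces this result directly, which is why the report expects one instruction per output word.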