https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115024
Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #9 from Roger Sayle <roger at nextmovesoftware dot com> ---
Created attachment 60680
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60680&action=edit
Standalone reduction of libgcc's __udivti3.

The bugzilla title implies that the issue is with 128-bit division, which in
this testcase is performed by libgcc's __udivti3. Indeed, in Colin's
attachments we appear to be doing worse at argument passing/shuffling (as
observed by Jakub). However, this appears to be fixed (or at least improved)
for me on mainline and on godbolt's gcc 14 (see the attached code).
Confusingly, __udivti3 itself wouldn't be affected by the caller's use of
-mavx, and indeed none of the attached code (caller and callee) actually uses
AVX/SSE instructions or registers, so perhaps Haochen's analysis is right that
this is some strange DSB scheduling issue?

I've not yet managed to reproduce the problem, so it would help if someone
could check linking the gcc-13 stress-cpu against the gcc-14 __udivti3, and
likewise the gcc-14 stress-cpu against the gcc-13 __udivti3, so that we can
narrow down which combination actually triggers the regression. Thanks in
advance.
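For anyone unfamiliar with how __udivti3 enters the picture: the attachment
itself isn't reproduced in this comment, but a minimal sketch of the kind of
code involved is below. On x86-64 GCC lowers an unsigned 128-bit division to
a call to libgcc's __udivti3, so the divide itself runs in plain integer code
regardless of whether the caller was built with -mavx. The function name
div128 and the constants here are made up for illustration; they are not the
attached testcase.

/* Illustrative only, not attachment 60680.  Compile with: gcc -O2 div128.c */
#include <stdio.h>

__attribute__((noinline))
unsigned __int128 div128(unsigned __int128 n, unsigned __int128 d)
{
    return n / d;   /* emitted on x86-64 as a call to libgcc's __udivti3 */
}

int main(void)
{
    unsigned __int128 q = div128(((unsigned __int128)1 << 100) + 12345, 7);
    printf("low 64 bits of quotient: %llu\n", (unsigned long long)q);
    return 0;
}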