https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108614
            Bug ID: 108614
           Summary: _subborrow_u32 generates suboptimal code when second
                    subtraction operand is constant on x86 targets
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: john_platts at hotmail dot com
  Target Milestone: ---

Here is some C++ code that generates suboptimal code with the
-O2 -march=skylake-avx512 -m32 options with gcc 12.2.0:

#include <stdint.h>
#include <utility>
#include <x86intrin.h>
#include <immintrin.h>

std::pair<uint32_t, uint32_t> ComputeHiMaskAndHiZeroAmt(uint32_t len) {
  uint32_t hiMask;
  uint32_t hiZeroAmt;
  _addcarry_u32(_subborrow_u32(0, len, 32, &hiZeroAmt),
                uint32_t{0xFFFFFFFFu}, 0, &hiMask);
  hiMask = _bzhi_u32(hiMask, hiZeroAmt);
  return std::make_pair(hiMask, hiZeroAmt);
}

Here is the assembly code that is generated when the above code is compiled
with gcc 12.2.0 with the -O2 -march=skylake-avx512 -m32 options:

_Z25ComputeHiMaskAndHiZeroAmtj:
        subl    $16, %esp
        movl    24(%esp), %eax
        movl    $32, %edx
        subl    %edx, %eax
        movl    20(%esp), %ecx
        movl    $-1, %edx
        adcl    $0, %edx
        movl    %eax, 4(%ecx)
        bzhi    %eax, %edx, %edx
        movl    %ecx, %eax
        movl    %edx, (%ecx)
        addl    $16, %esp
        ret     $4

Here is a more optimal version of the above code (for 32-bit x86):

_Z25ComputeHiMaskAndHiZeroAmtj:
        movl    8(%esp), %eax
        subl    $32, %eax
        movl    4(%esp), %ecx
        movl    $-1, %edx
        adcl    $0, %edx
        movl    %eax, 4(%ecx)
        bzhi    %eax, %edx, %edx
        movl    %ecx, %eax
        movl    %edx, (%ecx)
        ret     $4

Here is the assembly code that is generated when the above code is compiled
with gcc 12.2.0 with the -O2 -march=skylake-avx512 options:

_Z25ComputeHiMaskAndHiZeroAmtj:
        movl    $32, %eax
        subl    %eax, %edi
        movl    $-1, %eax
        adcl    $0, %eax
        bzhi    %edi, %eax, %eax
        salq    $32, %rdi
        movl    %eax, %eax
        orq     %rdi, %rax
        ret

Here is a more optimal version of the above code (for 64-bit x86):

_Z25ComputeHiMaskAndHiZeroAmtj:
        subl    $32, %edi
        movl    $-1, %eax
        adcl    $0, %eax
        bzhi    %edi, %eax, %eax
        salq    $32, %rdi
        movl    %eax, %eax
        orq     %rdi, %rax
        ret
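For anyone checking that the shorter sequences above compute the same values, here is a rough portable sketch of what the test case computes, written without the x86 intrinsics. The helper names (SubBorrowU32, AddCarryU32, Bzhi32, ComputeHiMaskAndHiZeroAmtModel, CheckExamples) are made up for illustration and are not part of the report or of any library. The bzhi model assumes the documented behavior: only the low 8 bits of the index are used, and the source is returned unchanged when that index is 32 or more.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model of _subborrow_u32: *out = a - b - borrow_in,
// return value is the borrow-out (1 if the subtraction wrapped).
static inline unsigned char SubBorrowU32(unsigned char borrow_in, uint32_t a,
                                         uint32_t b, uint32_t* out) {
  const uint64_t diff = uint64_t{a} - uint64_t{b} - (borrow_in ? 1u : 0u);
  *out = static_cast<uint32_t>(diff);
  return static_cast<unsigned char>((diff >> 32) & 1u);
}

// Hypothetical model of _addcarry_u32: *out = a + b + carry_in,
// return value is the carry-out.
static inline unsigned char AddCarryU32(unsigned char carry_in, uint32_t a,
                                        uint32_t b, uint32_t* out) {
  const uint64_t sum = uint64_t{a} + uint64_t{b} + (carry_in ? 1u : 0u);
  *out = static_cast<uint32_t>(sum);
  return static_cast<unsigned char>(sum >> 32);
}

// Hypothetical model of _bzhi_u32 (assumed semantics, see the note above):
// clear all bits of src at positions >= (index & 0xFF).
static inline uint32_t Bzhi32(uint32_t src, uint32_t index) {
  const uint32_t n = index & 0xFFu;
  return (n >= 32u) ? src : (src & ((uint32_t{1} << n) - 1u));
}

// Same data flow as ComputeHiMaskAndHiZeroAmt in the report, but using the
// portable models instead of the intrinsics.
std::pair<uint32_t, uint32_t> ComputeHiMaskAndHiZeroAmtModel(uint32_t len) {
  uint32_t hiMask;
  uint32_t hiZeroAmt;
  // borrow is 1 exactly when len < 32; hiZeroAmt = len - 32 (mod 2^32).
  const unsigned char borrow = SubBorrowU32(0, len, 32, &hiZeroAmt);
  // 0xFFFFFFFF + borrow: all ones when len >= 32, zero when len < 32.
  AddCarryU32(borrow, uint32_t{0xFFFFFFFFu}, 0, &hiMask);
  hiMask = Bzhi32(hiMask, hiZeroAmt);
  return std::make_pair(hiMask, hiZeroAmt);
}

// A couple of spot checks on the model.
void CheckExamples() {
  assert(ComputeHiMaskAndHiZeroAmtModel(40).first == 0xFFu);
  assert(ComputeHiMaskAndHiZeroAmtModel(40).second == 8u);
}
```

The model makes the dependence on the carry flag explicit: the mask is derived only from the borrow of len - 32, so a subtract with an immediate operand leaves the following adcl with the same input as the register form gcc currently emits.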