[Bug target/108614] New: _subborrow_u32 generates suboptimal code when second subtraction operand is constant on x86 targets

john_platts at hotmail dot com via Gcc-bugs Tue, 31 Jan 2023 04:39:48 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108614


            Bug ID: 108614
           Summary: _subborrow_u32 generates suboptimal code when second
                    subtraction operand is constant on x86 targets
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: john_platts at hotmail dot com
  Target Milestone: ---

The following C++ code compiles to suboptimal code with gcc 12.2.0 when built
with the -O2 -march=skylake-avx512 -m32 options:
#include <stdint.h>
#include <utility>
#include <x86intrin.h>
#include <immintrin.h>

std::pair<uint32_t, uint32_t> ComputeHiMaskAndHiZeroAmt(uint32_t len) {
    uint32_t hiMask;
    uint32_t hiZeroAmt;

    _addcarry_u32(_subborrow_u32(0, len, 32, &hiZeroAmt),
        uint32_t{0xFFFFFFFFu}, 0, &hiMask);

    hiMask = _bzhi_u32(hiMask, hiZeroAmt);

    return std::make_pair(hiMask, hiZeroAmt);
}
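
For readers who want to check the intended result without BMI2 hardware, here is a minimal portable sketch of what the function above computes. It is not part of the original report. The names Bzhi32Model, ComputeHiMaskAndHiZeroAmtRef and CheckRefExamples are hypothetical, and the BZHI model assumes the documented behaviour that only the low 8 bits of the index are used and that an index of 32 or more leaves the source unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Portable model of 32-bit BZHI (assumption: low 8 bits of the index are
// used; an index >= 32 returns the source unchanged).
inline uint32_t Bzhi32Model(uint32_t src, uint32_t index) {
    const uint32_t n = index & 0xFFu;
    return n >= 32u ? src : (src & ((uint32_t{1} << n) - 1u));
}

// Hypothetical reference version of ComputeHiMaskAndHiZeroAmt, written
// without intrinsics so each step can be compared against the assembly.
inline std::pair<uint32_t, uint32_t> ComputeHiMaskAndHiZeroAmtRef(uint32_t len) {
    // _subborrow_u32(0, len, 32, &hiZeroAmt): len - 32, wrapping when len < 32.
    const uint32_t hiZeroAmt = len - 32u;
    // Borrow out of that subtraction (the carry flag after SUB).
    const uint32_t borrow = len < 32u ? 1u : 0u;
    // _addcarry_u32(borrow, 0xFFFFFFFF, 0, &hiMask): wraps to 0 when borrow is 1.
    const uint32_t hiMask = 0xFFFFFFFFu + borrow;
    return std::make_pair(Bzhi32Model(hiMask, hiZeroAmt), hiZeroAmt);
}

// A few spot checks of the reference version.
inline void CheckRefExamples() {
    assert(ComputeHiMaskAndHiZeroAmtRef(40u).first == 0xFFu);  // low 8 bits kept
    assert(ComputeHiMaskAndHiZeroAmtRef(40u).second == 8u);
    assert(ComputeHiMaskAndHiZeroAmtRef(5u).first == 0u);      // borrow clears the mask
}
```

The borrow from the subtraction is the only input to the add-with-carry, which is why the compiler must keep the SUB and ADC adjacent in flag terms; subtracting an immediate instead of a register does not change the flags produced.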

gcc 12.2.0 generates the following assembly for the above code with the
-O2 -march=skylake-avx512 -m32 options:
_Z25ComputeHiMaskAndHiZeroAmtj:
        subl    $16, %esp
        movl    24(%esp), %eax
        movl    $32, %edx
        subl    %edx, %eax
        movl    20(%esp), %ecx
        movl    $-1, %edx
        adcl    $0, %edx
        movl    %eax, 4(%ecx)
        bzhi    %eax, %edx, %edx
        movl    %ecx, %eax
        movl    %edx, (%ecx)
        addl    $16, %esp
        ret     $4

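Here is a more optimal version of the above code (for 32-bit x86). It
subtracts the immediate 32 directly instead of first loading it into %edx,
and it drops the unneeded stack adjustment: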
_Z25ComputeHiMaskAndHiZeroAmtj:
        movl    8(%esp), %eax
        subl    $32, %eax
        movl    4(%esp), %ecx
        movl    $-1, %edx
        adcl    $0, %edx
        movl    %eax, 4(%ecx)
        bzhi    %eax, %edx, %edx
        movl    %ecx, %eax
        movl    %edx, (%ecx)
        ret     $4

gcc 12.2.0 generates the following assembly for the same code with the
-O2 -march=skylake-avx512 options (64-bit x86):
_Z25ComputeHiMaskAndHiZeroAmtj:
        movl    $32, %eax
        subl    %eax, %edi
        movl    $-1, %eax
        adcl    $0, %eax
        bzhi    %edi, %eax, %eax
        salq    $32, %rdi
        movl    %eax, %eax
        orq     %rdi, %rax
        ret

Here is a more optimal version of the above code (for 64-bit x86). It
subtracts the immediate 32 directly instead of first loading it into %eax:
_Z25ComputeHiMaskAndHiZeroAmtj:
        subl    $32, %edi
        movl    $-1, %eax
        adcl    $0, %eax
        bzhi    %edi, %eax, %eax
        salq    $32, %rdi
        movl    %eax, %eax
        orq     %rdi, %rax
        ret
