On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle <[email protected]> wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant. Currently the function
>
> int foo (int x) {
>   return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
>         cltd
>         idivl   %edi
>         ret
>
> where the sign-extension instruction "cltd" creates a long
> dependency chain, as it depends on the "mov" before it, and
> is depended upon by "idivl" after it.
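A rough C model of the two instructions may help here. It assumes only the documented behaviour: cltd (the AT&T spelling of cdq) sets every bit of %edx to the sign bit of %eax, and idivl divides the 64-bit value %edx:%eax. The names model_cltd and model_edx_eax are hypothetical helpers used purely for illustration.

```c
#include <stdint.h>

/* Hypothetical model of cltd (cdq): %edx becomes -1 if %eax is
   negative and 0 otherwise.  Avoids >> on negative values, which is
   implementation-defined in C.  */
int32_t model_cltd(int32_t eax)
{
    return eax < 0 ? -1 : 0;
}

/* Hypothetical helper: rebuild the 64-bit dividend %edx:%eax that
   idivl reads.  In the real code idivl needs both registers, so it
   waits on cltd, which in turn waits on the mov that loaded %eax.  */
int64_t model_edx_eax(int32_t edx, int32_t eax)
{
    /* edx * 2^32 + (unsigned)eax; cannot overflow int64_t.  */
    return (int64_t)edx * 4294967296LL + (int64_t)(uint32_t)eax;
}
```

For example, model_edx_eax(model_cltd(-100), -100) rebuilds -100, i.e. cltd is simply the sign extension of %eax into %edx:%eax.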
>
> With this patch, GCC now matches both icc and LLVM and
> uses an xor instead, generating:
> foo:    xorl    %edx, %edx
>         movl    $100, %eax
>         idivl   %edi
>         ret
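A sketch of why the xor is safe in this case, under the assumption that the 64-bit C arithmetic below faithfully emulates idivl for in-range quotients (foo_xor_form is a hypothetical name): because the numerator 100 is a non-negative constant, the high half of the dividend is known to be zero at compile time, so clearing %edx yields exactly the dividend that cltd would have produced, without depending on the mov.

```c
#include <stdint.h>

/* Hypothetical C emulation of the patched sequence
     xorl %edx, %edx ; movl $100, %eax ; idivl %edi  */
int32_t foo_xor_form(int32_t x)
{
    int32_t edx = 0;    /* xorl %edx, %edx: sign extension of 100 is 0 */
    int32_t eax = 100;  /* movl $100, %eax */
    /* idivl %edi: signed divide of %edx:%eax by x (x != 0, as in C).
       |100 / x| <= 100, so the quotient always fits in 32 bits.  */
    int64_t dividend = (int64_t)edx * 4294967296LL + (int64_t)(uint32_t)eax;
    return (int32_t)(dividend / x);
}
```

For instance, foo_xor_form(-7) returns -14, matching 100/-7 in C, since both idivl and C division truncate toward zero.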
You made me look up idiv, and I figured we're not optimally handling

int foo (long x, int y)
{
  return x / y;
}

by using a 32:32 / 32 bit divide.  combine manages to
see enough to eventually do this though.
> Microbenchmarking confirms that this is faster on Intel
> processors (Kaby Lake), and no worse on AMD processors (Zen2),
> which agrees with intuition, but oddly disagrees with the
> llvm-mca cycle count prediction on godbolt.org.
>
> The tricky bit is that this sign-extension instruction is only
> produced by late (postreload) splitting, and unfortunately none
> of the subsequent passes (e.g. cprop_hardreg) is able to
> propagate and simplify its constant argument. The solution
> here is to introduce a define_insn_and_split that allows the
> constant numerator operand to be captured (by combine) and
> then split into an optimal form after reload.
>
> The same microbenchmarking also shows that, for a negative
> constant numerator, replacing cltd with "movl $-1, %edx"
> improves performance; icc does this, but LLVM does not.
> Both the xor and movl forms are larger than cltd, so this
> transformation is disabled at -Os.
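The choice between the three forms could be sketched as follows. This is a hypothetical C illustration of the selection logic only, not the actual i386.md pattern; edx_setup_for_constant and its optimize_size flag are invented for the example. The byte counts in the comment are the standard x86 encodings of each instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: pick how to set %edx before idivl when the
   numerator n is a compile-time constant.  Its sign extension is then
   known: 0 for n >= 0, -1 for n < 0.  Encoding sizes:
     cltd              99              1 byte
     xorl %edx, %edx   31 d2           2 bytes
     movl $-1, %edx    ba ff ff ff ff  5 bytes  */
const char *edx_setup_for_constant(int32_t n, bool optimize_size)
{
    if (optimize_size)
        return "cltd";             /* smallest encoding; kept at -Os */
    if (n >= 0)
        return "xorl %edx, %edx";  /* high half is known to be 0 */
    return "movl $-1, %edx";       /* high half is known to be -1 */
}
```

Either non-cltd form breaks the dependency on the mov that loads %eax, which is the point of the change; at -Os the one-byte cltd wins.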
>
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-08 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
> * config/i386/i386.md (*divmodsi4_const): Optimize SImode
> divmod of a constant numerator with new define_insn_and_split.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/divmod-9.c: New test case.
>
>
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
>