https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82618
Bug ID: 82618
Summary: Inefficient double-word subtraction on x86_64
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---
If we only need the upper word of a TImode subtraction, we could emit
cmpq %rdx, %rdi
sbbq %rcx, %rsi
movq %rsi, %rax
but we actually emit:
movq %rdi, %r9
movq %rsi, %r10
subq %rdx, %r9
sbbq %rcx, %r10
movq %r10, %rax
on the following testcase:
#ifdef __SIZEOF_INT128__
typedef unsigned __int128 U;
typedef unsigned long long H;
#else
typedef unsigned long long U;
typedef unsigned int H;
#endif
H
f0 (U x, U y)
{
  return (x - y) >> (__CHAR_BIT__ * sizeof (H));
}
Unfortunately, for this testcase we keep the TImode subtraction until after
register allocation (RA) and only split it during split2. I'm not sure whether
anything can still be done at that point, e.g. whether there is a way to detect
that one subword is dead and split differently in that case. Alternatively, a
peephole2 could fix up the result: notice a subq with a dead destination, turn
it into the corresponding compare, and, if there is a move from some register
into a register that is dead after this insn, eliminate the move too.
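
For reference, here is a minimal C sketch of why the cmpq/sbbq sequence is
enough when only the upper word is live. The helper name high_of_sub and its
split-into-halves signature are hypothetical, chosen only for illustration;
they are not part of the testcase or of GCC. The low words contribute nothing
to the result except the borrow, which is exactly what cmpq leaves in the
carry flag:

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): upper 64 bits of the 128-bit
   difference x - y, given the low and high 64-bit halves of x and y.  */
uint64_t
high_of_sub (uint64_t xlo, uint64_t xhi, uint64_t ylo, uint64_t yhi)
{
  /* Borrow out of the low-word subtraction.  This models the carry flag
     that cmpq sets, without computing the low word of the difference.  */
  uint64_t borrow = xlo < ylo;

  /* Models sbbq: high-word subtraction minus the incoming borrow,
     with the usual unsigned wrap-around.  */
  return xhi - yhi - borrow;
}
```

For example, with x = 2^64 (xlo = 0, xhi = 1) and y = 1 the low words borrow,
so the upper word of the difference is 1 - 0 - 1 = 0, matching the upper word
of 2^64 - 1.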