https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82618

            Bug ID: 82618
           Summary: Inefficient double-word subtraction on x86_64
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

If we only need the upper word from TImode subtraction, we could emit
        cmpq    %rdx, %rdi
        sbbq    %rcx, %rsi
        movq    %rsi, %rax
but we actually emit:
        movq    %rdi, %r9
        movq    %rsi, %r10
        subq    %rdx, %r9
        sbbq    %rcx, %r10
        movq    %r10, %rax
on the following testcase:

#ifdef __SIZEOF_INT128__
typedef unsigned __int128 U;
typedef unsigned long long H;
#else
typedef unsigned long long U;
typedef unsigned int H;
#endif

H
f0 (U x, U y)
{
  return (x - y) >> (__CHAR_BIT__ * sizeof (H));
}
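
As a sanity check on why the shorter sequence is sufficient, here is a minimal C sketch (not part of the original testcase). It assumes the same typedefs as above; HBITS, high_ref, high_split and check_high are hypothetical names introduced only for illustration. The low-word compare contributes nothing but the borrow, which is what cmpq leaves in CF, and the high-word subtraction consumes it, which is what sbbq does.

```c
#include <assert.h>

#ifdef __SIZEOF_INT128__
typedef unsigned __int128 U;
typedef unsigned long long H;
#else
typedef unsigned long long U;
typedef unsigned int H;
#endif

/* Width of one word in bits (hypothetical helper macro).  */
#define HBITS (__CHAR_BIT__ * sizeof (H))

/* Reference: full double-word subtraction, keep the upper word (same as f0).  */
H
high_ref (U x, U y)
{
  return (H) ((x - y) >> HBITS);
}

/* Hand-split version: the low words are only compared to obtain the
   borrow; the low word of the difference is never computed.  */
H
high_split (U x, U y)
{
  H xlo = (H) x, xhi = (H) (x >> HBITS);
  H ylo = (H) y, yhi = (H) (y >> HBITS);
  H borrow = xlo < ylo;        /* models CF after cmpq ylo, xlo */
  return xhi - yhi - borrow;   /* models sbbq yhi, xhi */
}

/* Aborts if the two versions disagree on the given inputs.  */
void
check_high (U x, U y)
{
  assert (high_ref (x, y) == high_split (x, y));
}
```

Calling check_high on a few boundary values (for example x = 0, y = 1, or x with only the lowest bit of the upper word set) exercises the borrow path; the two functions should agree for all inputs, since unsigned arithmetic wraps modulo the word size in both.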

Unfortunately, for this testcase we keep the TImode subtraction until after RA
and only split it during split2.  I am not sure whether anything can still be
done at that point (e.g. whether there is a way to find out that one subword is
dead and split differently in that case).  Alternatively, a peephole2 could fix
up the effects: notice a subq with a dead destination, turn it into the
corresponding compare, and if there is a move from some register to a register
that is dead after this insn, eliminate the move too.
