https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82618
Bug ID: 82618
Summary: Inefficient double-word subtraction on x86_64
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---

If we only need the upper word of a TImode subtraction, we could emit

        cmpq    %rdx, %rdi
        sbbq    %rcx, %rsi
        movq    %rsi, %rax

but we actually emit

        movq    %rdi, %r9
        movq    %rsi, %r10
        subq    %rdx, %r9
        sbbq    %rcx, %r10
        movq    %r10, %rax

for the following testcase:

#ifdef __SIZEOF_INT128__
typedef unsigned __int128 U;
typedef unsigned long long H;
#else
typedef unsigned long long U;
typedef unsigned int H;
#endif

H
f0 (U x, U y)
{
  return (x - y) >> (__CHAR_BIT__ * sizeof (H));
}

Unfortunately, for this testcase we keep the TImode subtraction until after RA and only split it during split2. I'm not sure whether anything can still be done at that point (i.e. whether there is a way to find out that one subword is dead and split differently in that case). Or could a peephole2 fix up the effects: notice a subq with a dead destination, turn it into the corresponding compare, and, if there is a move from some register into a register that is dead after this insn, eliminate that move too?
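To illustrate why the low-word subq can become a cmpq, here is a minimal C sketch of the arithmetic the proposed cmpq/sbbq sequence performs: the high word of x - y depends on the low words only through the borrow (x_lo < y_lo), so the low difference itself never needs to be stored. It assumes 64-bit words and uses GCC/Clang's unsigned __int128 only for a cross-check; the names sub_hi and check_against_int128 are hypothetical illustrations, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64-bit word of the 128-bit difference x - y, where
   x = x_hi:x_lo and y = y_hi:y_lo.  The low word of the difference is
   never formed; only the borrow out of the low half is needed.  This
   models the value "cmpq %rdx, %rdi; sbbq %rcx, %rsi" leaves in %rsi. */
static uint64_t
sub_hi (uint64_t x_lo, uint64_t x_hi, uint64_t y_lo, uint64_t y_hi)
{
  uint64_t borrow = x_lo < y_lo;  /* CF as set by cmpq %rdx, %rdi */
  return x_hi - y_hi - borrow;    /* sbbq: wraps modulo 2^64 */
}

#ifdef __SIZEOF_INT128__
/* Hypothetical cross-check against the f0 testcase's semantics
   (requires a compiler providing unsigned __int128). */
static void
check_against_int128 (uint64_t x_lo, uint64_t x_hi,
                      uint64_t y_lo, uint64_t y_hi)
{
  unsigned __int128 x = ((unsigned __int128) x_hi << 64) | x_lo;
  unsigned __int128 y = ((unsigned __int128) y_hi << 64) | y_lo;
  uint64_t expected = (uint64_t) ((x - y) >> 64);
  assert (sub_hi (x_lo, x_hi, y_lo, y_hi) == expected);
}
#endif
```

For example, sub_hi (0, 5, 1, 2) returns 2: the low words borrow, so the high word is 5 - 2 - 1. Dropping the low result is exactly what turns the subq into a cmpq and makes the copies into %r9 and %r10 unnecessary.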