https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94174
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #5)
> (In reply to Richard Henderson from comment #2)
> > Case 3:
> >
> > void test3(__int128 a, unsigned long l)
> > {
> > if ((__int128_t)a - l <= 1)
> > doit();
> > }
> >
> > Note that clang attempts a branchless double-word comparison
> >
> > subs x8, x0, x2
> > sbcs x9, x1, xzr
> > cmp x8, #1
> > cset w8, hi
> > cmp x9, #0
> > cset w9, gt
> > csel w8, w8, w9, eq
> > tbnz w8, #0, .LBB0_2
>
> LLVM now produces:
> subs x8, x0, x2
> mov w9, #1
> sbc x10, x1, xzr
> cmp x9, x8
> ngcs xzr, x10
> b.lt .LBB0_2
> b doit
With the patch for PR 116509, GCC now produces:
.cfi_startproc
subs x0, x0, x2
mov x2, 1
sbc x1, x1, xzr
cmp x2, x0
mov x3, 0
sbcs x3, x3, x1
bge .L4
ret
Which is almost there. There is one extra move (mov x3, 0), and sbcs writes its
result to x3 rather than discarding it into xzr, but it is close enough.
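
For anyone reading the sequence above, here is a rough C model of what the subs/sbc/cmp/sbcs/bge sequence computes. It is only a sketch: the names ref_le_one and model_le_one are made up for illustration, and it assumes GCC/Clang __int128 support, arithmetic right shift of negative values, and an a - l that does not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: the condition from test3, evaluated directly. */
static bool ref_le_one(__int128 a, uint64_t l)
{
    return a - l <= 1;
}

/* Hypothetical model of the double-word sequence, one step per instruction. */
static bool model_le_one(__int128 a, uint64_t l)
{
    uint64_t a_lo = (uint64_t)a;
    uint64_t a_hi = (uint64_t)(a >> 64);

    /* subs x0, x0, x2 ; sbc x1, x1, xzr : d = a - l as two 64-bit words. */
    uint64_t d_lo = a_lo - l;
    uint64_t d_hi = a_hi - (a_lo < l);

    /* cmp x2, x0 with x2 = 1 : low-word 1 - d_lo borrows when d_lo > 1. */
    int borrow = d_lo > 1;

    /* sbcs x3, x3, x1 with x3 = 0 : high-word 0 - d_hi - borrow.
       Computed in a wider type so the exact signed result is visible. */
    __int128 high = (__int128)0 - (int64_t)d_hi - borrow;

    /* bge tests N == V, which holds exactly when that result is >= 0,
       i.e. when 1 >= d as a signed 128-bit comparison. */
    return high >= 0;
}
```

The only value of the high-word subtract that matters is its flags, which is why LLVM can target xzr there.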