------- Comment #3 from svfuerst at gmail dot com 2010-04-30 16:12 -------
Oops, you are right: with that code, the 128-bit version needs an extra sbb at the end (for some reason I was misreading the shr as a sar):

    mov    %rsi,%rdx
    shr    $0x3f,%rdx
    lea    (%rdi,%rdx,1),%rax
    and    $0x1,%eax
    sub    %rdx,%rax
    sbb    %rdx,%rdx

However, if you use sar + add, instead of shr + sub + sbb, it is one instruction less:

    mov    %rsi,%rdx
    sar    $0x3f,%rdx
    lea    (%rdi,%rdx,1),%rax
    and    $0x1,%eax
    add    %rdx,%rax

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
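
For anyone who wants to check the two sequences above without reading flags by hand, here is a rough C model of each, compared against a plain `x % 2` on `__int128`. This is only a sketch under some assumptions: the input is taken as low word in %rdi and high word in %rsi, the result as %rdx:%rax, and the names `pair128`, `mod2_shr_sub_sbb`, `mod2_sar_add`, `reference_mod2` and `self_check` are made up for illustration. It relies on the GCC/Clang `__int128` extension.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the result as left in %rdx:%rax. */
typedef struct { uint64_t lo, hi; } pair128;

/* Model of: mov, shr $0x3f, lea, and $0x1, sub, sbb */
static pair128 mod2_shr_sub_sbb(uint64_t lo, uint64_t hi)
{
    uint64_t rdx = hi >> 63;          /* shr: 0 or 1 (the sign bit) */
    uint64_t rax = (lo + rdx) & 1;    /* lea + and $0x1 */
    uint64_t borrow = rax < rdx;      /* carry flag produced by sub */
    rax -= rdx;                       /* sub %rdx,%rax */
    rdx = 0 - borrow;                 /* sbb %rdx,%rdx: 0 or all ones */
    return (pair128){ rax, rdx };
}

/* Model of: mov, sar $0x3f, lea, and $0x1, add */
static pair128 mod2_sar_add(uint64_t lo, uint64_t hi)
{
    uint64_t rdx = (hi >> 63) ? UINT64_MAX : 0;  /* sar: 0 or -1 */
    uint64_t rax = (lo + rdx) & 1;               /* lea + and $0x1 */
    rax += rdx;                                  /* add %rdx,%rax */
    return (pair128){ rax, rdx };                /* rdx is left untouched */
}

/* Reference: signed 128-bit remainder computed by the compiler. */
static pair128 reference_mod2(uint64_t lo, uint64_t hi)
{
    __int128 x = (__int128)(((unsigned __int128)hi << 64) | lo);
    unsigned __int128 r = (unsigned __int128)(x % 2);
    return (pair128){ (uint64_t)r, (uint64_t)(r >> 64) };
}

/* Compare the shr + sub + sbb model with the reference on a few inputs. */
static void self_check(void)
{
    const uint64_t M = UINT64_MAX;
    const uint64_t in[][2] = {            /* { lo, hi } */
        { 0, 0 }, { 1, 0 }, { 2, 0 }, { 3, 0 },
        { M, M },                         /* -1 */
        { M - 1, M },                     /* -2 */
        { M - 2, M },                     /* -3 */
        { 0, (uint64_t)1 << 63 },         /* most negative value */
    };
    for (unsigned i = 0; i < sizeof in / sizeof in[0]; i++) {
        pair128 got = mod2_shr_sub_sbb(in[i][0], in[i][1]);
        pair128 want = reference_mod2(in[i][0], in[i][1]);
        assert(got.lo == want.lo && got.hi == want.hi);
    }
}
```

In this model the shr + sub + sbb sequence agrees with the reference on all of the listed inputs. For the sar + add sequence the low word agrees as well, but for negative even inputs the model leaves the high word at the sar mask (all ones) while the low word is 0, so it may be worth checking whether that variant still needs the carry folded into %rdx before counting instructions.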