------- Comment #5 from ubizjak at gmail dot com  2010-04-30 19:00 -------
(In reply to comment #4)
> Argh, the sar trick doesn't work when the number is negative and even. Sorry
> about the extra noise.
>
> This leaves as the best code:
>         mov    %rsi,%rdx
>         shr    $0x3f,%rdx
>         lea    (%rdi,%rdx,1),%rax
>         and    $0x1,%eax
>         sub    %rdx,%rax
>         sbb    %rdx,%rdx
>
> This is still better than current version. Of course, changing the and
> instruction will allow faster versions of x%4, x%8, x%16 etc.
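To check what the quoted six-instruction sequence actually computes, here is a minimal C sketch that replays it step by step on a signed 128-bit value. It assumes a 64-bit GCC/Clang target that provides `__int128`, and that the operand arrives split with its low half in %rdi and its high half in %rsi (as in the SysV x86-64 ABI), with the result returned in %rax:%rdx. The names `mod2_branchless` and `check_mod2` are hypothetical, for illustration only. The result is compared against the compiler's own `x % 2`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replays the quoted asm for signed x % 2 on __int128.
   Assumes GCC/Clang with __int128 and GCC's modulo unsigned->signed conversion. */
__int128 mod2_branchless(__int128 x)
{
    uint64_t lo = (uint64_t)x;                            /* %rdi: low 64 bits  */
    uint64_t hi = (uint64_t)((unsigned __int128)x >> 64); /* %rsi: high 64 bits */

    uint64_t s = hi >> 63;       /* mov + shr $0x3f: sign bit, 0 or 1 */
    uint64_t t = (lo + s) & 1;   /* lea + and $0x1: biased low bit    */
    uint64_t r = t - s;          /* sub %rdx,%rax: low result word    */

    /* sbb %rdx,%rdx: all ones iff the sub borrowed, i.e. t < s */
    uint64_t h = (t < s) ? UINT64_MAX : 0;

    return (__int128)(((unsigned __int128)h << 64) | r);
}

/* Hypothetical self-check against the compiler's own signed remainder. */
void check_mod2(__int128 x)
{
    assert(mod2_branchless(x) == x % 2);
}
```

The bias `s` is 1 only for negative inputs, so odd negatives yield -1 and even negatives yield 0, matching C's truncating `%`; the final `sbb` just sign-extends that low word into the high half.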
Believe it or not, the version that you show in the description is how gcc handles subregs: it starts out OK, but then the register allocator comes into play...

Confirmed as an RA (register allocator) problem; the same thing happens with "long long" and -m32.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ubizjak at gmail dot com
             Status|UNCONFIRMED                 |NEW
          Component|target                      |middle-end
     Ever Confirmed|0                           |1
           Keywords|                            |ra
Last reconfirmed date|0000-00-00 00:00:00         |2010-04-30 19:00:41


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883