https://sourceware.org/bugzilla/show_bug.cgi?id=22871
--- Comment #20 from Linus Torvalds <torva...@linux-foundation.org> ---

I thought I could make the numbers more stable by using serializing
instructions (cpuid with %eax=0) around the rdtsc, but that just caused some
odd bi-modal behavior where testb/testl and testw/testq "pair up":

  Round 0
    testb : 150868957
    testw : 117736338
    testl : 147902663
    testq : 117523153
  Round 1
    testb : 146681110
    testw : 118466921
    testl : 147758755
    testq : 118308050
  Round 2
    testb : 147607803
    testw : 118383229
    testl : 147303788
    testq : 118873304
  Round 3
    testb : 147266141
    testw : 121145806
    testl : 151399470
    testq : 116112309

Funky.

But doing profiling, I notice that most of the cost is in the main() function,
not the test functions, so I suspect it ends up being about just cacheline
alignment of the test loops: in the fast cases, the loop start is 16-byte
aligned, in the slow case it's 8-byte aligned.

If I align everything on cacheline boundaries, things stabilize a lot:

  Round 0
    testb : 146282055
    testw : 145670901
    testl : 147173973
    testq : 146631984
  Round 1
    testb : 149647175
    testw : 145634421
    testl : 145738496
    testq : 150404114
  Round 2
    testb : 147685735
    testw : 146392328
    testl : 144992998
    testq : 146145547
  Round 3
    testb : 145870460
    testw : 146986702
    testl : 145906570
    testq : 146429161
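
To put a number on "pair up" versus "stabilize", here is a small sketch that takes the cycle counts quoted above, averages each variant over the four rounds, and compares the slowest mean against the fastest. The names `serialized`, `aligned` and `relative_spread` are made up for this illustration and are not part of the original benchmark; only the numbers come from the comment.

```python
from statistics import mean

# Cycle counts copied from the comment above, rounds 0-3 per variant.
# First run: cpuid-serialized rdtsc, loops not cacheline-aligned.
serialized = {
    "testb": [150868957, 146681110, 147607803, 147266141],
    "testw": [117736338, 118466921, 118383229, 121145806],
    "testl": [147902663, 147758755, 147303788, 151399470],
    "testq": [117523153, 118308050, 118873304, 116112309],
}

# Second run: everything aligned on cacheline boundaries.
aligned = {
    "testb": [146282055, 149647175, 147685735, 145870460],
    "testw": [145670901, 145634421, 146392328, 146986702],
    "testl": [147173973, 145738496, 144992998, 145906570],
    "testq": [146631984, 150404114, 146145547, 146429161],
}


def relative_spread(results):
    """Return (slowest per-variant mean / fastest per-variant mean) - 1."""
    means = [mean(rounds) for rounds in results.values()]
    return max(means) / min(means) - 1.0


spread_serialized = relative_spread(serialized)
spread_aligned = relative_spread(aligned)

print(f"serialized, unaligned: {spread_serialized:.1%}")
print(f"cacheline-aligned:     {spread_aligned:.1%}")
```

With these figures the gap between the slowest and fastest variant is roughly 26% in the first run and roughly 1% once the loops are cacheline-aligned, which is consistent with the difference coming from loop alignment rather than from the operand size of the test instruction.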