https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
Yuri Rumyantsev <ysrumyan at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ysrumyan at gmail dot com
--- Comment #6 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
I don't see any issues with 'false dependency' on HSW. I've got sep data on it:
For the unsigned variant (with LEA instructions):
0x400b30 52 161 lea 0x1(%rdx),%ecx
0x400b33 53 0 popcnt (%rbx,%rax,8),%rax
0x400b39 54 353 lea 0x2(%rdx),%r8d
0x400b3d 55 0 popcnt (%rbx,%rcx,8),%rcx
0x400b43 56 170 add %rax,%rcx
0x400b46 57 25 lea 0x3(%rdx),%esi
0x400b49 58 332 popcnt (%rbx,%r8,8),%rax
0x400b4f 59 196 add %rax,%rcx
0x400b52 60 199 popcnt (%rbx,%rsi,8),%rax
0x400b58 61 235 add %rax,%rcx
0x400b5b 62 414 lea 0x4(%rdx),%eax
0x400b5e 63 0 add %rcx,%r14
0x400b61 64 312 mov %rax,%rdx
0x400b64 65 0 cmp %rax,%r12
0x400b67 66 0 ja 400b30 <main+0xb0>
and we don't see any performance anomaly with popcnt.
But for the 2nd loop we have:
0x400c50 118 0 popcnt -0x8(%rdx),%rax
0x400c56 119 0 popcnt (%rdx),%rcx
0x400c5b 120 1086 add %rax,%rcx
0x400c5e 121 492 popcnt 0x8(%rdx),%rax
0x400c64 122 3 add %rcx,%rax
0x400c67 123 507 add $0x20,%rdx
0x400c6b 124 0 popcnt -0x10(%rdx),%rcx
0x400c71 125 955 add %rax,%rcx
0x400c74 126 479 add %rcx,%r13
0x400c77 127 489 cmp %rsi,%rdx
0x400c7a 128 0 jne 400c50 <main+0x1d0>
So far I can't imagine what the problem is.
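
For reference, here is a rough C sketch of two loop shapes that could plausibly compile to listings like the ones above. It is an assumption for illustration only, not the test case attached to this bug: the function names are hypothetical, `n` is assumed to be a multiple of 4, and `__builtin_popcountll` only becomes a `popcnt` instruction when GCC is given something like `-mpopcnt` or `-march=native`. The first function indexes with a 32-bit `unsigned`, so each index has to be zero-extended to 64 bits before it is used in an address (writing a 32-bit register such as `%ecx` does that, which would explain the LEA instructions). The second uses a 64-bit index, which lets the compiler walk a pointer instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "unsigned variant": 32-bit index, manually unrolled by 4.
   Assumes n is a multiple of 4; otherwise the last iteration reads past
   the end of buf. */
uint64_t count_unsigned_index(const uint64_t *buf, unsigned n)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < n; i += 4) {
        total += (uint64_t)__builtin_popcountll(buf[i]);
        total += (uint64_t)__builtin_popcountll(buf[i + 1]);
        total += (uint64_t)__builtin_popcountll(buf[i + 2]);
        total += (uint64_t)__builtin_popcountll(buf[i + 3]);
    }
    return total;
}

/* Hypothetical 64-bit variant: size_t index, same unrolling.
   Same assumption that n is a multiple of 4. */
uint64_t count_size_t_index(const uint64_t *buf, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i += 4) {
        total += (uint64_t)__builtin_popcountll(buf[i]);
        total += (uint64_t)__builtin_popcountll(buf[i + 1]);
        total += (uint64_t)__builtin_popcountll(buf[i + 2]);
        total += (uint64_t)__builtin_popcountll(buf[i + 3]);
    }
    return total;
}
```

Both functions return the same sum for the same buffer; only the index type differs, so any timing difference between them comes from the generated code rather than from the work done.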