https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65368
Bug ID: 65368
Summary: _bzhi_u32 intrinsic generates incorrect code when -O1
or above is specified and index is an immediate
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamrial at gmail dot com
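At -O1 and above, when the index is an immediate, the bzhi is replaced by a
single AND instruction whose mask is wrong: it zeroes <index> bits starting
from the highest bit, instead of zeroing every bit from position <index>
upward. For index 11 the mask should be 0x7ff, but the generated mask is
0x1fffff.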
Sample program:
------
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <x86intrin.h>

int main(int argc, char **argv)
{
    /* Immediate index 11: only the low 11 bits should be kept. */
    unsigned int j = _bzhi_u32(atoi(argv[1]), 11);
    printf("%x\n", j);
    return 0;
}
------
Compiled with "gcc -O1 -mbmi2 -std=c99 -o bzhi bzhi.c":
[jamrial@ArchVM~]$ ./bzhi 4294967295
1fffff
Disassembly is as follows:
0000000000000000 <main>:
0: 48 83 ec 08 sub rsp,0x8
4: 48 8b 7e 08 mov rdi,QWORD PTR [rsi+0x8]
8: ba 0a 00 00 00 mov edx,0xa
d: be 00 00 00 00 mov esi,0x0
12: e8 00 00 00 00 call 17 <main+0x17>
17: 89 c6 mov esi,eax
19: 81 e6 ff ff 1f 00 and esi,0x1fffff
1f: bf 00 00 00 00 mov edi,0x0
24: b8 00 00 00 00 mov eax,0x0
29: e8 00 00 00 00 call 2e <main+0x2e>
2e: b8 00 00 00 00 mov eax,0x0
33: 48 83 c4 08 add rsp,0x8
37: c3 ret
Compiled with "gcc -mbmi2 -std=c99 -o bzhi bzhi.c":
[jamrial@ArchVM~]$ ./bzhi 4294967295
7ff
Disassembly is as follows:
0000000000000000 <main>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 48 83 ec 20 sub rsp,0x20
8: 89 7d ec mov DWORD PTR [rbp-0x14],edi
b: 48 89 75 e0 mov QWORD PTR [rbp-0x20],rsi
f: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20]
13: 48 83 c0 08 add rax,0x8
17: 48 8b 00 mov rax,QWORD PTR [rax]
1a: 48 89 c7 mov rdi,rax
1d: e8 00 00 00 00 call 22 <main+0x22>
22: 89 45 f8 mov DWORD PTR [rbp-0x8],eax
25: c7 45 f4 0b 00 00 00 mov DWORD PTR [rbp-0xc],0xb
2c: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc]
2f: c4 e2 78 f5 45 f8 bzhi eax,DWORD PTR [rbp-0x8],eax
35: 89 45 fc mov DWORD PTR [rbp-0x4],eax
38: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
3b: 89 c6 mov esi,eax
3d: bf 00 00 00 00 mov edi,0x0
42: b8 00 00 00 00 mov eax,0x0
47: e8 00 00 00 00 call 4c <main+0x4c>
4c: b8 00 00 00 00 mov eax,0x0
51: c9 leave
52: c3 ret
When the index is not an immediate, the actual bzhi instruction is always used
(with or without -Ox flags), and the result is the expected one.