https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65368
Bug ID: 65368
Summary: _bzhi_u32 intrinsic generates incorrect code when -O1 or above is specified and index is an immediate
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jamrial at gmail dot com

The code generated is a single AND instruction whose mask counts the index from the highest bit rather than the lowest, so it clears only the top 11 bits instead of keeping only the low 11.

Sample program:
------
#include <stdio.h>
#include <x86intrin.h>

int main(int argc, char **argv)
{
    unsigned int j = _bzhi_u32(atoi(argv[1]), 11);
    printf("%x\n", j);
    return 0;
}
------

Compiled with "gcc -O1 -mbmi2 -std=c99 -o bzhi bzhi.c":

[jamrial@ArchVM~]$ ./bzhi 4294967295
1fffff

Disassembly is as follows:

0000000000000000 <main>:
   0:   48 83 ec 08             sub    rsp,0x8
   4:   48 8b 7e 08             mov    rdi,QWORD PTR [rsi+0x8]
   8:   ba 0a 00 00 00          mov    edx,0xa
   d:   be 00 00 00 00          mov    esi,0x0
  12:   e8 00 00 00 00          call   17 <main+0x17>
  17:   89 c6                   mov    esi,eax
  19:   81 e6 ff ff 1f 00       and    esi,0x1fffff
  1f:   bf 00 00 00 00          mov    edi,0x0
  24:   b8 00 00 00 00          mov    eax,0x0
  29:   e8 00 00 00 00          call   2e <main+0x2e>
  2e:   b8 00 00 00 00          mov    eax,0x0
  33:   48 83 c4 08             add    rsp,0x8
  37:   c3                      ret

Compiled with "gcc -mbmi2 -std=c99 -o bzhi bzhi.c":

[jamrial@ArchVM~]$ ./bzhi 4294967295
7ff

Disassembly is as follows:

0000000000000000 <main>:
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   4:   48 83 ec 20             sub    rsp,0x20
   8:   89 7d ec                mov    DWORD PTR [rbp-0x14],edi
   b:   48 89 75 e0             mov    QWORD PTR [rbp-0x20],rsi
   f:   48 8b 45 e0             mov    rax,QWORD PTR [rbp-0x20]
  13:   48 83 c0 08             add    rax,0x8
  17:   48 8b 00                mov    rax,QWORD PTR [rax]
  1a:   48 89 c7                mov    rdi,rax
  1d:   e8 00 00 00 00          call   22 <main+0x22>
  22:   89 45 f8                mov    DWORD PTR [rbp-0x8],eax
  25:   c7 45 f4 0b 00 00 00    mov    DWORD PTR [rbp-0xc],0xb
  2c:   8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
  2f:   c4 e2 78 f5 45 f8       bzhi   eax,DWORD PTR [rbp-0x8],eax
  35:   89 45 fc                mov    DWORD PTR [rbp-0x4],eax
  38:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  3b:   89 c6                   mov    esi,eax
  3d:   bf 00 00 00 00          mov    edi,0x0
  42:   b8 00 00 00 00          mov    eax,0x0
  47:   e8 00 00 00 00          call   4c <main+0x4c>
  4c:   b8 00 00 00 00          mov    eax,0x0
  51:   c9                      leave
  52:   c3                      ret

When the index is not an immediate, the actual bzhi instruction is always used (with or without -Ox flags), and the result is the expected one.
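
For comparison, the two results in the report can be reproduced without BMI2 hardware. The sketch below is not taken from GCC or x86intrin.h: bzhi32_ref, observed_o1_mask and bzhi32_self_check are hypothetical names introduced here. bzhi32_ref models the documented BZHI behaviour (keep the bits below the index, using only the low 8 bits of the index, and leave the source unchanged for an index of 32 or more). observed_o1_mask reproduces only the constant seen in the -O1 listing; that 0x1fffff equals 0xffffffff >> 11 is an observation that fits the description above, not a statement about how GCC derives the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of 32-bit BZHI (not a GCC or Intel API).
   Only the low 8 bits of the index are used; an index of 32 or more
   leaves the source value unchanged. */
uint32_t bzhi32_ref(uint32_t src, uint32_t index)
{
    uint32_t n = index & 0xffu;
    if (n >= 32)
        return src;
    /* Keep bits [n-1:0]; for n == 0 the mask is 0 and the result is 0. */
    return src & ((UINT32_C(1) << n) - 1);
}

/* Hypothetical helper: a mask that clears `index` bits starting from the
   top. For index 11 this matches the constant in the -O1 listing.
   Assumption: only meaningful for index < 32. */
uint32_t observed_o1_mask(uint32_t index)
{
    return UINT32_C(0xffffffff) >> index;
}

/* Checks the two values printed in the report for input 4294967295. */
void bzhi32_self_check(void)
{
    uint32_t input = UINT32_C(0xffffffff);

    /* Result printed by the unoptimized build, which uses the real bzhi. */
    assert(bzhi32_ref(input, 11) == UINT32_C(0x7ff));

    /* Result printed by the -O1 build, which uses "and esi,0x1fffff". */
    assert((input & observed_o1_mask(11)) == UINT32_C(0x1fffff));
}
```

Calling bzhi32_self_check() from a test harness confirms that 7ff is the value a correct constant fold would have to produce for index 11, and that 1fffff is what results from masking off the top 11 bits instead.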