https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114560
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Meirav Grimberg from comment #2)
> (In reply to Jakub Jelinek from comment #1)
> > AVX512BW is needed to be able to use __mmask32/__mmask64, those aren't
> > supported in AVX512F, which only supports __mmask16. __mmask8 needs
> > AVX512DQ (though, guess for that one one can just use KMOV with 16-bit
> > mask).
> > In GCC 13 and later, -mavx512bw has been added as the implicit requirement
> > of
> > -mavx512vbmi2
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615906.html
> > and -mavx512bitalg
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615905.html
>
> Hi,
> thank you for the quick reply.
>
> As i mentioned Intel Intrinsics Guide specifically specifies only the
> AVX512_VBMI2 flag without referencing AVX512BW. Could you shed some light on
> this?

That is just a bug in the Intrinsics Guide IMNSHO.

> Moreover, I noticed that both Clang and Intel's compiler allow compilation
> without additional flags, suggesting an implementation that aligns with the
> hardware requirements. Could you provide insights into why GCC necessitates
> an additional flag?

The intrinsic needs to load the 32-bit mask into one of the
%k{0,1,2,3,4,5,6,7} registers, and without AVX512BW there is simply no
instruction for that.
If you compile your testcase with clang with -O0 -mavx512vbmi2, you can see a
kmovd %ecx, %k1
instruction, which requires the AVX512BW CPUID bit.
So presumably clang does what GCC 14 and later do, enabling -mavx512bw
implicitly when -mavx512vbmi2 is requested.
While the vpexpandw instruction itself may indeed only need AVX512VBMI2, you
can't implement the intrinsic without AVX512BW.
When I check clang -E -dD -mavx512vbmi2 output on godbolt, I see
#define __AVX512BW__ 1
#define __AVX512VBMI2__ 1
defined there.

> Regarding the term "implicit requirement," could you please clarify its
> meaning? I didn't observe any apparent differences when attempting
> compilation with GCC 13.

Ah, sorry, it is indeed in GCC 14 only. I was misled by the commit date of
January 2023, but it was actually pushed to GCC trunk only in April, after
GCC 13 branched. In GCC 11-13 you need to use both -mavx512vbmi2 and
-mavx512bw to use these intrinsics.
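
For illustration only, here is a minimal sketch of how source code could test the same two predefined macros that show up in the clang -E -dD output and use the 32-bit-mask intrinsic only when both are defined. The helper name expand_words and the HAVE_EXPAND_EPI16 macro are hypothetical, not taken from the bug report or the testcase. The scalar fallback is my reading of the zero-masking expand semantics, not a copy of any compiler's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use the intrinsic only when the compiler reports both ISA extensions.
   With GCC 11-13 that means passing -mavx512vbmi2 -mavx512bw; per the
   comment above, GCC 14 and clang enable AVX512BW implicitly. */
#if defined(__AVX512VBMI2__) && defined(__AVX512BW__)
#include <immintrin.h>
#define HAVE_EXPAND_EPI16 1   /* hypothetical macro name */
#else
#define HAVE_EXPAND_EPI16 0
#endif

/* Hypothetical helper: zero-masking expand over 32 16-bit lanes.
   512 bits / 16 bits per lane = 32 lanes, hence the 32-bit mask. */
static void expand_words(uint32_t mask, const uint16_t src[32],
                         uint16_t dst[32])
{
    assert(src != NULL && dst != NULL);
#if HAVE_EXPAND_EPI16
    __m512i a = _mm512_loadu_si512((const void *)src);
    /* Moving the 32-bit mask into a %k register is the step that
       needs AVX512BW (kmovd). */
    __m512i r = _mm512_maskz_expand_epi16((__mmask32)mask, a);
    _mm512_storeu_si512((void *)dst, r);
#else
    /* Scalar fallback: consecutive source words go to the lanes whose
       mask bit is set; all other lanes are zeroed. */
    unsigned next = 0;
    for (unsigned lane = 0; lane < 32; lane++)
        dst[lane] = ((mask >> lane) & 1u) ? src[next++] : 0;
#endif
}
```

Built without any -mavx512* option, neither macro is defined and the scalar path is used, so the sketch runs on any machine. The intrinsic path additionally requires a CPU that supports both extensions at run time.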