anna added a comment. In D155145#4551621 <https://reviews.llvm.org/D155145#4551621>, @craig.topper wrote:
> In D155145#4551526 <https://reviews.llvm.org/D155145#4551526>, @anna wrote: > >> In D155145#4544068 <https://reviews.llvm.org/D155145#4544068>, @pengfei >> wrote: >> >>> In D155145#4543326 <https://reviews.llvm.org/D155145#4543326>, @anna wrote: >>> >>>> We see a crash bisected to this patch about using an illegal instruction. >>>> Here's the CPUInfo for the machine: >>>> >>>> CPU info: >>>> current cpu id: 22 >>>> total 32(physical cores 16) (assigned logical cores 32) (assigned >>>> physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads >>>> per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, >>>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, >>>> clmul, ht, tsc, tscinvbit, tscinv, clflush >>>> AvgLoads: 0.30, 0.10, 0.18 >>>> CPU Model and flags from /proc/cpuinfo: >>>> model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz >>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge >>>> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall >>>> nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl >>>> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl >>>> vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt >>>> tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp >>>> tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts >>>> md_clear flush_l1d >>>> Online cpus: 0-31 >>>> Offline cpus: >>>> BIOS frequency limitation: <Not Available> >>>> Frequency switch latency (ns): 20000 >>>> Available cpu frequencies: <Not Available> >>>> Current governor: schedutil >>>> Core performance/turbo boost: <Not Available> >>>> >>>> I don't see `avxvnniint16` in the flags list nor avx2. So, this >>>> (relatively new) instruction shouldn't be generated for this machine. Any >>>> ideas on why this might be happening? >>> >>> As far as I can see from the patch, the only way to generate avxvnniint16 >>> instructions is to call its specific intrinsics explicitly. And we will >>> check compiling options in FE before allowing to call the intrinsics. We do >>> have an optimization to generate vnni instructions without intrinsics, but >>> we haven't extend it to avxvnniint16 so far. >>> So I don't know what's wrong in your case, could you provide a reproducer >>> for your problem? >> >> I've investigated what is going on. With this patch, we are now passing in >> `+avxvnniint16` into machine attributes. With that attribute, we now >> generate an instruction which is illegal on sandybridge machine: >> >> 0x3013f2af: jmpq 0x3013f09b >> 0x3013f2b4: mov %rax,%rdi >> 0x3013f2b7: and $0xfffffffffffffff0,%rdi >> => 0x3013f2bb: vpbroadcastd %xmm0,%ymm2 >> 0x3013f2c0: vpbroadcastd %xmm1,%ymm3 >> >> The instruction `vpbroadcastd %xmm0,%ymm2` requires `AVX2` CPU flag: >> https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has only >> AVX flag. >> >> This is the complete mattr generated: >> >> !3 = >> !{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"} >> >> I've confirmed if we changed to `-avxvnniint16` we do not generate >> `vpbroadcastd`. >> >> W.r.t. how we get the machine attributes generated through our front-end: >> >> if (!sys::getHostCPUFeatures(Features)) >> return std::move(mattr); >> >> // Fill mattr with default values. >> mattr.reserve(Features.getNumItems()); >> for (auto &I : Features) { >> std::string attr(I.first()); >> mattr.emplace_back(std::string(I.second ? "+" : "-") + attr); >> } >> >> So, the problem is in getHostCPUFeatures, possibly this line from the patch >> : >> `Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && >> HasAVXSave;`. > > Does this patch help > > diff --git a/llvm/lib/TargetParser/Host.cpp b/llvm/lib/TargetParser/Host.cpp > index 1141df09307c..11a6879fb76a 100644 > --- a/llvm/lib/TargetParser/Host.cpp > +++ b/llvm/lib/TargetParser/Host.cpp > @@ -1769,7 +1769,7 @@ bool sys::getHostCPUFeatures(StringMap<bool> > &Features) { > Features["amx-tile"] = HasLeaf7 && ((EDX >> 24) & 1) && HasAMXSave; > Features["amx-int8"] = HasLeaf7 && ((EDX >> 25) & 1) && HasAMXSave; > bool HasLeaf7Subleaf1 = > - MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, > &EDX); > + HasLeaf7 && EAX >= 1 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, > &ECX, &EDX); > Features["sha512"] = HasLeaf7Subleaf1 && ((EAX >> 0) & 1); > Features["sm3"] = HasLeaf7Subleaf1 && ((EAX >> 1) & 1); > Features["sm4"] = HasLeaf7Subleaf1 && ((EAX >> 2) & 1); Yes, @craig.topper that works! Thanks! Could you pls land the patch, if possible? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155145/new/ https://reviews.llvm.org/D155145 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits