On 4/10/24 at 20:37, Bill Allombert wrote:
>> The test suite works with 3rd-generation AMD EPYC processors, but fails with 4th generation.
>
> Ah! Does the VM set /proc/cpuinfo correctly?
I don't know. Is there an easy way to check that "/proc/cpuinfo is correct"?
> Some years ago, I saw problems when a VM advertised support for CPU features that were not actually supported.
That would certainly be surprising from AWS; I would suppose they have good engineers for such things.
> /usr/bin/mlucas is actually a shell script that tries the various binaries in
> /usr/libexec/mlucas/ until it finds one that works on the machine, by doing
>
>   /usr/libexec/mlucas/mlucas-avx512 -fftlen 192 -iters 100 -radset 0
>   /usr/libexec/mlucas/mlucas-avx2 -fftlen 192 -iters 100 -radset 0
>   /usr/libexec/mlucas/mlucas-avx -fftlen 192 -iters 100 -radset 0
>   /usr/libexec/mlucas/mlucas-sse2 -fftlen 192 -iters 100 -radset 0
>   ...
>
> until this succeeds. Could you tell me what you get on EPYC 3 and EPYC 4?
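The try-each-binary-until-one-works logic described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual /usr/bin/mlucas script: the function name `pick_variant` is made up here, and only the binary names and the self-test arguments are taken from the message.

```shell
#!/bin/sh
# Hypothetical sketch of a fallback wrapper; NOT the real /usr/bin/mlucas.

# pick_variant DIR
# Print the path of the first mlucas-* variant in DIR whose short
# self-test exits with status 0. Return 1 if none of them does.
pick_variant() {
    dir=$1
    for variant in avx512 avx2 avx sse2; do
        bin="$dir/mlucas-$variant"
        # Skip variants that are not installed or not executable.
        [ -x "$bin" ] || continue
        # A non-zero status (failed assertion, illegal instruction,
        # segfault) makes the loop fall through to the next variant.
        if "$bin" -fftlen 192 -iters 100 -radset 0 >/dev/null 2>&1; then
            printf '%s\n' "$bin"
            return 0
        fi
    done
    return 1
}
```

A wrapper built on this could end with something like `bin=$(pick_variant /usr/libexec/mlucas) && exec "$bin" "$@"`. Note that such a probe only protects against variants that fail the short self-test; a variant that passes the probe can still crash on a longer run.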
Sure. On a c6a.large instance (3rd-generation EPYC) the first one fails with exit status 1 and this error:

INFO: testing qfloat routines...
System total RAM = 3825, free RAM = 193
INFO: 193 MB of free system RAM detected.
CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 14.2.0.
has_avx512: CPUID returns [a,b,c,d] = [ A00F11, 20800,FEFA3203,178BFBFF]
#define USE_AVX512 invoked but no FMA support detected on this CPU! Check get_cpuid functionality and CPU type.
ERROR: at line 2079 of file upstream/src/util.c
Assertion failed: #define USE_AVX512 invoked but no FMA support detected on this CPU! Check get_cpuid functionality and CPU type.

so the following one (mlucas-avx2) is chosen.

On a c7a.large instance (4th-generation EPYC), the first one exits with exit status 0, but then "mlucas -s m" segfaults and creates a core, which I've just put here in case it helps:

https://people.debian.org/~sanvila/build-logs/mlucas/core.gz

(gdb says "Core was generated by `/usr/libexec/mlucas/mlucas-avx512 -s m'")

(Beware: it's 956 MB when uncompressed.)

I attach the contents of /proc/cpuinfo for the c6a.large and c7a.large instances.
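For comparing the two attached /proc/cpuinfo dumps, a small helper along these lines may be handy. It is only a sketch: the function name `has_flag` is made up for illustration, and it relies on `grep -m1`, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper; the name has_flag is an assumption, not an existing tool.

# has_flag FILE FLAG
# Succeed if FLAG appears as a whole word in the first "flags" line
# of FILE (a /proc/cpuinfo dump or /proc/cpuinfo itself).
has_flag() {
    grep -m1 '^flags' "$1" |   # first "flags : ..." line only
        cut -d: -f2 |          # keep the part after the colon
        tr ' ' '\n' |          # one flag per line
        grep -qx "$2"          # exact match, so "avx" does not match "avx2"
}
```

For example, `has_flag /proc/cpuinfo avx512f && echo "kernel reports AVX-512F"`. In the attached dumps, the c7a.large one lists avx512f and the c6a.large one does not, while both list fma. This only shows what the guest kernel reports; it does not by itself prove that the advertised features actually work.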
> Maybe one of those is broken and I can just remove it. (I already disabled sse2 on i386 due to crashes.)
Looks reasonable, since this is also a crash. Thanks.
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 1
model name : AMD EPYC 7R13 Processor
stepping : 1
microcode : 0xa0011d5
cpu MHz : 2649.998
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass srso
bogomips : 5299.99
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : AuthenticAMD
cpu family : 25
model : 1
model name : AMD EPYC 7R13 Processor
stepping : 1
microcode : 0xa0011d5
cpu MHz : 2649.998
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext invpcid_single ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass srso
bogomips : 5299.99
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 17
model name : AMD EPYC 9R14
stepping : 1
microcode : 0xa101148
cpu MHz : 2599.998
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core invpcid_single ssbd perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr rdpru wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid flush_l1d
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
bogomips : 5199.99
TLB size : 3584 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : AuthenticAMD
cpu family : 25
model : 17
model name : AMD EPYC 9R14
stepping : 1
microcode : 0xa101148
cpu MHz : 2599.998
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext perfctr_core invpcid_single ssbd perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 clzero xsaveerptr rdpru wbnoinvd arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid flush_l1d
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
bogomips : 5199.99
TLB size : 3584 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: