On Tue, Jun 10, 2025 at 01:02:12PM +0000, Varghese, Vipin wrote: > [Public] > > Hi Bruce, > > Snipped > > > > > > > > > > > > > Hi Bruce, we have reviewed this internally and tested the same. > > > > > > We would like > > > > > your thought for the following. > > > > > > > > > > > > - Before patch: we were directly setting AVX512 falgs for F, BW, > > > > > > DQ, VL > > > > > > - new patch: we are setting the flags for `skylake-server` as bare > > > > > > minimal. > > > > > > - AMD supports AVX512 from `znver4 and higher`. > > > > > > > > > > > > As per GCC > > > > > > `https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html`, the extra > > > > > > ISA > > > > > supported between skylake-server (super set) and znver4 and znver5 > > > > > are `SAHF, FXSR, XSAVE, RDRND, LZCNT, HLE, PREFETCHW, SGX`. > > > > > > Currently for DPDK microbenchmarks and examples runs safe as it > > > > > > is not using > > > > > the `SAHF, FXSR, XSAVE, RDRND, LZCNT, HLE, PREFETCHW, SGX` > > > > > instructions. > > > > > > > > > > > > Question: should we check if target is `AMD EPYC` then apply > > > > > > bare minimum as > > > > > `-march=znver4`, thus avoid possible unsupported instruction > > > > > generation when non `c_args for march` is passed? > > > > > > > > > > > > > > > > Can you clarify why you mean by the "target" here? Is there a > > > > > specific value you are thinking of for the "cpu_instruction_set" > > > > > option? > > > > > > > > `Target` is target CPU, when generated without any arguments we get > > > > code for > > `native build`. > > > > > > > > On AMD target cpu zen4 or zen5; Before patch as per the code ` AVX512 > > > > flags > > for F, BW, DQ` are used in ` cc_avx512_flags`. > > > > > > > > With the patch, the cc_avx512_flags is set to `-march=skylake-avx512` > > > > (where > > compiler optimizations `can add HLE, PREFETCHW, SGX`). > > > > > > > > > > With this patch for zen4 or zen5 the AVX512 code paths should be > > > compiled with no additional flags, since the -march=zen* flag should > > > include everything necessary. Can you confirm that you see the extra > > > -march flag in those cases? > > > > > > > Ran a quick test myself, this is what I see, doing a zen4 build: > > > > $ meson setup -Dcpu_instruction_set=znver4 build-zen4 > > The Meson build system > > Version: 1.7.0 > > Source dir: /home/bruce/dpdk-github > > Build dir: /home/bruce/dpdk-github/build-zen4 > > ... > > Fetching value of define "__AVX512F__" : 1 > > Message: Extra C flags needed for AVX2 output: [] > > Message: Extra C flags needed for AVX512 output: [] > > ... > > > > Checking the build.ninja file in the build-zen4 directory, there is no use > > of > > march=skylake. Here is the compilation recipe for i40e AVX-512 code, for > > example: > > > > build > > drivers/librte_net_i40e_avx512_lib.a.p/net_intel_i40e_i40e_rxtx_vec_avx512.c.o: > > c_COMPILER ../drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c > > DEPFILE = > > drivers/librte_net_i40e_avx512_lib.a.p/net_intel_i40e_i40e_rxtx_vec_avx512.c.o.d > > DEPFILE_UNQUOTED = > > drivers/librte_net_i40e_avx512_lib.a.p/net_intel_i40e_i40e_rxtx_vec_avx512.c.o.d > > ARGS = -Idrivers/librte_net_i40e_avx512_lib.a.p -Idrivers -I../drivers - > > Idrivers/net/intel/i40e -I../drivers/net/intel/i40e > > -Idrivers/net/intel/i40e/base - > > I../drivers/net/intel/i40e/base -Ilib/ethdev -I../lib/ethdev > > -Ilib/eal/common - > > I../lib/eal/common -I. -I.. -Iconfig -I../config -Ilib/eal/include > > -I../lib/eal/include - > > Ilib/eal/linux/include -I../lib/eal/linux/include -Ilib/eal/x86/include > > -I../lib/eal/x86/include > > -I../kernel/linux -Ilib/eal -I../lib/eal -Ilib/kvargs -I../lib/kvargs > > -Ilib/log -I../lib/log - > > Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Ilib/net > > -I../lib/net -Ilib/mbuf - > > I../lib/mbuf -Ilib/mempool -I../lib/mempool -Ilib/ring -I../lib/ring > > -Ilib/meter -I../lib/meter - > > Idrivers/bus/pci -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/pci > > -I../lib/pci - > > Idrivers/bus/vdev -I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu > > -I../lib/rcu - > > I/usr/include/x86_64-linux-gnu -fdiagnostics-color=always - > > D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=c11 -O3 -include > > rte_config.h -Wvla -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral - > > Wformat-security -Wmissing-declarations -Wmissing-prototypes > > -Wnested-externs > > -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes > > -Wundef > > -Wwrite-strings -Wno-packed-not-aligned -Wno-missing-field-initializers - > > D_GNU_SOURCE -fPIC -march=znver4 -mrtm -DALLOW_EXPERIMENTAL_API > > -DALLOW_INTERNAL_API -Wno-format-truncation -Wno-address-of-packed- > > member -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.i40e - > > DCC_AVX512_SUPPORT > > I have retried the patch and tested the same on 2 scenarios > > 1. Scenario-1: znver4 (avx512) with gcc 13.2 > 2. Scenario-2: znver3 (no avx512) with gcc 13.2 > > For scenario-2 where CPU is not supporting avx512 but compiler supports > avx512 I get the logs as > > ``` > Fetching value of define "__AVX512F__" : > Message: found AVX512 > Message: ['-march=skylake-avx512'] > Compiler for C supports arguments -Wno-overriding-option: NO > Message: Extra C flags needed for AVX2 output: [] > Message: Extra C flags needed for AVX512 output: ['-march=skylake-avx512'] > > > Message: AVX 512 flag in drivers > Message: ['-march=skylake-avx512'] > Message: ['-march=native', '-mrtm', '-DALLOW_EXPERIMENTAL_API', > '-DALLOW_INTERNAL_API', '-Wno-format-truncation', > '-Wno-address-of-packed-member', '-DDLB2_BYPASS_FENCE_ON_PP=0', > '-DDLB_HW_CREDITS_CHECKS=1', '-DDLB_SW_CREDITS_CHECKS=1', > '-DDLB_TYPE_CHECK=1', '-DRTE_LOG_DEFAULT_LOGTYPE=pmd.event.dlb2', > '-DCC_AVX512_SUPPORT'] > ``` > > The above is correct as per my understanding as the test is to compile for > avx512 ISA. > But When using `ninja -C build -v` and building for native build, I get > > ``` > [1741/3539] cc -Idrivers/librte_net_i40e_avx512_lib.a.p -Idrivers > -I../drivers -Idrivers/net/intel/i40e -I../drivers/net/intel/i40e > -Idrivers/net/intel/i40e/base > -I../drivers/net/intel/i40e/base -Ilib/ethdev -I../lib/ethdev > -Ilib/eal/common -I../lib/eal/common -I. -I.. -Iconfig -I../config > -Ilib/eal/include > -I../lib/eal/include -Ilib/eal/linux/include -I../lib/eal/linux/include > -Ilib/eal/x86/include -I../lib/eal/x86/include -I../kernel/linux -Ilib/eal > -I../lib/eal -Ilib/kvargs -I../lib/kvargs -Ilib/log -I../lib/log > -Ilib/metrics -I../lib/metrics -Ilib/telemetry -I../lib/telemetry -Ilib/net > -I../lib/net -Ilib/mbuf -I../lib/mbuf -Ilib/mempool -I../lib/mempool > -Ilib/ring -I../lib/ring -Ilib/meter -I../lib/meter -Idrivers/bus/pci > -I../drivers/bus/pci > -I../drivers/bus/pci/linux -Ilib/pci -I../lib/pci -Idrivers/bus/vdev > -I../drivers/bus/vdev -Ilib/hash -I../lib/hash -Ilib/rcu -I../lib/rcu > -fdiagnostics-color=always > -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=c11 -O3 -include > rte_config.h -Wvla -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral > -Wformat-security > -Wmissing-declarations -Wmissing-prototypes -Wnested-externs > -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes > -Wundef -Wwrite-strings > -Wno-packed-not-aligned -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC > -march=native -mrtm -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API > -Wno-format-truncation > -Wno-address-of-packed-member -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.i40e > -DCC_AVX512_SUPPORT -march=skylake-avx512 -MD > -MQ > drivers/librte_net_i40e_avx512_lib.a.p/net_intel_i40e_i40e_rxtx_vec_avx512.c.o > -MF > drivers/librte_net_i40e_avx512_lib.a.p/net_intel_i40e_i40e_rxtx_vec_avx512.c.o.d > -o > drivers/librte_net_i40e_avx512_lib.a.p/net_intel_i40e_i40e_rxtx_vec_avx512.c.o > -c ../drivers/net/intel/i40e/i40e_rxtx_vec_avx512.c > ``` > > In above log I get `2 instances of march`; logs `-march=native -mrtm > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -Wno-format-truncation > -Wno-address-of-packed-member -DRTE_LOG_DEFAULT_LOGTYPE=pmd.net.i40e > -DCC_AVX512_SUPPORT -march=skylake-avx512`. > > Question-1: I think this is not expected right? The `-march=native` is > populated from `cflags` and `-march= skylake-avx512` is populated from ` > cc_avx512_flags`.
The above command is correct. So long as the compiler supports AVX-512 we will always compile the AVX-512 code paths for runtime selection. In practice, all supported compilers have AVX-512 support, so in reality we have the two scenarios you tested: * The target architecture e.g. znver3 in your case, doesn't support avx512, so the meson.build file adds on the necessary flags to add this support, i.e. that file is compiled with -march=skylake=avx512, which is the minimum ISA that gives you the necessary support. * The target architecture, e.g. znver4, does support AVX-512, then no additional flags are added and the files are compiled "as normal" In both these cases, whether the target architecture is specified as "native" or explicitly makes no difference. > Question-2: if the target is meet minimal ISA why not we use > `-march=x86-64-v4`? > Good point, that would indeed be better. I'm just not sure whether it is supported widely enough on our compilers. Do you know what gcc and clang versions support that target? > Note: I am yet to check for cross build. Will update on cross build how this > comes out. > > > > > > > Regards, > > /Bruce