[Bug target/111354] [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases.

d_vampile at 163 dot com via Gcc-bugs Fri, 08 Sep 2023 23:39:40 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354


--- Comment #3 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Andrew Pinski from comment #1)
> First off, the performance difference is due to micro-arch issues with
> unaligned stores of 256 bits.
> 
> Also, IIRC rte_mov128blocks is tuned for copying blocks which are aligned to
> at least 32 bytes. But you are better off asking the DPDK forum why they
> don't just use memcpy here.

The instruction 'movdqu' does not require the memory address to be aligned on a
natural vector-length byte boundary. Why does rte_mov128blocks need to be
aligned to 32 bytes?
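
To make the alignment question concrete, here is a rough sketch in C. It is
not taken from DPDK or GCC. It assumes a 64-byte cache line (typical for Xeon,
but not stated in this thread) and assumes that a 128-byte block is written as
four 32-byte stores. The helper names are hypothetical. It only illustrates one
possible cost of unaligned 256-bit stores: an unaligned store is legal, but it
may straddle two cache lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; 64 bytes is typical for Xeon parts, but this
 * value is an assumption, not something stated in the thread. */
#define CACHE_LINE_SIZE 64u

/* True if addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* True if an access of len bytes starting at addr touches two cache lines. */
static bool splits_cache_line(uintptr_t addr, size_t len)
{
    assert(len != 0 && len <= CACHE_LINE_SIZE);
    return (addr / CACHE_LINE_SIZE) != ((addr + len - 1) / CACHE_LINE_SIZE);
}

/* Hypothetical helper: counts how many of the four 32-byte stores that
 * make up one 128-byte block would split a cache line when the
 * destination starts at dst. */
static unsigned count_split_stores(uintptr_t dst)
{
    unsigned splits = 0;
    for (size_t off = 0; off < 128; off += 32)
        splits += splits_cache_line(dst + off, 32);
    return splits;
}
```

With these assumptions, a destination that is 32-byte aligned never splits a
line, while a destination offset by 16 bytes splits two of the four stores.
Whether that accounts for the run time difference reported here would have to
be measured on the test platform.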

The test platform is Xeon.

[Bug target/111354] [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases.
