https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354
--- Comment #3 from d_vampile <d_vampile at 163 dot com> ---
(In reply to Andrew Pinski from comment #1)
> First off the performance difference is due to micro-arch issues with
> unaligned stores of 256 bits.
>
> Also iirc rte_mov128blocks is tuned at copying blocks which are aligned at
> least to 32 bytes wide. But you are better asking the dpdk forum why they
> don't just use memcpy here.

The 'movdqu' instruction does not require the memory address to be aligned on
a natural vector-length byte boundary. Why does rte_mov128blocks need to be
aligned to 32 bytes? The test platform is a Xeon.
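
To make the question concrete, here is a minimal sketch (not DPDK's code) of a 16-byte copy through the SSE2 unaligned load/store intrinsics, which compilers typically lower to movdqu or its VEX form. The helper names is_aligned, copy16_unaligned and demo_unaligned_copy are hypothetical, and the memcpy fallback is only there so the snippet also builds on non-SSE2 targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical helper: nonzero if p is aligned to `align` bytes.
 * `align` is assumed to be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: copy 16 bytes with no alignment requirement
 * on either pointer. */
static void copy16_unaligned(uint8_t *dst, const uint8_t *src)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)src);  /* unaligned load  */
    _mm_storeu_si128((__m128i *)dst, v);                /* unaligned store */
#else
    memcpy(dst, src, 16);  /* portable fallback for non-SSE2 targets */
#endif
}

/* Copies between deliberately misaligned addresses.
 * Returns 1 if the source really was misaligned and the bytes match. */
static int demo_unaligned_copy(void)
{
    _Alignas(32) uint8_t src[64];
    _Alignas(32) uint8_t dst[64] = {0};

    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (uint8_t)i;

    /* src + 1 and dst + 3 are not 16-byte aligned. */
    copy16_unaligned(dst + 3, src + 1);

    return !is_aligned(src + 1, 16) && memcmp(dst + 3, src + 1, 16) == 0;
}
```

The copy is correct at any address, so this only shows that alignment is not a correctness requirement for these instructions. It says nothing about speed: an unaligned access that straddles a cache-line boundary can still be slower, which appears to be the micro-architectural point made in comment #1.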