> From: Burakov, Anatoly [mailto:anatoly.bura...@intel.com]
> Sent: Thursday, 5 June 2025 11.29
> 
> On 6/4/2025 4:59 PM, Bruce Richardson wrote:
> > On Fri, May 30, 2025 at 02:57:19PM +0100, Anatoly Burakov wrote:
> >> Currently, for 32-byte descriptor format, only SSE instruction set is
> >> supported. Add implementation for AVX2 and AVX512 instruction sets. Since
> >> we are using Rx descriptor definitions from common code, we can just use
> >> the generic descriptor definition, as we only ever write the first 16 bytes
> >> of it, and the layout is always the same for that part.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> >> ---
> >>
> >
> > Like the idea. Feedback inline below.
> >
> > /Bruce
> >
> 
> <snip>
> 
> >> -          /**
> >> -           * merge 0 & 1, by casting 0 to 256-bit and inserting 1
> >> -           * into the high lanes. Similarly for 2 & 3
> >> -           */
> >> -          const __m256i vaddr0_256 = _mm256_castsi128_si256(vaddr0);
> >> -          const __m256i vaddr2_256 = _mm256_castsi128_si256(vaddr2);
> >> +                  const __m128i vaddr0 = _mm_loadu_si128((const __m128i *)&mb0->buf_addr);
> >> +                  const __m128i vaddr1 = _mm_loadu_si128((const __m128i *)&mb1->buf_addr);
> >
> > Minor nit, but do we need to use unaligned loads here? The mbuf is marked
> > as cache-aligned, and buf_addr is the first field in it.
> 
> It was like that in the original code I think (unless it was a copypaste
> error), but sure, I can make it aligned.

I wonder if the compiler emits a warning if it can detect that the address of 
an aligned load instruction is not aligned.

As a safeguard against future mbuf changes, you could add a static_assert that 
the buf_addr field is properly aligned.
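
Something along these lines is what I have in mind. This is only a sketch: 
struct demo_mbuf, DEMO_CACHE_LINE, DEMO_VEC_ALIGN and 
demo_buf_addr_is_aligned() are hypothetical stand-ins I made up for 
illustration, not the real rte_mbuf definition or any DPDK macro. In the 
driver, the asserts would name the real mbuf structure and its buf_addr field 
instead.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stdalign.h> /* alignas, alignof */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint64_t, uintptr_t */

/* Hypothetical values, for illustration only. */
#define DEMO_CACHE_LINE 64 /* assumed cache line size */
#define DEMO_VEC_ALIGN  16 /* a 128-bit aligned load needs 16-byte alignment */

/* Hypothetical stand-in for the mbuf: cache-aligned, buf_addr first. */
struct demo_mbuf {
	alignas(DEMO_CACHE_LINE) void *buf_addr;
	uint64_t buf_iova;
	/* ... remaining fields omitted ... */
};

/* The structure itself must be aligned strictly enough... */
static_assert(alignof(struct demo_mbuf) % DEMO_VEC_ALIGN == 0,
	"mbuf alignment too weak for aligned 128-bit loads");

/* ...and the field must sit at a suitably aligned offset inside it. */
static_assert(offsetof(struct demo_mbuf, buf_addr) % DEMO_VEC_ALIGN == 0,
	"buf_addr offset breaks 16-byte alignment");

/* Runtime counterpart, e.g. for a debug-build check. */
static inline int
demo_buf_addr_is_aligned(const struct demo_mbuf *m)
{
	return ((uintptr_t)&m->buf_addr % DEMO_VEC_ALIGN) == 0;
}
```

Both conditions are needed: an aligned offset does not help if the structure 
itself is not aligned, and vice versa. If a future mbuf change moves buf_addr 
or relaxes the structure alignment, the build breaks instead of the aligned 
load faulting at runtime.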

PS:
According to the Intel Intrinsics Guide [1], the unaligned load appears to 
perform exactly the same as the aligned load, so it shouldn't make a practical 
difference on current CPUs.
It might make a difference on future CPUs, though, and since there is no need 
for an unaligned load, I'm also in favor of the aligned load here.

[1]: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
 
