On Wed, Jun 03, 2026 at 04:01:50PM +0800, Li Zhe wrote:
> Introduce a generic memcpy_streaming() interface for write-once copy
> sites that can fall back to memcpy() when no architecture-specific
> optimization is available, or when an architecture-specific backend
> cannot safely handle a given transfer.
> 
> Add memcpy_streaming_drain() alongside it so callers can separate the
> copy primitive from any required ordering point. On x86, use
> memcpy_flushcache() and sfence only for aligned transfers that can stay
> entirely on the non-temporal store path; otherwise fall back to memcpy()

So you throwing "streaming", "non-temporal" and "flush-cache" wildly around
here and this is adding unnecessary confusion where it shouldn't. I'd suggest
you stick to "non-temporal" which you can abbreviate short'n'sweet to "nt" and
that's it. Keep it simple.

> so the generic API does not expose flushcache semantics on cached
> head/tail fragments.
> 
> Callers are responsible for invoking memcpy_streaming_drain() before
> later normal stores that must be ordered after the streaming copy.
> 
> Signed-off-by: Li Zhe <[email protected]>
> ---
>  arch/x86/include/asm/string_64.h | 32 ++++++++++++++++++++++++++++++++
>  include/linux/string.h           | 20 ++++++++++++++++++++
>  2 files changed, 52 insertions(+)
> 
> diff --git a/arch/x86/include/asm/string_64.h 
> b/arch/x86/include/asm/string_64.h
> index 4635616863f5..aee63108577f 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h

There's arch/x86/include/asm/string.h. Why are those here, in the _64 variant?

> @@ -4,6 +4,7 @@
>  
>  #ifdef __KERNEL__
>  #include <linux/jump_label.h>
> +#include <linux/align.h>
>  
>  /* Written 2002 by Andi Kleen */
>  
> @@ -100,6 +101,37 @@ static __always_inline void memcpy_flushcache(void *dst, 
> const void *src, size_t
>       }
>       __memcpy_flushcache(dst, src, cnt);
>  }
> +
> +/*
> + * Only map memcpy_streaming() to memcpy_flushcache() when the destination
> + * is already 8-byte aligned and the size can be handled without cached
> + * head/tail fragments in __memcpy_flushcache().
> + */
> +static __always_inline bool memcpy_flushcache_nt_safe(const void *dst,
> +                                                   size_t cnt)

This is checking alignment. Then call it that.

> +{
> +     unsigned long d = (unsigned long)dst;

Useless.

> +
> +     return cnt && IS_ALIGNED(d, 8) && IS_ALIGNED(cnt, 4);
> +}

AFAICT, this helper is used only once. Zap it completely.

> +
> +#define __HAVE_ARCH_MEMCPY_STREAMING 1
> +static __always_inline void memcpy_streaming(void *dst, const void *src,

memcpy_nt()

> +                                          size_t cnt)
> +{
> +     if (!cnt)
> +             return;
> +
> +     if (memcpy_flushcache_nt_safe(dst, cnt))

That branch can cost. Why is that alignment checking so necessary? Why can't
you simply DTRT by handling the misaligned parts like __memcpy_flushcache().

What does this bring you? None of that is explained in the commit message so
why do I want this patch at all?

The commit message is basically telling me what the patch does but I can kinda
read that from the diff itself. What it is not telling me is *why* it exists.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Reply via email to