On Wed, Jun 03, 2026 at 04:01:50PM +0800, Li Zhe wrote:
> Introduce a generic memcpy_streaming() interface for write-once copy
> sites that can fall back to memcpy() when no architecture-specific
> optimization is available, or when an architecture-specific backend
> cannot safely handle a given transfer.
>
> Add memcpy_streaming_drain() alongside it so callers can separate the
> copy primitive from any required ordering point. On x86, use
> memcpy_flushcache() and sfence only for aligned transfers that can stay
> entirely on the non-temporal store path; otherwise fall back to memcpy()

So you're throwing "streaming", "non-temporal" and "flush-cache" wildly around
here and this is adding unnecessary confusion where it shouldn't. I'd suggest
you stick to "non-temporal" which you can abbreviate short'n'sweet to "nt" and
that's it. Keep it simple.

> so the generic API does not expose flushcache semantics on cached
> head/tail fragments.
>
> Callers are responsible for invoking memcpy_streaming_drain() before
> later normal stores that must be ordered after the streaming copy.
>
> Signed-off-by: Li Zhe <[email protected]>
> ---
>  arch/x86/include/asm/string_64.h | 32 ++++++++++++++++++++++++++++++++
>  include/linux/string.h           | 20 ++++++++++++++++++++
>  2 files changed, 52 insertions(+)
>
> diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
> index 4635616863f5..aee63108577f 100644
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h

There's arch/x86/include/asm/string.h. Why are those here, in the _64 variant?

> @@ -4,6 +4,7 @@
>
>  #ifdef __KERNEL__
>  #include <linux/jump_label.h>
> +#include <linux/align.h>
>
>  /* Written 2002 by Andi Kleen */
>
> @@ -100,6 +101,37 @@ static __always_inline void memcpy_flushcache(void *dst, const void *src, size_t
>  	}
>  	__memcpy_flushcache(dst, src, cnt);
>  }
> +
> +/*
> + * Only map memcpy_streaming() to memcpy_flushcache() when the destination
> + * is already 8-byte aligned and the size can be handled without cached
> + * head/tail fragments in __memcpy_flushcache().
> + */
> +static __always_inline bool memcpy_flushcache_nt_safe(const void *dst,
> +						      size_t cnt)

This is checking alignment. Then call it that.

> +{
> +	unsigned long d = (unsigned long)dst;

Useless.

> +
> +	return cnt && IS_ALIGNED(d, 8) && IS_ALIGNED(cnt, 4);
> +}

AFAICT, this helper is used only once. Zap it completely.
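
To illustrate the two remarks above (drop the temporary, fold the single-use
helper into its caller), here is a minimal userspace sketch. It is not the
kernel implementation: IS_ALIGNED() is a simplified stand-in for the macro in
<linux/align.h>, nt_copy_backend() is a hypothetical placeholder for the
non-temporal store path and just calls memcpy(), and the bool return value
exists only so that the chosen path can be observed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified userspace stand-in for the kernel's IS_ALIGNED(); a must be a power of two. */
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/* Hypothetical placeholder for the non-temporal backend; plain memcpy() here. */
static void nt_copy_backend(void *dst, const void *src, size_t cnt)
{
	memcpy(dst, src, cnt);
}

/*
 * Sketch only: returns true when the (placeholder) non-temporal path was
 * taken, false when it fell back to memcpy() or had nothing to copy.
 */
static bool memcpy_nt(void *dst, const void *src, size_t cnt)
{
	if (!cnt)
		return false;

	/* Alignment check open-coded: no single-use helper, no temporary. */
	if (IS_ALIGNED((uintptr_t)dst, 8) && IS_ALIGNED(cnt, 4)) {
		nt_copy_backend(dst, src, cnt);
		return true;
	}

	memcpy(dst, src, cnt);
	return false;
}
```

Whether the branch should exist at all is a separate question; the sketch only
shows the same condition as the quoted hunk without the extra helper.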

> +
> +#define __HAVE_ARCH_MEMCPY_STREAMING 1
> +static __always_inline void memcpy_streaming(void *dst, const void *src,

memcpy_nt()

> +					      size_t cnt)
> +{
> +	if (!cnt)
> +		return;
> +
> +	if (memcpy_flushcache_nt_safe(dst, cnt))

That branch can cost. Why is that alignment checking so necessary? Why can't you
simply DTRT by handling the misaligned parts like __memcpy_flushcache()? What
does this bring you?

None of that is explained in the commit message so why do I want this patch at
all?

The commit message is basically telling me what the patch does but I can kinda
read that from the diff itself. What it is not telling me is *why* it exists.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

