RE: [RFC v2] non-temporal memcpy

2022-10-09 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Tuesday, 9 August 2022 13.53 > > On 2022-08-09 11:24, Morten Brørup wrote: > >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > >> Sent: Sunday, 7 August 2022 22.41 > >> > >> On 2022-07-29 18:05, Stephen Hemminger wrote: > >>>

RE: [RFC v2] non-temporal memcpy

2022-08-11 Thread Honnappa Nagarahalli
> > > >> > >>> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > >>> Sent: Wednesday, 10 August 2022 13.56 > >>> > >>> On 2022-08-09 17:26, Stephen Hemminger wrote: > >> > >> [...] > >> > >>> > >>> Alignment seems like a non-issue to me. A NT-store memcpy() can be > >>> made free of alignme

RE: [RFC v2] non-temporal memcpy

2022-08-11 Thread Honnappa Nagarahalli
> >> > >> +TO: @Honnappa, we need input from ARM > >> > >>> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > >>> Sent: Friday, 29 July 2022 21.49 > > > From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > > Sent: Friday, 29 July 2022 14.14 > > >

Re: [RFC v2] non-temporal memcpy

2022-08-11 Thread Mattias Rönnblom
On 2022-08-10 23:20, Honnappa Nagarahalli wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Wednesday, 10 August 2022 13.56 On 2022-08-09 17:26, Stephen Hemminger wrote: [...] Alignment seems like a non-issue to me. A NT-store memcpy() can be made free of alignment req

Re: [RFC v2] non-temporal memcpy

2022-08-11 Thread Mattias Rönnblom
On 2022-08-10 23:05, Honnappa Nagarahalli wrote: +TO: @Honnappa, we need input from ARM From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] Sent: Friday, 29 July 2022 21.49 From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] Sent: Friday, 29 July 2022 14.14 Sorr

RE: [RFC v2] non-temporal memcpy

2022-08-10 Thread Honnappa Nagarahalli
> > > From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > > Sent: Wednesday, 10 August 2022 13.56 > > > > On 2022-08-09 17:26, Stephen Hemminger wrote: > > [...] > > > > > Alignment seems like a non-issue to me. A NT-store memcpy() can be > > made free of alignment requirements, incurring

RE: [RFC v2] non-temporal memcpy

2022-08-10 Thread Honnappa Nagarahalli
> > +TO: @Honnappa, we need input from ARM > > > From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > > Sent: Friday, 29 July 2022 21.49 > > > > > > > From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > > > > Sent: Friday, 29 July 2022 14.14 > > > > > > > > > > > > So

RE: [RFC v2] non-temporal memcpy

2022-08-10 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Wednesday, 10 August 2022 13.56 > > On 2022-08-09 17:26, Stephen Hemminger wrote: [...] > > Alignment seems like a non-issue to me. A NT-store memcpy() can be made > free of alignment requirements, incurring only a very slight cost
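
The alignment point quoted above can be sketched in C. This is only an illustration, not code from the thread: the name nt_copy_any_align_sketch is hypothetical, and the sketch assumes x86 SSE2 streaming stores (_mm_stream_si128), falling back to plain memcpy() elsewhere. The unaligned head and tail go through memcpy(), which is where the "very slight cost" would come from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not the API proposed in the RFC. */
static inline void
nt_copy_any_align_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

#if defined(__SSE2__)
	/* Head: bytes up to the first 16-byte boundary of dst (0 if aligned). */
	size_t head = (size_t)(-(uintptr_t)d & 15);

	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Body: whole 16-byte blocks, unaligned load and non-temporal store. */
	for (; len >= 16; d += 16, s += 16, len -= 16)
		_mm_stream_si128((__m128i *)d,
				 _mm_loadu_si128((const __m128i *)s));

	/* Order the weakly-ordered streaming stores before later stores. */
	_mm_sfence();
#endif
	/* Tail, or the whole copy when SSE2 is not available. */
	memcpy(d, s, len);
}
```

The caller sees ordinary memcpy() semantics; only the destination's aligned middle part bypasses the cache.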

RE: [RFC v2] non-temporal memcpy

2022-08-10 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Wednesday, 10 August 2022 14.00 > > On 2022-08-09 19:24, Morten Brørup wrote: > >> From: Stephen Hemminger [mailto:step...@networkplumber.org] > >> Sent: Tuesday, 9 August 2022 17.26 > >> > >> On Tue, 9 Aug 2022 11:46:19 +0200 > >> Mo

Re: [RFC v2] non-temporal memcpy

2022-08-10 Thread Mattias Rönnblom
On 2022-08-09 19:24, Morten Brørup wrote: From: Stephen Hemminger [mailto:step...@networkplumber.org] Sent: Tuesday, 9 August 2022 17.26 On Tue, 9 Aug 2022 11:46:19 +0200 Morten Brørup wrote: I don't think memcpy() functions should have alignment requirements. That's not very practical, an

Re: [RFC v2] non-temporal memcpy

2022-08-10 Thread Mattias Rönnblom
On 2022-08-09 17:26, Stephen Hemminger wrote: On Tue, 9 Aug 2022 11:46:19 +0200 Morten Brørup wrote: I don't think memcpy() functions should have alignment requirements. That's not very practical, and violates the principle of least surprise. I didn't make the CPUs with these alignment requ

Re: [RFC v2] non-temporal memcpy

2022-08-10 Thread Mattias Rönnblom
On 2022-08-09 17:00, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Tuesday, 9 August 2022 14.05 On 2022-08-09 11:46, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Sunday, 7 August 2022 22.25 On 2022-07-19 17:26, Morten Brøru

RE: [RFC v2] non-temporal memcpy

2022-08-09 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org] > Sent: Tuesday, 9 August 2022 17.26 > > On Tue, 9 Aug 2022 11:46:19 +0200 > Morten Brørup wrote: > > > > > > > I don't think memcpy() functions should have alignment > requirements. > > > That's not very practical, and violates the p

Re: [RFC v2] non-temporal memcpy

2022-08-09 Thread Stephen Hemminger
On Tue, 9 Aug 2022 11:46:19 +0200 Morten Brørup wrote: > > > > I don't think memcpy() functions should have alignment requirements. > > That's not very practical, and violates the principle of least > > surprise. > > I didn't make the CPUs with these alignment requirements. > > However, I wi

RE: [RFC v2] non-temporal memcpy

2022-08-09 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Tuesday, 9 August 2022 14.05 > > On 2022-08-09 11:46, Morten Brørup wrote: > >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > >> Sent: Sunday, 7 August 2022 22.25 > >> > >> On 2022-07-19 17:26, Morten Brørup wrote: > >>> Thi

Re: [RFC v2] non-temporal memcpy

2022-08-09 Thread Mattias Rönnblom
On 2022-08-09 11:46, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Sunday, 7 August 2022 22.25 On 2022-07-19 17:26, Morten Brørup wrote: This RFC proposes a set of functions optimized for non-temporal memory copy. At this stage, I am asking for feedback on

Re: [RFC v2] non-temporal memcpy

2022-08-09 Thread Mattias Rönnblom
On 2022-08-09 11:34, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Sunday, 7 August 2022 22.20 On 2022-07-29 22:26, Morten Brørup wrote: +TO: @Honnappa, we need input from ARM From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] Sent: Friday, 29 J

Re: [RFC v2] non-temporal memcpy

2022-08-09 Thread Mattias Rönnblom
On 2022-08-09 11:24, Morten Brørup wrote: From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] Sent: Sunday, 7 August 2022 22.41 On 2022-07-29 18:05, Stephen Hemminger wrote: It makes sense in a few select places to use non-temporal copy. But it would add unnecessary complexity to DPDK if eve

RE: [RFC v2] non-temporal memcpy

2022-08-09 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Sunday, 7 August 2022 22.25 > > On 2022-07-19 17:26, Morten Brørup wrote: > > This RFC proposes a set of functions optimized for non-temporal > memory copy. > > > > At this stage, I am asking for feedback on the concept. > > > > Appli

RE: [RFC v2] non-temporal memcpy

2022-08-09 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Sunday, 7 August 2022 22.20 > > On 2022-07-29 22:26, Morten Brørup wrote: > > +TO: @Honnappa, we need input from ARM > > > >> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > >> Sent: Friday, 29 July 2022 21.49 > >>>

RE: [RFC v2] non-temporal memcpy

2022-08-09 Thread Morten Brørup
> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Sunday, 7 August 2022 22.41 > > On 2022-07-29 18:05, Stephen Hemminger wrote: > > > > It makes sense in a few select places to use non-temporal copy. > > But it would add unnecessary complexity to DPDK if every function in > DPDK that

Re: [RFC v2] non-temporal memcpy

2022-08-07 Thread Mattias Rönnblom
On 2022-07-29 18:05, Stephen Hemminger wrote: On Fri, 29 Jul 2022 12:13:52 + Konstantin Ananyev wrote: Sorry, missed that part. Another question - who will do 'sfence' after the copying? Would it be inside memcpy_nt (seems quite costly), or would it be another API function for that:

Re: [RFC v2] non-temporal memcpy

2022-08-07 Thread Mattias Rönnblom
On 2022-07-19 17:26, Morten Brørup wrote: This RFC proposes a set of functions optimized for non-temporal memory copy. At this stage, I am asking for feedback on the concept. Applications sometimes copy data to another memory location, which is only used much later. In this case, it is inefficient t

Re: [RFC v2] non-temporal memcpy

2022-08-07 Thread Mattias Rönnblom
On 2022-07-29 22:26, Morten Brørup wrote: +TO: @Honnappa, we need input from ARM From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] Sent: Friday, 29 July 2022 21.49 From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] Sent: Friday, 29 July 2022 14.14 Sorry, missed t

RE: [RFC v2] non-temporal memcpy

2022-07-30 Thread Morten Brørup
> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > Sent: Saturday, 30 July 2022 00.00 > > > > > > Actually, one question I have for such small data-transfer > > > > > (16B per packet) - do you still see some noticeable performance > > > > > improvement for such scenario? > > > > > >

RE: [RFC v2] non-temporal memcpy

2022-07-29 Thread Morten Brørup
+TO: @Honnappa, we need input from ARM > From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > Sent: Friday, 29 July 2022 21.49 > > > > > From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > > > Sent: Friday, 29 July 2022 14.14 > > > > > > > > > Sorry, missed that part. >

RE: [RFC v2] non-temporal memcpy

2022-07-29 Thread Morten Brørup
> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > Sent: Friday, 29 July 2022 14.14 > > > Sorry, missed that part. > > > > > > Another question - who will do 'sfence' after the copying? > > > Would it be inside memcpy_nt (seems quite costly), or would > > > it be another API fun

RE: [RFC v2] non-temporal memcpy

2022-07-29 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org] > Sent: Friday, 29 July 2022 18.06 > > On Fri, 29 Jul 2022 12:13:52 + > Konstantin Ananyev wrote: > > > Sorry, missed that part. > > > > > > > > > Another question - who will do 'sfence' after the copying? > > > > Would it be insi

RE: [RFC v2] non-temporal memcpy

2022-07-29 Thread Morten Brørup
> From: Konstantin Ananyev [mailto:konstantin.anan...@huawei.com] > Sent: Friday, 29 July 2022 13.50 > > > > From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > > > Sent: Friday, 29 July 2022 12.00 > > > > > > 24/07/2022 23:18, Morten Brørup wrote: > > > >> From: Konstantin Ananyev

Re: [RFC v2] non-temporal memcpy

2022-07-29 Thread Stephen Hemminger
On Fri, 29 Jul 2022 12:13:52 + Konstantin Ananyev wrote: > Sorry, missed that part. > > > > > > Another question - who will do 'sfence' after the copying? > > > Would it be inside memcpy_nt (seems quite costly), or would > > > it be another API function for that: memcpy_nt_flush() or so?
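
The question of who issues the fence can be illustrated with a sketch in C. Everything named here is hypothetical (struct record16, record_store_nt_sketch, nt_flush_sketch, burst_copy_sketch); none of it is taken from the RFC. It assumes x86 SSE2, where streaming stores are weakly ordered and _mm_sfence() orders them, and it shows the second option from the question: the per-item copy issues no fence, and a separate flush call is made once per burst.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical 16-byte per-packet record, as an example of a small copy. */
struct record16 {
	uint64_t a;
	uint64_t b;
};

/* Copy one record without any fence. */
static inline void
record_store_nt_sketch(struct record16 *dst, const struct record16 *src)
{
#if defined(__SSE2__)
	/* MOVNTDQ requires a 16-byte aligned destination. */
	if (((uintptr_t)dst & 15) == 0) {
		_mm_stream_si128((__m128i *)dst,
				 _mm_loadu_si128((const __m128i *)src));
		return;
	}
#endif
	/* Unaligned destination or no SSE2: ordinary store. */
	*dst = *src;
}

/* Separate flush step: one fence for the whole burst. */
static inline void
nt_flush_sketch(void)
{
#if defined(__SSE2__)
	_mm_sfence();
#endif
	/* Without streaming stores there is nothing extra to order here. */
}

static inline void
burst_copy_sketch(struct record16 *dst, const struct record16 *src,
		  unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		record_store_nt_sketch(&dst[i], &src[i]);
	nt_flush_sketch();
}
```

Putting the fence inside every copy would make the API simpler to use correctly, at the price of one fence per 16-byte record instead of one per burst.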

RE: [RFC v2] non-temporal memcpy

2022-07-29 Thread Morten Brørup
> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > Sent: Friday, 29 July 2022 12.00 > > 24/07/2022 23:18, Morten Brørup wrote: > >> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > >> Sent: Sunday, 24 July 2022 15.35 > >> > >> 22/07/2022 11:44, Morten Brørup пиш

Re: [RFC v2] non-temporal memcpy

2022-07-29 Thread Konstantin Ananyev
24/07/2022 23:18, Morten Brørup wrote: From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] Sent: Sunday, 24 July 2022 15.35 22/07/2022 11:44, Morten Brørup wrote: From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] Sent: Friday, 22 July 2022 01.20 Hi Morten, This RF

Re: [RFC v2] non-temporal memcpy

2022-07-29 Thread Konstantin Ananyev
28/07/2022 11:51, Morten Brørup wrote: From: Stanisław Kardach [mailto:k...@semihalf.com] Sent: Thursday, 28 July 2022 00.02 On Wed, 27 Jul 2022, 21:53 Honnappa Nagarahalli, wrote: Yes, x86 needs 16B alignment for NT load/stores. But that's supposed to be arch specific limitation, that we

RE: [RFC v2] non-temporal memcpy

2022-07-28 Thread Morten Brørup
From: Stanisław Kardach [mailto:k...@semihalf.com] Sent: Thursday, 28 July 2022 00.02 > On Wed, 27 Jul 2022, 21:53 Honnappa Nagarahalli, > wrote: > > > > > > > Yes, x86 needs 16B alignment for NT load/stores. But that's > > > supposed > > > > > to be arch > > > > > > specific limitation, that we

RE: [RFC v2] non-temporal memcpy

2022-07-28 Thread Morten Brørup
> From: Stephen Hemminger [mailto:step...@networkplumber.org] > Sent: Wednesday, 27 July 2022 21.12 > > On Wed, 27 Jul 2022 20:49:59 +0200 > Morten Brørup wrote: > > > I'm considering rte_memcpy_nt() a performance optimized var

Re: [RFC v2] non-temporal memcpy

2022-07-27 Thread Stanisław Kardach
On Wed, 27 Jul 2022, 21:53 Honnappa Nagarahalli, < honnappa.nagaraha...@arm.com> wrote: > > > > > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] > > > Sent: Wednesday, 27 July 2022 19.38 > > > > > > > [...] > > > > > > > > > > > > Yes, x86 needs 16B alignment for NT load/store

RE: [RFC v2] non-temporal memcpy

2022-07-27 Thread Honnappa Nagarahalli
> > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] > > Sent: Wednesday, 27 July 2022 19.38 > > > > [...] > > > > > > > > > Yes, x86 needs 16B alignment for NT load/stores. But that's > > supposed > > > > to be arch > > > > > specific limitation, that we probably want to hide,

Re: [RFC v2] non-temporal memcpy

2022-07-27 Thread Stephen Hemminger
On Wed, 27 Jul 2022 20:49:59 +0200 Morten Brørup wrote: > I'm considering rte_memcpy_nt() a performance optimized variant of memcpy(), > where the performance gain is less cache pollution. Thus, silent fallback to > memcpy() should suffice. Have you looked at existing Glibc code? Last time I

RE: [RFC v2] non-temporal memcpy

2022-07-27 Thread Morten Brørup
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] > Sent: Wednesday, 27 July 2022 19.38 > [...] > > > > > > Yes, x86 needs 16B alignment for NT load/stores. But that's > supposed > > > to be arch > > > > specific limitation, that we probably want to hide, no? > > > > Correct. How

RE: [RFC v2] non-temporal memcpy

2022-07-27 Thread Honnappa Nagarahalli
> > > From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] > > Sent: Monday, 25 July 2022 03.18 > > > > [...] > > > > Yes, x86 needs 16B alignment for NT load/stores. But that's supposed > > to be arch > > > specific limitation, that we probably want to hide, no? > > Correct. Howev

RE: [RFC v2] non-temporal memcpy

2022-07-27 Thread Morten Brørup
> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com] > Sent: Monday, 25 July 2022 03.18 > [...] > > Yes, x86 needs 16B alignment for NT load/stores. But that's supposed > to be arch > > specific limitation, that we probably want to hide, no? Correct. However, optional hints for opt

RE: [RFC v2] non-temporal memcpy

2022-07-24 Thread Honnappa Nagarahalli
> > 22/07/2022 11:44, Morten Brørup wrote: > >> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > >> Sent: Friday, 22 July 2022 01.20 > >> > >> Hi Morten, > >> > >>> This RFC proposes a set of functions optimized for non-temporal > >> memory copy. > >>> > >>> At this stage, I am

RE: [RFC v2] non-temporal memcpy

2022-07-24 Thread Morten Brørup
> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > Sent: Sunday, 24 July 2022 15.35 > > 22/07/2022 11:44, Morten Brørup wrote: > >> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > >> Sent: Friday, 22 July 2022 01.20 > >> > >> Hi Morten, > >> > >>> This RFC prop

Re: [RFC v2] non-temporal memcpy

2022-07-24 Thread Konstantin Ananyev
22/07/2022 11:44, Morten Brørup wrote: From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] Sent: Friday, 22 July 2022 01.20 Hi Morten, This RFC proposes a set of functions optimized for non-temporal memory copy. At this stage, I am asking for feedback on the concept. Applicati

RE: [RFC v2] non-temporal memcpy

2022-07-22 Thread Morten Brørup
> From: Konstantin Ananyev [mailto:konstantin.v.anan...@yandex.ru] > Sent: Friday, 22 July 2022 01.20 > > Hi Morten, > > > This RFC proposes a set of functions optimized for non-temporal > memory copy. > > > > At this stage, I am asking for feedback on the concept. > > > > Applications sometimes

Re: [RFC v2] non-temporal memcpy

2022-07-21 Thread Konstantin Ananyev
Hi Morten, This RFC proposes a set of functions optimized for non-temporal memory copy. At this stage, I am asking for feedback on the concept. Applications sometimes copy data to another memory location, which is only used much later. In this case, it is inefficient to pollute the data cache with

RE: [RFC v2] non-temporal memcpy

2022-07-19 Thread Morten Brørup
> From: Stanisław Kardach [mailto:k...@semihalf.com] > Sent: Tuesday, 19 July 2022 20.51 > > On Tue, Jul 19, 2022 at 8:41 PM Morten Brørup > wrote: > > > > > From: David Christensen [mailto:d...@linux.vnet.ibm.com] > > > Assume that fallback to the standard temporal memcpy is an > acceptable > >

Re: [RFC v2] non-temporal memcpy

2022-07-19 Thread Stanisław Kardach
On Tue, Jul 19, 2022 at 8:41 PM Morten Brørup wrote: > > > From: David Christensen [mailto:d...@linux.vnet.ibm.com] > > Assume that fallback to the standard temporal memcpy is an acceptable > > implementation when not supported by the architecture, yes? > > Yes, that is exactly what I envisioned.

RE: [RFC v2] non-temporal memcpy

2022-07-19 Thread Morten Brørup
> From: David Christensen [mailto:d...@linux.vnet.ibm.com] > Sent: Tuesday, 19 July 2022 20.01 > > On 7/19/22 8:26 AM, Morten Brørup wrote: > > This RFC proposes a set of functions optimized for non-temporal > memory copy. > > > > At this stage, I am asking for feedback on the concept. > > > > App

Re: [RFC v2] non-temporal memcpy

2022-07-19 Thread David Christensen
On 7/19/22 8:26 AM, Morten Brørup wrote: This RFC proposes a set of functions optimized for non-temporal memory copy. At this stage, I am asking for feedback on the concept. Applications sometimes copy data to another memory location, which is only used much later. In this case, it is inefficient

[RFC v2] non-temporal memcpy

2022-07-19 Thread Morten Brørup
This RFC proposes a set of functions optimized for non-temporal memory copy. At this stage, I am asking for feedback on the concept. Applications sometimes copy data to another memory location, which is only used much later. In this case, it is inefficient to pollute the data cache with the copied dat
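
To make the concept concrete, here is a minimal sketch in C of a copy that avoids polluting the data cache on the destination side. It is not the API proposed in the RFC: nt_copy_sketch is a hypothetical name, the sketch assumes x86 SSE2 streaming stores, and it silently falls back to memcpy() when the architecture or the arguments do not fit, as discussed elsewhere in the thread. Only the stores are non-temporal here; the loads are ordinary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical name; not the function set proposed in the RFC. */
static inline void
nt_copy_sketch(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
	/* MOVNTDQ needs a 16-byte aligned destination and whole blocks. */
	if (((uintptr_t)dst & 15) == 0 && (len & 15) == 0) {
		__m128i *d = (__m128i *)dst;
		const __m128i *s = (const __m128i *)src;
		size_t i;

		/* Ordinary unaligned load, non-temporal store. */
		for (i = 0; i < len / 16; i++)
			_mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

		/* Order the streaming stores before later stores. */
		_mm_sfence();
		return;
	}
#endif
	/* Unsupported architecture or unsuitable arguments: plain memcpy(). */
	memcpy(dst, src, len);
}
```

Because the fallback is functionally identical to memcpy(), callers never need to check whether the non-temporal path was actually taken.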