>
> Hi Jerin,
>
> Following the guide to use the PMU counters(KO inserted and DPDK
> recompiled), the numbers increased 10+ folds(bigger numbers here mean
> more precise?), is this valid and expected?
This is correct, big numbers mean, more precise/granular results.
> No significant difference w
Hi Jerin,
Following the guide to use the PMU counters(KO inserted and DPDK recompiled),
the numbers increased 10+ folds(bigger numbers here mean more precise?), is
this valid and expected?
No significant difference was seen.
gavin@net-arm-thunderx2:~/community/dpdk$ sudo ./test/test/test -l
dahl
> > CC: "dev@dpdk.org" , Honnappa Nagarahalli
> > , "Ananyev, Konstantin"
> > , "Gavin Hu (Arm Technology China)"
> > , Steve Capper , nd
> ,
> > "sta...@dpdk.org"
> > Subject: Re: [dpdk-de
> > > On 07/10/2018, 06:03, "Jerin Jacob"
> wrote:
> > >
> > > In arm64 case, it will have ATOMIC_RELAXED followed by asm
> > volatile
> ("":::"memory") of rte_pause().
> > > I would n't have any issue, if the generated code code is same or
> better than the exiting case.
On 08/10/2018, 16:46, "Ola Liljedahl" wrote:
On 08/10/2018, 16:44, "Bruce Richardson" wrote:
On Mon, Oct 08, 2018 at 09:22:05AM +, Ola Liljedahl wrote:
> "* multi-producer safe lock-free ring buffer enqueue"
> The comment is also wrong.
On 08/10/2018, 16:44, "Bruce Richardson" wrote:
On Mon, Oct 08, 2018 at 09:22:05AM +, Ola Liljedahl wrote:
> "* multi-producer safe lock-free ring buffer enqueue"
> The comment is also wrong. This design is not lock-free, how could it be
when there is spinning
> (wai
On Mon, Oct 08, 2018 at 09:22:05AM +, Ola Liljedahl wrote:
> "* multi-producer safe lock-free ring buffer enqueue"
> The comment is also wrong. This design is not lock-free, how could it be when
> there is spinning
> (waiting) for other threads in the code? If a thread must wait for other
>
uot;
> , "Gavin Hu (Arm Technology China)"
> , Steve Capper , nd
,
> "sta...@dpdk.org"
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
> User-Agent: Mutt/1.10.1 (2018-07-13)
>
> External Email
>
; "sta...@dpdk.org"
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
> User-Agent: Mutt/1.10.1 (2018-07-13)
>
> External Email
>
> -Original Message-
> > Date: Mon, 8 Oct 2018 11:59:16 +
> > From: Ola Liljedahl
-Original Message-
> Date: Mon, 8 Oct 2018 11:59:16 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
> , Steve Capper , nd ,
> "sta...@dpdk.org"
> Subject: Re: [PATCH v3 1/3] ring
On 08/10/2018, 13:50, "Jerin Jacob" wrote:
I don't know how that creates more undefined behavior. So replied in the
context of your reply that, according to your view even Linux is running
with undefined behavior.
As I explained, Linux does not use C11 atomics (nor GCC __
-Original Message-
> Date: Mon, 8 Oct 2018 11:21:42 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
> , Steve Capper , nd ,
> "sta...@dpdk.org"
> Subject: Re: [PATCH v3 1/3] ring
On 08/10/2018, 12:47, "Jerin Jacob" wrote:
-Original Message-
> Date: Mon, 8 Oct 2018 10:25:45 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
>
-Original Message-
> Date: Mon, 8 Oct 2018 10:33:43 +
> From: "Gavin Hu (Arm Technology China)"
> To: Ola Liljedahl , Jerin Jacob
>
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , Steve Capper , nd
> , "sta...@dpdk.org"
> Subject: RE: [PATCH v3 1/3] ring
-Original Message-
> Date: Mon, 8 Oct 2018 10:25:45 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
> , Steve Capper , nd ,
> "sta...@dpdk.org"
> Subject: Re: [PATCH v3 1/3] ring
This is ThunderX2.
> -Original Message-
> From: Ola Liljedahl
> Sent: Monday, October 8, 2018 6:39 PM
> To: Gavin Hu (Arm Technology China) ; Jerin Jacob
>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> ; Ananyev, Konstantin
> ; Steve Capper ;
> nd ; sta...@dpdk.org
> Subject: Re: [PATCH v3 1
On 08/10/2018, 12:33, "Gavin Hu (Arm Technology China)"
wrote:
I did benchmarking w/o and w/ the patch, it did not show any noticeable
differences in terms of latency.
Which platform is this?
Here is the full log( 3 runs w/o the patch and 2 runs w/ the patch).
sudo ./test/tes
I did benchmarking w/o and w/ the patch, it did not show any noticeable
differences in terms of latency.
Here is the full log( 3 runs w/o the patch and 2 runs w/ the patch).
sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=1024 -- -i
RTE>>ring_perf_autotest (#1 run of test w
On 08/10/2018, 12:00, "Jerin Jacob" wrote:
-Original Message-
> Date: Mon, 8 Oct 2018 09:22:05 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
>
Or maybe performance gets worse but not because of that one additional
instruction/cycle in ring buffer enqueue and dequeue but because function or
loop alignment changed for one or more functions.
When the benchmarking noise (possibly several % due to changes in code
alignment) is bigger than
-Original Message-
> Date: Mon, 8 Oct 2018 09:22:05 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
> , Steve Capper , nd ,
> "sta...@dpdk.org"
> Subject: Re: [PATCH v3 1/3] ring
On 08/10/2018, 08:06, "Jerin Jacob" wrote:
-Original Message-
> Date: Sun, 7 Oct 2018 20:44:54 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
> ,
-Original Message-
> Date: Sun, 7 Oct 2018 20:44:54 +
> From: Ola Liljedahl
> To: Jerin Jacob
> CC: "dev@dpdk.org" , Honnappa Nagarahalli
> , "Ananyev, Konstantin"
> , "Gavin Hu (Arm Technology China)"
> , Steve Capper , nd ,
> "sta...@dpdk.org"
> Subject: Re: [PATCH v3 1/3] ring
> >
> > I doubt it is possible to benchmark with such a precision so to see the
> > potential difference of one ADD instruction.
> > Just changes in function alignment can affect performance by percents.
> And
> > the natural variation when not using a 100% deterministic system
On 07/10/2018, 06:03, "Jerin Jacob" wrote:
In arm64 case, it will have ATOMIC_RELAXED followed by asm volatile
("":::"memory") of rte_pause().
I would n't have any issue, if the generated code code is same or better
than the exiting case. but it not the case, Right?
The existing case
On 07/10/2018, 06:03, "Jerin Jacob" wrote:
How about fixing rte_pause() then?
Meaning issuing power saving instructions on missing archs.
Rte_pause() implemented as NOP or YIELD on ARM will likely not save any power.
You should use WFE for that.
I use this portable pattern:
//W
-Original Message-
> Date: Sat, 6 Oct 2018 19:44:35 +
> From: Ola Liljedahl
> To: Jerin Jacob , "dev@dpdk.org"
>
> CC: Honnappa Nagarahalli , "Ananyev,
> Konstantin" , "Gavin Hu (Arm Technology
> China)" , Steve Capper , nd
> , "sta...@dpdk.org"
> Subject: Re: [PATCH v3 1/3] ring
Some blogs posts about undefined behaviour in C/C++:
https://blog.regehr.org/archives/213
http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
-- Ola
On 06/10/2018, 09:42, "Jerin Jacob" wrote:
-Original Message-
> Date: Fri, 5 Oct 2018 20:34:15 +
> From: Ola Liljedahl
> To: Honnappa Nagarahalli , Jerin Jacob
>
> CC: "Ananyev, Konstantin" , "Gavin Hu (Arm
> Technology China)" , "dev@dpdk.org" ,
-Original Message-
> Date: Fri, 5 Oct 2018 20:34:15 +
> From: Ola Liljedahl
> To: Honnappa Nagarahalli , Jerin Jacob
>
> CC: "Ananyev, Konstantin" , "Gavin Hu (Arm
> Technology China)" , "dev@dpdk.org" ,
> Steve Capper , nd , "sta...@dpdk.org"
>
> Subject: Re: [PATCH v3 1/3] ring
On 05/10/2018, 22:29, "Honnappa Nagarahalli"
wrote:
>
> I doubt it is possible to benchmark with such a precision so to see the
> potential difference of one ADD instruction.
> Just changes in function alignment can affect performance by percents. And
> the natural variat
>
> I doubt it is possible to benchmark with such a precision so to see the
> potential difference of one ADD instruction.
> Just changes in function alignment can affect performance by percents. And
> the natural variation when not using a 100% deterministic system is going to
> be a lot larger t
I doubt it is possible to benchmark with such a precision so to see the
potential difference of one ADD instruction.
Just changes in function alignment can affect performance by percents. And the
natural variation when not using a 100% deterministic system is going to be a
lot larger than one cy
Hi Jerin,
Thank you for generating the disassembly, that is really helpful. I
agree with you that we have the option of moving parts 2 and 3 forward. I will
let Gavin take a decision.
I suggest that we run benchmarks on this patch alone and in combination with
other patches in the serie
So you don't want to write the proper C11 code because the compiler generates
one extra instruction that way?
You don't even know if that one extra instruction has any measurable impact on
performance. E.g. it could be issued the cycle before together with other
instructions.
We can complain to
-Original Message-
> Date: Fri, 5 Oct 2018 15:11:44 +
> From: Honnappa Nagarahalli
> To: "Ananyev, Konstantin" , Ola Liljedahl
> , "Gavin Hu (Arm Technology China)"
> , Jerin Jacob
> CC: "dev@dpdk.org" , Steve Capper , nd
> , "sta...@dpdk.org"
> Subject: RE: [PATCH v3 1/3] ring: r
> > >
> > > Hi Jerin,
> > >
> > > Thanks for your review, inline comments from our internal
> discussions.
> > >
> > > BR. Gavin
> > >
> > > > -Original Message-
> > > > From: Jerin Jacob
> > > > Sent: Saturday
On 05/10/2018, 15:45, "Ananyev, Konstantin"
wrote:
We all know that 32bit load/store on cpu we support - are atomic.
Well, not necessarily true for unaligned loads and stores. But the "problem"
here is that we are not directly generating 32-bit load and store instructions
(that would re
wrote:
>
>
>
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Gavin Hu (Arm
> Technology China)
> > Sent: Friday, October 5, 2018 1:47 AM
> > To: Jerin Jacob
> > Cc: dev
> Cc: dev@dpdk.org; Honnappa Nagarahalli
; Steve Capper ; Ola
Liljedahl
> ; nd ; sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic
load
>
> Hi Jerin,
>
> Thanks for your review, inli
c: dev@dpdk.org; Honnappa Nagarahalli ;
Steve Capper ; Ola Liljedahl
> ; nd ; sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
>
> Hi Jerin,
>
> Thanks for your review, inline comments from our i
.@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
>
> Hi Jerin,
>
> Thanks for your review, inline comments from our internal discussions.
>
> BR. Gavin
>
> > -Original Message-
> > From: Jerin Jacob
> > Sent: Sat
Hi Jerin,
Thanks for your review, inline comments from our internal discussions.
BR. Gavin
> -Original Message-
> From: Jerin Jacob
> Sent: Saturday, September 29, 2018 6:49 PM
> To: Gavin Hu (Arm Technology China)
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> ; Steve Capper
> ; Ola Lilje
-Original Message-
> Date: Mon, 17 Sep 2018 16:17:22 +0800
> From: Gavin Hu
> To: dev@dpdk.org
> CC: gavin...@arm.com, honnappa.nagaraha...@arm.com, steve.cap...@arm.com,
> ola.liljed...@arm.com, jerin.ja...@caviumnetworks.com, n...@arm.com,
> sta...@dpdk.org
> Subject: [PATCH v3 1/3] ri
> -Original Message-
> From: Gavin Hu (Arm Technology China)
> Sent: 2018年9月26日 17:29
> To: Gavin Hu (Arm Technology China) ; dev@dpdk.org
> Cc: Honnappa Nagarahalli ; Steve Capper
> ; Ola Liljedahl ;
> jerin.ja...@caviumnetworks.com; nd ; sta...@dpdk.org; Justin
> He
> Subject: RE: [PA
+Justin He
> -Original Message-
> From: Gavin Hu
> Sent: Monday, September 17, 2018 4:17 PM
> To: dev@dpdk.org
> Cc: Gavin Hu (Arm Technology China) ; Honnappa
> Nagarahalli ; Steve Capper
> ; Ola Liljedahl ;
> jerin.ja...@caviumnetworks.com; nd ; sta...@dpdk.org
> Subject: [PATCH v3 1/3]
In update_tail, read ht->tail using __atomic_load.Although the
compiler currently seems to be doing the right thing even without
_atomic_load, we don't want to give the compiler freedom to optimise
what should be an atomic load, it should not be arbitarily moved
around.
Fixes: 39368ebfc6 ("ring: i
47 matches
Mail list logo