On 06/27/2018 05:11 AM, Tariq Toukan wrote:
> 
> 
> On 09/02/2017 7:10 PM, Eric Dumazet wrote:
>> From: Eric Dumazet <eduma...@google.com>
>>
>> Using a reader-writer lock in fast path is silly, when we can
>> instead use RCU or a seqlock.
>>
>> For mlx4 hwstamp clock, a seqlock is the way to go, removing
>> two atomic operations and false sharing.
>>
>> Signed-off-by: Eric Dumazet <eduma...@google.com>
>> Cc: Tariq Toukan <tar...@mellanox.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx4/en_clock.c |   35 ++++++++--------
>>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |    2
>>   2 files changed, 19 insertions(+), 18 deletions(-)
>>
> 
> Hi Eric,
> 
> When my peer, Shay, modified mlx5 to adopt this same locking scheme/type, he 
> noticed a degradation in packet rate.
> He got back to testing mlx4 and also noticed a degradation introduced by this 
> patch.
> 
> Perf numbers (single ring):
> 
> mlx4:
> with rw-lock: ~8.54M pps
> with seq-lock: ~8.51M pps
> 
> mlx5:
> With rw-lock: ~14.94M pps
> With seq-lock: ~14.48M pps
> 
> Actually, this can be explained by the analysis below.
> In short, the number of readers is significantly larger than the number of
> writers, so optimizing the reader flow should give better numbers. The issue
> is that a read/write lock might cause writer starvation. Maybe RCU fits best
> here?
> 
> Degradation analysis:
> The patch changes the type of lock that protects reads and updates of a
> variable, (struct mlx4_en_dev).clock.
> This variable is used to convert the hw timestamp into skb->hwtstamps.
> It is read for each transmitted/received packet and updated only via the
> ptp module and some periodic overflow work we have (at most 10 times per
> second).
> This means there are many more readers than writers, and it’s best to
> optimize the reader flow.
>

Hi Tariq

Are you sure you enabled timestamps in your tests?

Is mlx4_en_fill_hwtstamps() _really_ called 8,540,000 times per second,
meaning the same number of read_lock_irqsave()/read_unlock_irqrestore()
pairs is performed?

You have a pretty damn good CPU it seems.

A seqlock has no cost for a reader [1], other than reading one integer value
and testing it.
[1] If this value never changes (and is on a clean cache line).
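
As a rough illustration of that reader-side cost, here is a minimal userspace
sketch of a sequence lock written with C11 atomics. It is not the kernel's
seqlock_t API (where readers use read_seqbegin()/read_seqretry()), and the
names seq_clock, seq_clock_write and seq_clock_read are hypothetical, as are
the two payload fields. It assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the clock state protected by the lock. */
struct seq_clock {
    atomic_uint seq;          /* even: stable, odd: write in progress */
    _Atomic uint64_t nsec;    /* payload fields, accessed with relaxed atomics */
    _Atomic uint32_t mult;
};

/* Writer side: bump seq to odd, update the payload, bump seq to even. */
static void seq_clock_write(struct seq_clock *c, uint64_t nsec, uint32_t mult)
{
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    assert((s & 1u) == 0);    /* assumption: only one writer at a time */
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&c->mult, mult, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/*
 * Reader side: no store to shared memory at all. The reader loads seq,
 * copies the payload, and retries only if a write was in progress or
 * completed in between.
 */
static void seq_clock_read(struct seq_clock *c, uint64_t *nsec, uint32_t *mult)
{
    unsigned s0, s1;

    do {
        s0 = atomic_load_explicit(&c->seq, memory_order_acquire);
        *nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
        *mult = atomic_load_explicit(&c->mult, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1);
}
```

When the writer is idle, the reader loop runs exactly once and only loads
from the shared cache line, whereas a reader-writer lock makes every reader
perform atomic read-modify-write operations on it.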

Really this looks like ring->hwtstamp_rx_filter != HWTSTAMP_FILTER_ALL in your 
tests.

The numbers you gave amount to a difference of about one cycle per packet
(half a nanosecond), so I really doubt that adding back the heavy
read_lock_irqsave()/read_unlock_irqrestore() could be faster.
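
The per-packet figure can be checked from the quoted mlx4 rates with a small
helper like the one below. The function names are hypothetical. Converting
nanoseconds to cycles depends on the CPU frequency, which the thread does not
state.

```c
#include <assert.h>

/* Time budget per packet, in nanoseconds, at a given packet rate. */
static double ns_per_packet(double pps)
{
    assert(pps > 0.0);
    return 1e9 / pps;
}

/* Extra nanoseconds spent per packet when the rate drops from fast to slow. */
static double extra_ns_per_packet(double fast_pps, double slow_pps)
{
    return ns_per_packet(slow_pps) - ns_per_packet(fast_pps);
}
```

Calling extra_ns_per_packet(8.54e6, 8.51e6) gives roughly 0.41 ns, which is
on the order of one clock cycle on a CPU running at a few GHz.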

Conceptually, a seqlock is some form of RCU: it really optimizes the reader
flow.

Thanks
