From: Jon Maloy <jon.ma...@ericsson.com>
Date: Fri, 19 Oct 2018 19:55:40 +0200

> We have seen the following race scenario:
> 1) named_distribute() builds a "bulk" message, containing a PUBLISH
>    item for a certain publication. This is based on the contents of
>    the binding table's 'cluster_scope' list.
> 2) tipc_named_withdraw() removes the same publication from the list,
>    builds a WITHDRAW message and distributes it to all cluster nodes.
> 3) tipc_named_node_up(), which called named_distribute(), sends
>    out the bulk message built in step 1).
> 4) The WITHDRAW message arrives at the just detected node, finds
>    no corresponding publication, and is dropped.
> 5) The PUBLISH item arrives at the same node, is added to its binding
>    table, and remains there forever.
> 
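The effect of steps 4) and 5) on the receiving node can be sketched as a small, single-threaded simulation. This is only an illustration under simplifying assumptions: `struct msg`, `struct table`, `table_apply()` and `entries_after()` are hypothetical names and are not taken from the TIPC sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message and table types; not the real TIPC structures. */
enum msg_type { MSG_PUBLISH, MSG_WITHDRAW };

struct msg {
    enum msg_type type;
    unsigned int key;           /* identifies one publication */
};

#define TABLE_SIZE 8

struct table {
    unsigned int keys[TABLE_SIZE];
    bool used[TABLE_SIZE];
};

/* Apply one message the way steps 4) and 5) describe the receiver. */
static void table_apply(struct table *t, struct msg m)
{
    size_t i;

    if (m.type == MSG_WITHDRAW) {
        for (i = 0; i < TABLE_SIZE; i++) {
            if (t->used[i] && t->keys[i] == m.key) {
                t->used[i] = false;
                return;
            }
        }
        return;                 /* no matching publication: dropped */
    }

    for (i = 0; i < TABLE_SIZE; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            t->keys[i] = m.key;
            return;
        }
    }
    assert(!"table full");
}

/* Apply msgs[0..n) in order to an empty table; return entries left. */
static size_t entries_after(const struct msg *msgs, size_t n)
{
    struct table t = { 0 };
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        table_apply(&t, msgs[i]);
    for (i = 0; i < TABLE_SIZE; i++)
        count += t.used[i];
    return count;
}
```

Feeding this sketch a PUBLISH followed by a WITHDRAW for the same key leaves the table empty, while the reversed order leaves one stale entry behind, which is the leak described above.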
> This arrival reordering was previously handled by the backlog queue,
> which was originally added for a different purpose. That queue was
> removed in the commit referenced below, so we now need a different
> solution.
> In this commit, we replace the RCU lock protecting the 'cluster_scope'
> list with a regular RW lock which also covers the sending of the
> bulk message. This guarantees both the list integrity and the
> message sending order. We will later add a commit which cleans up
> this code further.
> 
> Note that this commit needs recently added commit d3092b2efca1 ("tipc:
> fix unsafe rcu locking when accessing publication list") to apply
> cleanly.
> 
> Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table")
> Reported-by: Tuong Lien Tong <tuong.t.l...@dektech.com.au>
> Acked-by: Ying Xue <ying....@windriver.com>
> Signed-off-by: Jon Maloy <jon.ma...@ericsson.com>

Applied.
