From: Jon Maloy <jon.ma...@ericsson.com> Date: Fri, 19 Oct 2018 19:55:40 +0200
> We have seen the following race scenario: > 1) named_distribute() builds a "bulk" message, containing a PUBLISH > item for a certain publication. This is based on the contents of > the binding tables's 'cluster_scope' list. > 2) tipc_named_withdraw() removes the same publication from the list, > bulds a WITHDRAW message and distributes it to all cluster nodes. > 3) tipc_named_node_up(), which was calling named_distribute(), sends > out the bulk message built under 1) > 4) The WITHDRAW message arrives at the just detected node, finds > no corresponding publication, and is dropped. > 5) The PUBLISH item arrives at the same node, is added to its binding > table, and remains there forever. > > This arrival disordering was earlier taken care of by the backlog queue, > originally added for a different purpose, which was removed in the > commit referred to below, but we now need a different solution. > In this commit, we replace the rcu lock protecting the 'cluster_scope' > list with a regular RW lock which comprises even the sending of the > bulk message. This both guarantees both the list integrity and the > message sending order. We will later add a commit which cleans up > this code further. > > Note that this commit needs recently added commit d3092b2efca1 ("tipc: > fix unsafe rcu locking when accessing publication list") to apply > cleanly. > > Fixes: 37922ea4a310 ("tipc: permit overlapping service ranges in name table") > Reported-by: Tuong Lien Tong <tuong.t.l...@dektech.com.au> > Acked-by: Ying Xue <ying....@windriver.com> > Signed-off-by: Jon Maloy <jon.ma...@ericsson.com> Applied.