On Fri, Dec 18, 2020 at 4:30 PM Jakub Kicinski <k...@kernel.org> wrote: > > On Thu, 17 Dec 2020 17:25:18 +0100 Antoine Tenart wrote: > > Callers to netif_set_xps_queue should take the rtnl lock. Failing to do > > so can lead to race conditions between netdev_set_num_tc and > > netif_set_xps_queue, triggering various oops: > > > > - netif_set_xps_queue uses dev->tc_num as one of the parameters to > > compute the size of new_dev_maps when allocating it. dev->tc_num is > > also used to access the map, and the compiler may generate code to > > retrieve this field multiple times in the function. > > > > - netdev_set_num_tc sets dev->tc_num. > > > > If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is > > set to a higher value through netdev_set_num_tc, later accesses to > > new_dev_maps in netif_set_xps_queue could lead to accessing memory > > outside of new_dev_maps; triggering an oops. > > > > One way of triggering this is to set an iface up (for which the driver > > uses netdev_set_num_tc in the open path, such as bnx2x) and writing to > > xps_cpus in a concurrent thread. With the right timing an oops is > > triggered. > > > > Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic > > classes") > > Let's CC Alex > > > Signed-off-by: Antoine Tenart <aten...@kernel.org> > > Two things: (a) is the datapath not exposed to a similar problem? > __get_xps_queue_idx() uses dev->tc_num in a very similar fashion.
I think we are shielded from this by the fact that if you change the number of tc the Tx path has to be torn down and rebuilt since you are normally changing the qdisc configuration anyway. > Should we perhaps make the "num_tcs" part of the XPS maps which is > under RCU protection rather than accessing the netdev copy? So it looks like the issue is the fact that we really need to synchronize netdev_reset_tc, netdev_set_tc_queue, and netdev_set_num_tc with __netif_set_xps_queue. > (b) if we always take rtnl_lock, why have xps_map_mutex? Can we > rearrange things so that xps_map_mutex is sufficient? It seems like the quick and dirty way would be to look at updating the 3 functions I called out so that they were holding the xps_map_mutex while they were updating things, and for __netif_set_xps_queue to expand out the mutex to include the code starting at "if (dev->num_tc) {".