On 08/31/2017 10:49 AM, Mason wrote: > On 31/08/2017 18:57, Florian Fainelli wrote: >> And the race is between phy_detach() setting phydev->attached_dev = NULL >> and phy_state_machine() running in PHY_HALTED state and calling >> netif_carrier_off(). > > I must be missing something. > (Since a thread cannot race against itself.) > > phy_disconnect calls phy_stop_machine which > 1) stops the work queue from running in a separate thread > 2) calls phy_state_machine *synchronously* > which runs the PHY_HALTED case with everything well-defined > end of phy_stop_machine > > phy_disconnect only then calls phy_detach() > which makes future calls of phy_state_machine perilous. > > This all happens in the same thread, so I'm not yet > seeing where the race happens?
The race is as described in David's earlier email, so let's recap: Thread 1 Thread 2 phy_disconnect() phy_stop_interrupts() phy_stop_machine() phy_state_machine() -> queue_delayed_work() phy_detach() phy_state_machine() -> netif_carrier_off() If phy_detach() finishes earlier than the workqueue had a chance to be scheduled and process PHY_HALTED again, then we trigger the NULL pointer de-reference. workqueues are not tasklets, the CPU scheduling them gets no guarantee they will run on the same CPU. -- Florian