On 11/07/15(Sat) 15:16, Stefan Sperling wrote:
> On Sat, Jul 11, 2015 at 09:54:34AM +0200, Stefan Sperling wrote:
> > While run_task() iterates through the host command queue during a scan,
> > the interface may get reset via run_stop() and run_init() when switching
> > bands (2GHz -> 5GHz) in run_media_change().
> > 
> > The list state gets corrupted because run_init() resets the list counters
> > while run_task() is still iterating the list. run_task() will now attempt
> > to call garbage or NULL callback pointers.
> > 
> > run_newstate_cb SCAN -> INIT
> > run_newstate_cb SCAN -> SCAN

How is this possible?  Why it isn't INIT -> SCAN?

> > run_newstate_cb SCAN -> SCAN
> > run_task ring->next=7
> > run_task ring->cur=0
> > run_task ring->queued=-7   <-- negative nonesense
> > run_task cmd=0xffff800000651110
> > run_task cmd->cb=0x0
> > 
> > The kernel now explodes trying to call cmd->cb.
> > 
> > This can be reproduced by running 'ifconfig run0 mediaopt monitor up'.
> 
> My previous duff cured a symptom.
> I believe this new diff closes the actual race.

I'm really not found of either diffs

> State transitions can already be waiting to be scheduled while run_stop()
> is scheduling another transition trying to bring the device back to INIT.
> Each state transition happens asynchronously in a process context.
                                ^^^^^^^^^^^^^^

I guess it happens only with USB drivers and the net80211 stack is not
suppose to work this way, right?

> So if multiple transitions are scheduled they can happen out of order.

I can think of two cases when such thing can happen: when a state
transition schedule another one or when a timeout fires.  Do you
know what happens in the run(4) case?

> When injecting a state transition "from the side" to force the device
> into a particular state, like run_stop() does, we need to make sure the
> net80211 stack is not concurrently trying to transition the driver for
> other purposes, such as the scanning loop.

What are the contexts of "from the side" and "net80211 stack"?

> This issue probably affects a number of other wifi drivers as well.
> If this fix is good to go I'll do a tree-wide sweep soon.

I'd like to understand what you're fix is fixing 8)  Are we chasing a
problem of "ic_state" not being synchronized with the reality because
some code in interrupt context scheduled a state transition or are we
forcing multiple state transitions to not interleave an INIT change?

In any case, if we want to go MP, we should have a better understanding
of these state transitions :)

> Please disregard all the other hackish patches I mailed out this morning.
> 
> Index: if_run.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/usb/if_run.c,v
> retrieving revision 1.109
> diff -u -p -r1.109 if_run.c
> --- if_run.c  12 Jun 2015 15:47:31 -0000      1.109
> +++ if_run.c  11 Jul 2015 13:00:21 -0000
> @@ -1751,6 +1751,11 @@ run_do_async(struct run_softc *sc, void 
>               return;
>  
>       s = splusb();
> +     if (ring->queued == RUN_HOST_CMD_RING_COUNT) {
> +             splx(s);
> +             printf("%s: host cmd queue overrun\n", sc->sc_dev.dv_xname);
> +             return; /* XXX */
> +     }
>       cmd = &ring->cmd[ring->cur];
>       cmd->cb = cb;
>       KASSERT(len <= sizeof (cmd->data));
> @@ -4504,7 +4509,9 @@ run_init(struct ifnet *ifp)
>       }
>  
>       /* init host command ring */
> -     sc->cmdq.cur = sc->cmdq.next = sc->cmdq.queued = 0;
> +     if (sc->cmdq.queued != 0)
> +             panic("outstanding host commands queued");
> +     sc->cmdq.cur = sc->cmdq.next = 0;
>  
>       /* init Tx rings (4 EDCAs) */
>       for (qid = 0; qid < 4; qid++) {
> @@ -4739,9 +4746,17 @@ run_stop(struct ifnet *ifp, int disable)
>       timeout_del(&sc->calib_to);
>  
>       s = splusb();
> +
> +     /*
> +      * Wait for all queued asynchronous commands to complete
> +      * before switching to INIT state.
> +      */
> +     usb_wait_task(sc->sc_udev, &sc->sc_task);
> +
>       ieee80211_new_state(ic, IEEE80211_S_INIT, -1);
> -     /* wait for all queued asynchronous commands to complete */
> +     /* Wait for asynchronous state transition to complete. */
>       usb_wait_task(sc->sc_udev, &sc->sc_task);
> +
>       splx(s);
>  
>       /* Disable Tx/Rx DMA. */
> 

Reply via email to