On Wed, 2020-05-20 at 16:50 +0300, Mathias Nyman wrote: > On 20.5.2020 14.26, Rik van Riel wrote: > > After a few more weeks of digging, I have come to the tentative > > conclusion that either the XHCI driver, or the USB sound driver, > > or both, fail to handle USB errors correctly. > > > > I have some questions at the bottom, after a (brief-ish) > > explanation > > of exactly what seems to go wrong. > > > > TL;DR: arecord from a misbehaving device can hang forever > > after a USB error, due to poll on /dev/snd/timer never returning. > > > > The details: under some mysterious circumstances, the PCM290x > > family sound chips can send more data than expected during an > > isochronous transfer, leading to a babble error. Those > > circumstances seem to in part depend on the USB host controller > > and/or the electrical environment, since the chips work just > > fine for most people. > > > > Receiving data past the end of the isochronous transfer window > > scheduled for a device results in the XHCI controller throwing > > a babble error, which moves the endpoint into halted state. > > > > This is followed by the host controller software sending a > > reset endpoint command, and moving the endpoint into stopped > > state, as specified on pages 164-165 of the XHCI specification. > > Note that isoch endpoints should generate buffer overrun instead of > babble detect error on TD babble conditions. > See xHCI spec 6.4.5 additional note 115.
Maybe it should, but I'm hitting this printk in handle_tx_event:
case COMP_BABBLE_DETECTED_ERROR:
xhci_dbg(xhci, "Babble error for slot %u ep %u on
endpoint\n",
slot_id, ep_index);
status = -EOVERFLOW;
break;
Tracing in the sound driver suggests that packet belongs to
the sound data device (not the sync device), because the
URB with -EOVERFLOW status for the isochronous packet in that
URB is the last one seen there.
> Or maybe a frame babble could halt an isoc endpoint, see xhci
> 4.10.2.4.1
> but then you should see a port error and port going to disabled
> state.
>
> Any logs of this?
>
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < trigger the issue >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace
dmesg:
https://drive.google.com/open?id=1S2Qc8lroqA5-RMukuLBLWFGx10vEjG-i
usbtrace:
https://drive.google.com/open?id=1cbLcOnAtQRW0Chgak6PNC0l4yJv__4uO
> > However, the USB sound driver seems to have no idea that this
> > error happened. The function retire_capture_urb looks at the
> > status of each isochronous frame, but seems to be under the
> > assumption that the sound device just keeps on running.
> >
> > The function snd_complete_urb seems to only detect that the
> > device is not running if usb_submit_urb returns a failure.
> >
> > err = usb_submit_urb(urb, GFP_ATOMIC);
> > if (err == 0)
> > return;
> >
> > usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n",
> > err);
> >
> > if (ep->data_subs && ep->data_subs->pcm_substream) {
> > substream = ep->data_subs->pcm_substream;
> > snd_pcm_stop_xrun(substream);
> > }
> >
> > However, the XHCI driver will happily submit an URB to a
> > stopped device. Looking at the call trace usb_submit_urb ->
> > xhci_urb_enqueue -> xhci_queue_isoc_tx_prepare -> prepare_ring,
> > you can see this code:
> >
> > /* Make sure the endpoint has been added to xHC schedule */
> > switch (ep_state) {
> > ...
> > case EP_STATE_HALTED:
> > xhci_dbg(xhci, "WARN halted endpoint, queueing URB
> > anyway.\n");
> > case EP_STATE_STOPPED:
> > case EP_STATE_RUNNING:
> > break;
> >
> > This leads me to a few questions:
> > - should retire_capture_urb call snd_pcm_stop_xrun,
> > or another function like it, if it sees certain
> > errors in the iso frame in the URB?
> > - should snd_complete_urb do something with these
> > errors, too, in case they happen on the sync frames
> > and not the data frames?
> > - does the XHCI code need to ring the doorbell when
> > submitting an URB to a stopped device, or is it
> > always up to the higher-level driver to fully reset
> > the device before it can do anything useful?
>
> xhci driver should ring the doorbell.
>
> xhci_queue_isoc_tx()
> giveback_first_trb()
> xhci_ring_ep_doorbell()
>
> when we are talking about babble or transaction errors the device
> might be
> completely unaware of the situation. Device side of the endpoint is
> not necessarily halted.
> xHC host will only halt its internal endpoint state, and it needs a
> reset
> endopoint command from the xhci driver to clear the internal halt
> state.
Interesting. I had no idea the endpoint on each side
was controlled separately.
--
All Rights Reversed.
signature.asc
Description: This is a digitally signed message part

