Hi,
>> Root Cause
>> - Block layer timeout happens after power off UAS USB device which is
>> accessed as reproduce step. During timeout error handler process, scsi host
>> state becomes SHOST_CANCEL_RECOVERY that causes IO hangs up and lock cannot
>> be released. And in final, usb subsystem hangs up.
>> Follow is function call:
>> blk_mq_timeout_work
>> …->scsi_times_out (… means some functions are not listed before this
>> function.)
>> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY)
>> … -> scsi_error_handler
>> …-> uas_eh_device_reset_handler
>> -> usb_lock_device_for_reset <- take lock
>> -> usb_reset_device
>> …-> rebind = uas_post_reset (return 1 since ENODEV)
>> …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
>> …-> uas_disconnect (scsi_host_set_state to
>> SHOST_CANCEL_RECOVERY)
>> … -> scsi_queue_rq
>
>How does scsi_queue_rq get called here? As far as I can see, this shouldn't
>happen.
We confirmed the function call path on linux 4.9 when this problem occured
since we are working on it. In linux 4.9, the last function is scsi_request_fn
instead of scsi_queue_rq. In staging.git, we think the scsi_queue_rq is called
by follow path.
uas_disconnect
|- scsi_remove_host
|- scsi_forget_host
|- __scsi_remove_device
|- device_del
|- bus_remove_device
|- device_release_driver
|- device_release_driver_internal
|- __device_release_driver
|- drv->remove(dev) (sd_remove)
|- sd_shutdown
|- sd_sync_cache
|- scsi_execute
|- __scsi_execute
|- blk_execute_rq
|- blk_execute_rq_nowait
|- blk_mq_sched_insert_request
|- blk_mq_run_hw_queue
|- __blk_mq_delay_run_hw_queue
|- __blk_mq_run_hw_queue
|- blk_mq_sched_dispatch_requests
|- blk_mq_dispatch_rq_list
|- q->mq_ops->queue_rq (scsi_queue_rq)
>> Countermeasure
>> - Make uas_post_reset doesn’t return 1 when ENODEV returns from
>> uas_configure_endpoints since usb_unbind_and_rebind_marded_interfaces
>> doesn’t need to do unbind/rebind operations in this situation.
>> blk_mq_timeout_work
>> …->scsi_times_out (… means some functions are not listed before this
>> function.)
>> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY)
>> … -> scsi_error_handler
>> …-> uas_eh_device_reset_handler (*1)
>> -> usb_lock_device_for_reset <- take lock
>> -> usb_reset_device
>> -> usb_reset_and_verify_device (return ENODEV and FAILED will
>> be reported to *1)
>> -> uas_post_reset returns 0 when ENODEV => rebind=0
>> -> usb_unbind_and_rebind_marked_interfaces (rebind=0)
>
>The difference is that uas_disconnect wasn't called here. But that routine
>should not cause any problems -- you're always supposed to be able to unbind a
>driver from a device. So it looks like this is not the right way to solve the
>problem.
We confirmed usb_driver_release_interface will call usb_unbind_interface when
this problem occurs.
So usb_unbind_interface will call driver disconnect callbak.
Regards,
Kento Kobayashi