Hi guys,
While running a suspend/resume stress test with the stmmac driver, I ran into
some tricky issues. The DWC EQOS version is 5.10 and the Linux kernel version
is 5.10.
1. The first issue is a net watchdog timeout.
stmmac_xmit() calls stmmac_tx_timer_arm() at the end to arm a timer that does
the transmission cleanup work. Imagine the following situation: stmmac enters
suspend immediately after stmmac_xmit() has armed the tx timer, so
stmmac_tx_clean() is never invoked. This could affect BQL (I still don't know
the specific reason): since netdev_tx_completed_queue() has not been invoked,
dql_avail(&dev_queue->dql) ends up always returning a negative value.
__dev_xmit_skb() -> qdisc_run() -> __qdisc_run() -> qdisc_restart() ->
dequeue_skb():
    if ((q->flags & TCQ_F_ONETXQUEUE) &&
        netif_xmit_frozen_or_stopped(txq)) // __QUEUE_STATE_STACK_XOFF bit is set
Once this check is true, the net core stops transmitting on that queue. As a
result, the net watchdog times out. To fix this issue, we should call
netdev_tx_reset_queue() in stmmac_resume().
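
To illustrate the accounting described above, here is a small userspace model,
not kernel code. The names model_txq, model_avail(), model_tx_sent(),
model_tx_completed(), model_tx_reset() and model_send_until_stopped() are made
up for this sketch, and the byte limit is fixed, whereas real BQL adapts it
dynamically. It assumes that the bytes queued before suspend are never reported
as completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is NOT the kernel's struct dql. */
struct model_txq {
    int limit;          /* fixed byte limit (assumption: real BQL adapts it) */
    int num_queued;     /* bytes handed to the driver so far */
    int num_completed;  /* bytes reported as completed so far */
    bool stack_xoff;    /* stands in for __QUEUE_STATE_STACK_XOFF */
};

/* Remaining byte budget; a negative value means the queue is over its limit. */
static int model_avail(const struct model_txq *q)
{
    return q->limit + q->num_completed - q->num_queued;
}

/* Roughly what netdev_tx_sent_queue() does: account bytes, stop if over limit. */
static void model_tx_sent(struct model_txq *q, int bytes)
{
    assert(bytes >= 0);
    q->num_queued += bytes;
    if (model_avail(q) < 0)
        q->stack_xoff = true;
}

/* Roughly what netdev_tx_completed_queue() does when tx cleanup runs. */
static void model_tx_completed(struct model_txq *q, int bytes)
{
    assert(bytes >= 0);
    q->num_completed += bytes;
    if (model_avail(q) >= 0)
        q->stack_xoff = false;
}

/* Roughly what netdev_tx_reset_queue() does: drop all accounting, clear stop. */
static void model_tx_reset(struct model_txq *q)
{
    q->num_queued = 0;
    q->num_completed = 0;
    q->stack_xoff = false;
}

/*
 * Queue packets with no completions at all, as if suspend happened before the
 * tx timer fired. Returns how many packets were accepted before the queue
 * stopped.
 */
static int model_send_until_stopped(struct model_txq *q, int pkt_bytes,
                                    int max_pkts)
{
    int sent = 0;

    while (sent < max_pkts && !q->stack_xoff) {
        model_tx_sent(q, pkt_bytes);
        sent++;
    }
    return sent;
}
```

In this model, once model_send_until_stopped() returns, model_avail() stays
negative and stack_xoff stays set until either model_tx_completed() or
model_tx_reset() is called. Only the reset path does not depend on the cleanup
work that was lost across suspend, which is the reason for proposing it in the
resume path.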
2. The second issue is an Rx channel fatal bus error.
During the suspend/resume test, the Rx channel reports a fatal bus error with
high probability (and reports it many times), but the stmmac driver has no
handler for this situation. Do you know what could cause an Rx channel fatal
bus error, and how it should be handled?
I have done some work on this, but I still can't fix it.
Thanks a lot in advance. 😊
Best Regards,
Joakim Zhang