Hi guys,

While running a suspend/resume stress test with the stmmac driver, I ran into
some tricky issues. The DWC EQOS version is 5.10 and the Linux kernel version
is 5.10.

1. The first issue is a net watchdog timeout.
stmmac_xmit() calls stmmac_tx_timer_arm() at the end to arm a timer that does
the transmission cleanup work. Imagine the following situation: stmmac enters
suspend immediately after stmmac_xmit() arms the tx timer, so stmmac_tx_clean()
is never invoked. This seems to affect BQL (I still don't know the specific
reason): since netdev_tx_completed_queue() is never called,
dql_avail(&dev_queue->dql) ends up always returning a negative value.
        __dev_xmit_skb() -> qdisc_run() -> __qdisc_run() -> qdisc_restart() ->
        dequeue_skb():
         if ((q->flags & TCQ_F_ONETXQUEUE) &&
             netif_xmit_frozen_or_stopped(txq))  // __QUEUE_STATE_STACK_XOFF bit is set
Once this check hits, the net core stops transmitting on that queue. As a
result, the net watchdog times out. To fix this issue, we should call
netdev_tx_reset_queue() in stmmac_resume().

2. The second issue is an Rx channel fatal bus error.
During the suspend/resume test, the Rx channel reports a fatal bus error with
high probability (and reports it many times), but the stmmac driver has no
handler for this situation. Do you know what could cause an Rx channel fatal
bus error, and how it should be handled?
I have done some investigation, but so far I have not been able to fix it.

Thanks a lot in advance. 😊

Best Regards,
Joakim Zhang
