On Tue, 27 Nov 2007 14:34:44 -0800 "Kok, Auke" <[EMAIL PROTECTED]> wrote:
> Stephen Hemminger wrote: > > On Tue, 27 Nov 2007 19:52:24 +0100 > > Robert Olsson <[EMAIL PROTECTED]> wrote: > > > >> Hello! > >> > >> I've discovered a bug while testing the new multiQ NAPI code. In hi-load > >> situations when we take down an interface we get a kernel panic. The > >> oops is below. > >> > >> From what I see this happens when driver does napi_disable() and clears > >> NAPI_STATE_SCHED. In net_rx_action there is a check for work == weight > >> a sort indirect test but that's now not enough to cover the load > >> situation. > >> where we have NAPI_STATE_SCHED cleared by e1000_down in my case and still > >> full quota. Latest git but I'll guess the is the same in all later kernels. > >> There might be different solutions... one variant is below: > > > > It is considered a driver bug in 2.6.24 to call netif_rx_complete (clear > > NAPI_STATE_SCHED) > > and do a full quota. That bug already had to be fixed in other drivers, > > look like e1000 has same problem. > > Stephen, > > please enlighten me, can you e.g. show me a commit of other drivers where you > fixed this up? > > Thanks, > > Auke Author: David S. Miller <[EMAIL PROTECTED]> 2007-10-11 18:08:29 Committer: David S. Miller <[EMAIL PROTECTED]> 2007-10-11 18:08:29 Parent: b9f2c0440d806e01968c3ed4def930a43be248ad ([netdrvr] Stop using legacy hooks ->self_test_count, ->get_stats_count) Child: 266918303226cceac7eca38ced30f15f277bd89c ([SKY2]: status polling loop (post merge)) Branches: master, origin Follows: v2.6.23 Precedes: v2.6.24-rc1 [NET]: Fix NAPI completion handling in some drivers. In order for the list handling in net_rx_action() to be correct, drivers must follow certain rules as stated by this comment in net_rx_action(): /* Drivers must not modify the NAPI state if they * consume the entire weight. In such cases this code * still "owns" the NAPI instance and therefore can * move the instance around on the list at-will. */ A few drivers do not do this because they mix the budget checks with reading hardware state, resulting in crashes like the one reported by [EMAIL PROTECTED] BNX2 and TG3 are taken care of here, SKY2 fix is from Stephen Hemminger. Signed-off-by: David S. Miller <[EMAIL PROTECTED]> -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html