First off, thanks for all your help. Second off,
On 11/16/06, Larry Finger <[EMAIL PROTECTED]> wrote:
Ray Lee wrote:
> If I could figure out a way to make it repeatable, I'd happily do a blind
> bisect.
[...]
> I'm open to suggestions on how to make the problem trigger more than once
> every two days...

I don't know what might be causing the lock problems. I'm more concerned with the NETDEV WATCHDOG timeouts. AFAIK, you are the only one still reporting this error. On my system, I get an occasional MAC suspend failure, sometimes followed by a BCM43xx_IRQ_XMIT_ERROR.
Last time I had trouble with 2.6.18-rcX, I wasn't the only one, just the only one reporting it. Can you tell me why reverting the likely culprit isn't an option? rc6 is out, and Linus is really pushing to finalize 2.6.19 here soon.
From what I read in your post, the timeouts happen a lot more often than once every two days. Once we get those fixed, then we can concentrate on the locking.
It's becoming clear that I wasn't so clear :-). No, it doesn't happen more than once every two (three, now) days. What I'm saying is that it has only happened twice, because once the first timeout message appears, the timeouts don't stop short of a reboot.

In other words, it happened occasionally under 2.6.19-rc3, but fixed itself. Under 2.6.19-rc5 it has happened less frequently (maybe), but once it starts, it continues solidly until I reboot the computer. Until then, the laptop is completely unusable, as processes (apparently including X) start hanging on rtnl_lock.

Please see http://madrabbit.org/~ray/messages.gz for the /var/log/messages to understand what I mean by that. (Unfortunately, that was captured before I'd rebuilt the module with debugging enabled. Regardless, it may help clarify what I mean here.)

So all the NETDEV WATCHDOG timeouts other than the first one in each of the two events appear to be bogus, or side effects of rtnl_lock being held after that first timeout and never being released.

<thinks...>

Maybe I've got the culprit backward here. Perhaps something else in my system is holding rtnl_lock, and bcm43xx can't acquire it? Could the NETDEV WATCHDOG timeouts be a side effect of someone acquiring rtnl_lock() and not releasing it? Is that possible? (i.e., would it cause the effect I'm seeing?)

Ray