Juha, I have finally been able to review the material that came with this bug report. Thanks for all the good info, but it looks like everything was related to the $AllowedSender bug, not to the race condition (which I, too, think exists).
... more inline below...

On Sat, 2009-01-10 at 20:12 +0200, Juha Koho wrote:
> For this issue (number 2.) I believe it could be a thread
> synchronization issue. The client that has had these problems is a
> quad core system and I installed other single core system with exactly
> the same configuration not running the recompiled version and it has
> been working perfectly since I installed it for at least a week ago.

Definitely. I have been trying to track down a nasty race condition (I think it is one) for a while now. It seems to occur only on machines with at least four cores, and not always. Unfortunately, I cannot reproduce it myself. This is partly due to insufficient hardware, but when I had a suitable machine for a while, I was able to see the issue only once or twice, very randomly, and I could not draw any conclusion before I needed to return the machine. There are a few other reports, but for none of them have I been able to obtain any information that points to the culprit. I hope we have better success in your case.

> I
> don't think these issues are related either because my client used to
> crash at random times and not during reload.

Right, this one is different.

>
> By the way. I'm actually using TCP to forward messages and I haven't
> tried UDP yet.

This doesn't seem to make a difference. I think I have tracked it down to either the code that creates or the code that destructs the message object, but since I cannot reliably reproduce it, this is just an educated guess. So the input may make a difference (but I don't think so).

The primary question I have at this time is whether you can reproduce the bug without the $AllowedSender directive (or with the patch I created for the cloned bug). If so, that would be a very good thing. From there, we would need to change the config to see if it disappears when some settings are changed (I am a bit sceptical about the async queue). That could then lead us to the right path, even without being able to apply any debug settings.
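
To illustrate the kind of problem I suspect around creating and destructing the message object, here is a minimal sketch in C. It is not rsyslog code: demo_msg_t, demo_msg_release() and run_demo() are made-up names, and whether the real bug is a reference-counting race at all is only an assumption. The sketch shows the pattern that should be safe: the reference count is changed atomically, and only the thread that takes it from 1 to 0 destructs the object. If the decrement and the zero check were separate, non-atomic steps, two cores could both conclude they hold the last reference and destruct twice, which would fit a crash that shows up only with many cores running at full speed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a message object; NOT rsyslog's actual type. */
typedef struct {
    atomic_int refs;    /* number of holders still using the object */
} demo_msg_t;

/* Counts destructor runs so that a double destruction would be visible. */
static atomic_int destruct_count;

static demo_msg_t *demo_msg_new(void)
{
    demo_msg_t *m = malloc(sizeof *m);
    assert(m != NULL);
    atomic_init(&m->refs, 1);   /* the creator holds the first reference */
    return m;
}

static void demo_msg_addref(demo_msg_t *m)
{
    atomic_fetch_add(&m->refs, 1);
}

static void demo_msg_release(demo_msg_t *m)
{
    /* atomic_fetch_sub returns the PREVIOUS value, so exactly one caller
     * sees 1 here and is the one allowed to destruct the object. */
    if (atomic_fetch_sub(&m->refs, 1) == 1) {
        atomic_fetch_add(&destruct_count, 1);
        free(m);
    }
}

static void *worker(void *arg)
{
    demo_msg_release(arg);      /* each worker drops its own reference */
    return NULL;
}

/* Shares one message among nthreads workers (nthreads >= 1) and returns
 * how many times the destructor ran. */
int run_demo(int nthreads)
{
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    assert(tids != NULL);

    atomic_store(&destruct_count, 0);
    demo_msg_t *m = demo_msg_new();

    /* Take all worker references before any worker can release one. */
    for (int i = 0; i < nthreads; i++) {
        demo_msg_addref(m);
    }
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, m);
        assert(rc == 0);
    }

    demo_msg_release(m);        /* the creator drops its reference too */

    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
    }
    free(tids);
    return atomic_load(&destruct_count);
}
```

With this pattern, run_demo(4) should report exactly one destructor run no matter how the threads interleave; that is the property I would want to check in the real create/destruct paths.
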
Oh, did I mention that the bug almost instantly disappears if rsyslog is compiled for debugging? I initially thought that was an artifact of limited concurrency due to the debug calls, but now I tend to believe that it is actually due to reduced speed, so on an 8-core system we may see the issue even in debug mode (someone with an 8-way system out there? ;)). I guess the bug is quite basic, but it is very hard to find without being able to reproduce it at will, or at least once a day, and in debug mode...

Feedback appreciated,
Rainer