Rainer Gerhards wrote:
> Juha,
>
> I have finally been able to review the material that came with this bug
> report. Thanks for all the good info, but it looks everything was
> related to the $AllowedSender bug, not to the race condition (which I,
> too, think exists).
>
> ... more inline below...
>
> On Sat, 2009-01-10 at 20:12 +0200, Juha Koho wrote:
>> For this issue (number 2.) I believe it could be a thread
>> synchronization issue. The client that has had these problems is a
>> quad core system and I installed other single core system with exactly
>> the same configuration not running the recompiled version and it has
>> been working perfectly since I installed it for at least a week ago.
>
> Definitely. I am trying to track down a nasty race condition (I think it
> is one) for a while now. It seems to occur only on machines with at
> least four cores and not always. I unfortunately can not reproduced it
> myself. This partly due to insufficient hardware, but when I got a
> machine for a while, I was able to see the issue only once or twice, but
> very, very random and I could not draw any conclusion before I needed to
> return the machine. There are few other reports, but for none of them I
> have been able to obtain any information that points to the culprit. I
> hope we can make better success in your case.
>
>> I
>> don't think these issues are related either because my client used to
>> crash at random times and not during reload.
>
> Right, this one is different.
>
>> By the way. I'm actually using TCP to forward messages and I haven't
>> tried UDP yet.
>
> This doesn't seem to make a difference. I think I have tracked down it
> to either the code that creates or destructs the message object, but not
> being able to reliably reproduce, this is just an educated guess. So the
> input may make a difference (but I don't think so).
>
> The primary question I have at this time is if you can reproduce the bug
> without the $AllowedSender directive (or with the patch I created for
> the cloned bug). If so, that would be a very good thing. From there, we

I don't think the $AllowedSender directive has any influence on the
crashes Juha experiences on his rsyslog clients (as he only used that
directive on the rsyslog server).
Why do you suspect that the $AllowedSender fix might have an influence
on this?

> would need to change the config to see if it disappears if some settings
> are changed (I am a bit sceptic about the async queue). That than could
> lead us to the right path, even when not being able to apply any debug
> settings. Oh - did I mention that the bug almost instantly disappears if
> rsyslog is compiled for debugging. I initially thought that is an
> artifact of limited concurrency due to debug calls, but now I tend to
> believe that it actually is due to reduced speed - so on a 8-core system
> we may have the issue even with debug mode (someone with a 8 way system
> out there? ;)).
>
> I guess the bug is quite basic, but it is very hard to find it not being
> able to reproduce it at will or at least once a day and in debug mode...

So, we have the $AllowedSender issue on the server (tracked as #511562),
for which Rainer has provided a patch. The fix seems to work for me
(waiting for Juha to confirm this).

Then we have the random crashes on the client (tracked as #509292).
Juha's rsyslog.conf is at [1]. The only clue so far is that it is
related to multi-core machines (>= 4 cores). I'm not convinced that it
is related to remote logging.

Juha, could you strip down your rsyslog.conf step by step (i.e. first
remove the remote logging, then the $ActionQueue* directives, then the
imklog plugin, then the different rules)? That will help us narrow
down this bug.

Cheers,
Michael

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=509292#5

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?