Rainer Gerhards wrote:
> Juha,
> 
> I have finally been able to review the material that came with this bug
> report. Thanks for all the good info, but it looks like everything was
> related to the $AllowedSender bug, not to the race condition (which I,
> too, think exists).
> 
> ... more inline below...
> 
> On Sat, 2009-01-10 at 20:12 +0200, Juha Koho wrote:
>> For this issue (number 2.) I believe it could be a thread
>> synchronization issue. The client that has had these problems is a
>> quad-core system. I installed another single-core system with exactly
>> the same configuration, not running the recompiled version, and it has
>> been working perfectly since I installed it at least a week ago.
> 
> Definitely. I have been trying to track down a nasty race condition (I
> think it is one) for a while now. It seems to occur only on machines with
> at least four cores, and not always. Unfortunately, I cannot reproduce it
> myself. This is partly due to insufficient hardware, but when I had a
> machine for a while, I was able to see the issue only once or twice, at
> very random times, and I could not draw any conclusion before I needed to
> return the machine. There are a few other reports, but for none of them
> have I been able to obtain any information that points to the culprit. I
> hope we have better success in your case.
> 
>>  I
>> don't think these issues are related either because my client used to
>> crash at random times and not during reload.
> 
> Right, this one is different.
> 
>> By the way. I'm actually using TCP to forward messages and I haven't
>> tried UDP yet.
> 
> This doesn't seem to make a difference. I think I have tracked it down
> to the code that either creates or destructs the message object, but since
> I cannot reproduce it reliably, this is just an educated guess. So the
> input may make a difference (but I don't think so).
> 
> The primary question I have at this time is if you can reproduce the bug
> without the $AllowedSender directive (or with the patch I created for
> the cloned bug). If so, that would be a very good thing. From there, we

I don't think the $AllowedSender directive has any influence on the crashes Juha
experiences on his rsyslog clients (as he only used that directive on the
rsyslog server).
Why do you suspect that the $AllowedSender fix might have an influence on this?


> would need to change the config to see if it disappears if some settings
> are changed (I am a bit sceptical about the async queue). That could then
> lead us to the right path, even without being able to apply any debug
> settings. Oh - did I mention that the bug almost instantly disappears if
> rsyslog is compiled for debugging? I initially thought that was an
> artifact of limited concurrency due to debug calls, but now I tend to
> believe that it actually is due to reduced speed - so on an 8-core system
> we may have the issue even with debug mode (someone with an 8-way system
> out there? ;)).
> 
> I guess the bug is quite basic, but it is very hard to find without being
> able to reproduce it at will, or at least once a day, and in debug mode...

So, we have the $AllowedSender issue on the server (tracked as #511562), where
Rainer has provided a patch and the fix seems to work for me (waiting for Juha
to confirm this).


Then we have the random crashes on the client (tracked as #509292). Juha's
rsyslog.conf is at [1]. The only clue so far is that it is related to
multi-core machines (>= 4 cores). I'm not convinced that it is related to
remote logging.
Juha, could you strip down your rsyslog.conf step by step (i.e. first remove the
remote logging, then the $ActionQueue* directives, then the imklog plugin, then
the different rules)? That will help us narrow down this bug.
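
To illustrate the order I have in mind, here is a rough sketch. It is not
your actual config from [1]: the host name, paths, queue values and local
rules are placeholders I made up, and only the directive names are standard
rsyslog ones. Disable one group at a time, restart rsyslog, and wait to see
whether the crash still shows up before moving on to the next step.

```
# Hypothetical starting point (placeholder values, not the config from [1])
$ModLoad imuxsock
$ModLoad imklog                       # step 3: comment out the imklog plugin

$WorkDirectory /var/spool/rsyslog     # step 2: goes together with the queue settings
$ActionQueueType LinkedList           # step 2: comment out the $ActionQueue* directives
$ActionQueueFileName fwdqueue         # step 2
$ActionQueueSaveOnShutdown on         # step 2
*.* @@loghost.example.com:514         # step 1: comment out the remote (TCP) forwarding

# step 4: remove the remaining local rules one at a time
auth,authpriv.*          /var/log/auth.log
*.*;auth,authpriv.none   -/var/log/syslog
```

The step at which the crashes stop (if they do) tells us which part of the
configuration to look at more closely.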

Cheers,
Michael


[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=509292#5
-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
