Bug#1041524: logcheck: badly handles "rsyslog + journalctl" checking

Thomas Parmelan Fri, 21 Jul 2023 01:57:23 -0700

(Note: it seems you forgot to Cc me - I extracted your mail from the BTS
web interface to make this answer. However I now am subscribed to this
bug report so that should not matter anymore!)


Le jeudi 20 juillet 2023 à 21:43, d'après
Richard Lewis <richard.lewis.deb...@googlemail.com> :

> > The systemd journal is checked by default, in addition to rsyslog files,
> > starting with logcheck version 1.4.1. But the format of timestamps are
> > different by default for journal (uses the old-school format: "Jul 20
> > 11:51:03") and rsyslog >= 8.2210.0-3 (uses an RFC3339-compatible format:
> > "2023-07-20T11:51:03.046581+0200"). So logcheck cannot compare them
> > correctly.
> 
> I understand until the word "compare" here - are you saying that rules
> don't match entries from rsyslog or from systemd? because both seem to
> be matched fine for me!

Sorry, bad explanation on my part. What I started to see in logcheck
reports was: two times the same event, one with the rsyslog timestamp
and one with the journalctl timestamp. So I started to search why this
was happening and tried to make it so that each event would be reported
only once, like it was before the journal-checking was enabled. But that
does not seem to be really possible, more on that below.

> >   - add the missing ':' between the timezone hours and minutes to convert 
> > the
> >     journalctl shot-iso-precise timestamp to the exact RFC3339 timestamp
> >     format used by rsyslog
> 
> I think my question above is related to this bullet: a typical prefix
> in the logcheck-database of rules is
> 
> ^(\w{3} [ :0-9]{11}|[0-9T:.+-]{32})
> 
> which should match both rules without needing to change the format,
> and includes a : wherever it occurs.... have we got something wrong in
> this, and if so, what?

The point of this manipulation is to have only one format of timestamp
in the mail report, so that it can be easier (both for sort and for a
human) to read and compare.

Let's take a single sample event, getting logged both by rsylog and
journald.

With the default configuration and without my patch I get this in the
report, which is really not easy to read because of the huge difference
in timestamp formatting (and also, there's no way this can be sorted
correctly):

2023-07-20T21:10:16.791208+02:00 silk fetchmail[2839]: mailbox selection failed
Jul 20 21:10:16 silk fetchmail[2839]: mailbox selection failed

With my patch to use "journalctl -o short-iso-precise", I get this
(a little better):

2023-07-20T21:10:16.790956+0200 silk fetchmail[2839]: mailbox selection failed
2023-07-20T21:10:16.791208+02:00 silk fetchmail[2839]: mailbox selection failed

And when adding the missing ':' I get this:

2023-07-20T21:10:16.790956+02:00 silk fetchmail[2839]: mailbox selection failed
2023-07-20T21:10:16.791208+02:00 silk fetchmail[2839]: mailbox selection failed

I initially thought that it would also enable me, by using SORTUNIQ=1,
to only have one report instead of a double report for each event (one
from rsyslog and one from journalctl), but in fact I now think that is
not possible because as you can see above, both rsyslog and journal
appear to use their own timestamp values which are slightly different,
so even with the same formatting they are different and cannot be
treated by "sort -u".

I don't know what to do about that. This doesn't solve my inital problem
(duplicate events in the report), but at least they have a similar
timestamp format that 1) makes it really easier to "compare" them by eye
and 2) allows logcheck to sort them).

> >   - join continuous lines (lines starting with whitespace) from
> >     journalctl output, as rsyslog only logs them as one line
> 
> this part i understand (although i didnt check the sed logic closely)
> and agree it would be worth doing: is it definitely correct that a
> blank line is always a continuation line like this?

I think each line starts either with a timestamp, or with whitespace as
indentation. This is done in the print_multiline() function in
src/shared/logs-show.c in the systemd source, from what I understand.

One sample of it that I recall was an error event in a Perl program that
led it to log several lines of Perl code (in the same log message). Once
it had passed through logcheck's sort treatment it was obviously
unusable :)

> >   - handle SORTUNIQ correctly even when using ISO_TIMESTAMPS
> 
> i didnt understand this bit of the patch (copied below):- it seems to
> have a lot of repetition, and introduce a call to 'uniq', but only
> sometimes:
> 
> (i also wonder why anything is better than just 'sort -u' regardless
> of timestamp format): can you explain what the issue is that is being
> solved?

I think i tried at the time (this was a few months ago) to use
SORTUNIQ=1 (it it set to 0 by default), I don't recall exactly, but you
are right, this is not correct. The logic was initially to use 'sort -u'
or only 'sort' and I wrongly transformed it to 'sort -u' or 'sort |
uniq' which I agree is total nonsense. Sorry about that.

The sort key selection probably still needs to be adjusted though. When
using ISO_TIMESTAMPS, the whole timestamp is in the first field so '-k
1' should be used. If not using ISO_TIMESTAMPS, the original behaviour
of "-k 1,3" is used (I don't understand it though: that would be the
month and then the hh:mm::ss parts?).

Regards,
Tom

-- 
Thomas Parmelan

Bug#1041524: logcheck: badly handles "rsyslog + journalctl" checking

Reply via email to