Bug#1041524: logcheck: badly handles "rsyslog + journalctl" checking

Richard Lewis Fri, 21 Jul 2023 14:36:19 -0700

On Fri, 21 Jul 2023 at 09:57, Thomas Parmelan <tom+deb...@ankh.fr.eu.org> wrote:
>
> Le jeudi 20 juillet 2023 à 21:43, d'après
> Richard Lewis <richard.lewis.deb...@googlemail.com> :


> With the default configuration and without my patch I get this in the
> report, which is really not easy to read because of the huge difference
> in timestamp formatting (and also, there's no way this can be sorted
> correctly):
>
> 2023-07-20T21:10:16.791208+02:00 silk fetchmail[2839]: mailbox selection 
> failed
> Jul 20 21:10:16 silk fetchmail[2839]: mailbox selection failed
>
> With my patch to use "journalctl -o short-iso-precise", I get this
> (a little better):
>
> 2023-07-20T21:10:16.790956+0200 silk fetchmail[2839]: mailbox selection failed
> 2023-07-20T21:10:16.791208+02:00 silk fetchmail[2839]: mailbox selection 
> failed
>
> And when adding the missing ':' I get this:
>
> 2023-07-20T21:10:16.790956+02:00 silk fetchmail[2839]: mailbox selection 
> failed
> 2023-07-20T21:10:16.791208+02:00 silk fetchmail[2839]: mailbox selection 
> failed
>
> I initially thought that it would also enable me, by using SORTUNIQ=1,
> to only have one report instead of a double report for each event (one
> from rsyslog and one from journalctl), but in fact I now think that is
> not possible because as you can see above, both rsyslog and journal
> appear to use their own timestamp values which are slightly different,
> so even with the same formatting they are different and cannot be
> treated by "sort -u".

Thank-you, i understand now.

I suppose the timestamp difference is because the journal 'gets' the
message, and then forwards it to rsyslog which then creates its own
timestamp,
so even if systemd and rsyslog would agree a common format, this will
always be an issue.

Personally, i would just tell rsyslog to use the less precise format,
(or stop using rsyslog entirely).

Instead of changing journal lines to add the colon you could perhaps
try and delete all the decimals after the . , which would (usually?)
make them identical to the journal timestamps.

This might be overkill, but we could make the pre-processing of log
files be more customisable, so the user could choose whatever mangling
of timestamps, whitespace and/or sorting they want.

ie make it so that in logcheck.conf there are variables
PRE_PROCESS_LOG_ENTRIES="sed -e s/[[:space:]]$g/ -e ...."
POST_PROCESS_LOG_ENTRIES="sort -k1,3 -u"
# and/or set JOURNALCTL_OPTS=(-o iso-whatever)

and then logcheck could use eval,
eval $PRE_PROCESS_LOG_ENTRIES logoutput > logputput.1
eval $POST_PROCESS_LOG_ENTRIES logoutput > logputput.1

then people can do arbitrary mangling of timestamps or whatever. This
would avoid over-complicating the logic in logcheck at the expense of
requiring the user to write the sed/sort lines

> One sample of it that I recall was an error event in a Perl program that
> led it to log several lines of Perl code (in the same log message). Once
> it had passed through logcheck's sort treatment it was obviously
> unusable :)

i also get multiple lines from sbuild (although i just wrote local
rules to match each fragment)

> > >   - handle SORTUNIQ correctly even when using ISO_TIMESTAMPS
> >
> > i didnt understand this bit of the patch (copied below):- it seems to
> > have a lot of repetition, and introduce a call to 'uniq', but only
> > sometimes:
> >
> > (i also wonder why anything is better than just 'sort -u' regardless
> > of timestamp format): can you explain what the issue is that is being
> > solved?
>
> I think i tried at the time (this was a few months ago) to use
> SORTUNIQ=1 (it it set to 0 by default), I don't recall exactly, but you
> are right, this is not correct. The logic was initially to use 'sort -u'
> or only 'sort' and I wrongly transformed it to 'sort -u' or 'sort |
> uniq' which I agree is total nonsense. Sorry about that.
>
> The sort key selection probably still needs to be adjusted though. When
> using ISO_TIMESTAMPS, the whole timestamp is in the first field so '-k
> 1' should be used. If not using ISO_TIMESTAMPS, the original behaviour
> of "-k 1,3" is used (I don't understand it though: that would be the
> month and then the hh:mm::ss parts?).

It's sorting on fields one to three which are the 'Mmm DD HH:MM:SS'
bit in the old-style timestamp:
so (except for around midnight!) this sorts in date order, but not
change the order of messages.

(i dont know  -u is not included!)

Bug#1041524: logcheck: badly handles "rsyslog + journalctl" checking

Reply via email to