On 2018-12-21 17:32:30 +0300, Sergey B Kirpichev wrote:
> Does it still send you alerts if you
> remove the "if changed link capacity then alert" line too?

Yes, I still get the same messages.

> If so, I suspect that monit's current behaviour is correct and I suggest
> you just add an "if failed link then unmonitor"-like line.

No, unmonitoring the service is not OK, since I still want alerts
when the Ethernet cable is plugged back in.

In any case, this behaviour cannot be correct: while the Ethernet cable
is *not* plugged in, I get messages such as:

------------------------------------------------------------
Subject: monit alert --  Link up eth0

Link up Service eth0

        Date:        Sat, 22 Dec 2018 02:40:38
        Action:      alert
        Host:        zira
        Description: link data collection succeeded
------------------------------------------------------------

The "Link up eth0" alert cannot be correct since the cable is not plugged in.
And just after that (in the same second), monit sends:

------------------------------------------------------------
Subject: monit alert --  Link down eth0

Link down Service eth0

        Date:        Sat, 22 Dec 2018 02:40:38
        Action:      alert
        Host:        zira
        Description: link down
------------------------------------------------------------

(I suppose that this is a consequence of the "Link up eth0" event, which
yields a loop.)

So, it seems to be a bug in the monit code that makes it think that
the link is up while it is not.

In validate.c, I see:

[...]
        for (LinkStatus_T link = s->linkstatuslist; link; link = link->next) {
                Event_post(s, Event_Link, State_Succeeded, link->action, "link data collection succeeded");
        }
        // State
        if (! Link_getState(s->inf.net->stats)) {
                for (LinkStatus_T link = s->linkstatuslist; link; link = link->next)
                        Event_post(s, Event_Link, State_Failed, link->action, "link down");
                return State_Failed; // Terminate test if the link is down
        } else {
                for (LinkStatus_T link = s->linkstatuslist; link; link = link->next)
                        Event_post(s, Event_Link, State_Succeeded, link->action, "link up");
        }
[...]

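This seems suspicious. The "link data collection succeeded" event is posted
before the link state is checked, yet the resulting alert says "Link up eth0".
So, if I understand correctly, while the link is down each cycle posts
State_Succeeded and then State_Failed for the same Event_Link, which
yields a State_Succeeded / State_Failed loop.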
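To make the suspected loop concrete, here is a minimal, hypothetical C model of the quoted code. It is not monit's real Event_post(): the names LinkEventModel, model_post(), model_cycle(), alerts_while_unplugged() and alerts_while_plugged() are made up, and the model assumes that an alert is sent whenever the state posted for an event differs from the state previously recorded for it. Under that assumption, every cycle with the cable unplugged produces two alerts, which would match the "Link up" / "Link down" pair received in the same second.

```c
#include <stdbool.h>

/* Hypothetical, simplified model: NOT monit's real Event_post().
   Assumption: an alert is sent whenever the state posted for an event
   differs from the state previously recorded for that event. */
typedef enum { Model_Succeeded, Model_Failed } ModelState;

typedef struct {
        ModelState last;   /* state last recorded for the link event */
        int alerts;        /* number of alerts sent so far */
} LinkEventModel;

static void model_post(LinkEventModel *e, ModelState state) {
        if (state != e->last)
                e->alerts++;       /* state change => one alert */
        e->last = state;
}

/* One monitoring cycle, in the order used by the quoted validate.c code. */
static void model_cycle(LinkEventModel *e, bool link_is_up) {
        model_post(e, Model_Succeeded);        /* "link data collection succeeded" */
        model_post(e, link_is_up ? Model_Succeeded   /* "link up" */
                                 : Model_Failed);    /* "link down" */
}

/* Alerts sent over `cycles` cycles with the cable unplugged, starting
   from a link event already recorded as failed. */
static int alerts_while_unplugged(int cycles) {
        LinkEventModel e = { Model_Failed, 0 };
        for (int i = 0; i < cycles; i++)
                model_cycle(&e, false);
        return e.alerts;
}

/* Same, with the cable plugged in and the link event recorded as succeeded. */
static int alerts_while_plugged(int cycles) {
        LinkEventModel e = { Model_Succeeded, 0 };
        for (int i = 0; i < cycles; i++)
                model_cycle(&e, true);
        return e.alerts;
}
```

With this model, alerts_while_unplugged(3) returns 6 (two alerts per cycle), whereas alerts_while_plugged(3) returns 0: the toggle only appears when the "succeeded" event is posted unconditionally before a failing state check.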

That's one point. I don't understand the code after "// State" either
(which may be another bug): because the "for" loops are inside the
"if" and "else" branches, which test a single state, it seems to
consider that either *all* links are down or *all* links are up. (In my
case, I monitor only one link, so that's not an issue, but it may be one
in more complex settings.)

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
