Package: munin-node
Version: 2.0.25-1
Severity: important
Tags: upstream patch


Hello,

I am using Munin on Debian Jessie. The default configuration is just fine for 
my purposes. On top of that, I just wanted to activate email notification about 
critical states.

So, I activated the following config item in /etc/munin/munin.conf and provided 
my email address:

contact.someuser.command mail -s "Munin notification" u...@domain.tld

FYI: the documentation for this variable is: "Drop u...@domain.tld an email 
everytime something changes (OK -> WARNING, CRITICAL -> OK, etc)".

The issue
=========
From there on, I received an email about "Disk usage in percent" being "OK" 
upon almost every munin-node run (every 5 minutes). These emails clearly 
report false-positive status changes because, as the email itself says, the 
status is "OK".

In order to understand why this email is actually sent, let's look at the body 
of such an email:



localdomain :: localhost.localdomain :: Disk usage in percent
    OKs: / is 19.36, /run is 11.03, /run/user/1000 is 0.00, /dev/shm is 0.05, 
/run/lock is 0.00, /run/user/1004 is 0.00, /run/user/1002 is 0.00, /dev is 
0.00, /sys/fs/cgroup is 0.00.



In the list of mount points and devices there are some mount points of the 
pattern /run/user/*. It took me a while until I realized that whenever I got 
such an email, a new /run/user/* entry appeared, or an old one disappeared, 
compared to the corresponding last email.
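You can do this comparison without eyeballing it. Extract the mount points from the "OKs:" line of two consecutive notification bodies and diff the two sets. Below is a minimal Python sketch. The function name `mount_points` and the two sample bodies are hypothetical (shortened from the email above). It assumes every entry has the form "<mount point> is <number>", as in the body shown.

```python
import re


def mount_points(body: str) -> set[str]:
    """Extract mount points from an 'OKs: / is 19.36, /run is 11.03, ...' body.

    Assumes every entry has the form '<mount point> is <number>', as in the
    notification shown above; the whitespace may include line breaks.
    """
    return set(re.findall(r"(\S+)\s+is\s+[0-9.]+", body))


# Hypothetical, shortened bodies of two consecutive notifications.
previous = "OKs: / is 19.36, /run is 11.03, /run/user/1000 is 0.00, /run/user/1004 is 0.00."
current = "OKs: / is 19.36, /run is 11.03, /run/user/1000 is 0.00,\n/run/user/1002 is 0.00."

print("disappeared:", sorted(mount_points(previous) - mount_points(current)))
print("appeared:", sorted(mount_points(current) - mount_points(previous)))
```

With these two sample bodies it prints `disappeared: ['/run/user/1004']` and `appeared: ['/run/user/1002']`.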

AFAIK, the /run/user/* mount points were introduced with systemd. They are 
created by pam_systemd 
(http://www.freedesktop.org/software/systemd/man/pam_systemd.html), and their 
existence seems to fluctuate (I guess/hope this is not specific to my system): 
these mount points disappear and reappear for different users on a timescale 
of minutes.
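To watch this happen, you can periodically list the /run/user/* entries in the kernel's mount table. Below is a minimal Python sketch. It assumes a Linux system where /proc/mounts is readable, with the fields device, mount point, fs type, options, dump and pass. The helper names are hypothetical, and octal escapes such as \040 in mount point names are not decoded.

```python
from pathlib import Path


def user_runtime_mounts(mounts_text: str) -> list[str]:
    """Return the /run/user/* mount points found in /proc/mounts-style text.

    Each line has the fields: device, mount point, fs type, options, dump, pass.
    Octal escapes such as \\040 (space) in mount points are not decoded here.
    """
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # The second whitespace-separated field is the mount point.
        if len(fields) >= 2 and fields[1].startswith("/run/user/"):
            mounts.append(fields[1])
    return mounts


def snapshot() -> list[str]:
    """Read the live mount table (Linux only)."""
    return user_runtime_mounts(Path("/proc/mounts").read_text())


# Hypothetical sample in /proc/mounts format.
sample = (
    "/dev/vda2 / ext4 rw,relatime 0 0\n"
    "tmpfs /run tmpfs rw,nosuid,noexec,relatime,mode=755 0 0\n"
    "tmpfs /run/user/1000 tmpfs rw,nosuid,nodev,relatime,mode=700,uid=1000,gid=1000 0 0\n"
)
print(user_runtime_mounts(sample))  # ['/run/user/1000']
```

Call `snapshot()` once a minute (for example in a loop with `time.sleep(60)`) and compare consecutive results. This should show entries coming and going.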

Whenever such a mount point disappears between munin-node invocations, the df 
plugin loses a metric. When such a mount point reappears, the df plugin finds 
a seemingly "new" metric. I looked at /var/lib/munin/limits and diffed it 
between two munin-node invocations. The output:

     localdomain;localhost.localdomain;df;_run_user_1000;state ok
     localdomain;localhost.localdomain;df;_run_user_1002;ok OK
     localdomain;localhost.localdomain;df;_run_user_1002;state ok
    -localdomain;localhost.localdomain;df;_run_user_1004;ok OK
    -localdomain;localhost.localdomain;df;_run_user_1004;state ok
     localdomain;localhost.localdomain;df;_sys_fs_cgroup;ok OK
     localdomain;localhost.localdomain;df;_sys_fs_cgroup;state ok
     localdomain;localhost.localdomain;df_inode;_dev;ok OK 
     
Here, we observe how the _run_user_1004 metric just disappeared. *This* 
triggers munin's status update detector, and makes munin send an email. 
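Here is a simplified Python model of the limits state that makes the two points of view concrete. It is not munin's actual implementation, which I have not read, and the function names are hypothetical. `naive_changes` treats a field that appears or disappears as a change, which matches the behaviour I observe. `real_transitions` only reports fields whose state changed between two runs in which both were present, which is what I expected from the documentation. The input lines come from the limits diff above.

```python
def field_states(lines: list[str]) -> dict[tuple[str, str], str]:
    """Map (plugin, field) -> state from munin limits lines.

    Only lines of the form 'group;host;plugin;field;state <value>' are used;
    other keys, such as '...;ok OK', are skipped.
    """
    states = {}
    for line in lines:
        key, _, value = line.strip().partition(" ")
        parts = key.split(";")
        if len(parts) == 5 and parts[4] == "state":
            states[(parts[2], parts[3])] = value
    return states


def naive_changes(old, new):
    """Every field whose state differs, counting (dis)appearance (None) as a change."""
    return {
        field: (old.get(field), new.get(field))
        for field in old.keys() | new.keys()
        if old.get(field) != new.get(field)
    }


def real_transitions(old, new):
    """Only fields present in both runs whose state actually changed."""
    return {f: (old[f], new[f]) for f in old.keys() & new.keys() if old[f] != new[f]}


before = field_states([
    "localdomain;localhost.localdomain;df;_run_user_1002;state ok",
    "localdomain;localhost.localdomain;df;_run_user_1004;ok OK",
    "localdomain;localhost.localdomain;df;_run_user_1004;state ok",
])
after = field_states([
    "localdomain;localhost.localdomain;df;_run_user_1002;state ok",
])

print(naive_changes(before, after))     # {('df', '_run_user_1004'): ('ok', None)}
print(real_transitions(before, after))  # {}
```

Every remaining field is still "ok". Under the naive model, though, the vanished _run_user_1004 field alone counts as a change, and that is enough to trigger a notification.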


A working solution
==================

In /etc/munin/plugin-conf.d/munin-node, in the [df*] section, add:

    env.exclude_re ^/run/user

This prevents the /run/user/* mount points from being monitored at all. Test 
output without this setting:

# munin-run df
_dev_vda2.value 19.3639318372447
_dev.value 0
_run.value 11.0322915980351
_dev_shm.value 0.0499760368807535
_run_lock.value 0
_sys_fs_cgroup.value 0
_run_user_1002.value 0
_run_user_1000.value 0

With this setting:

# munin-run df
_dev_vda2.value 19.3639318372447
_dev.value 0
_run.value 11.0322915980351
_dev_shm.value 0.0499760368807535
_run_lock.value 0
_sys_fs_cgroup.value 0
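The effect of the setting can be modelled as a regular-expression search against each mount point that skips every match. Below is a minimal Python sketch, not the df plugin's code. I am assuming that the plugin applies exclude_re to the mount point. The mount point list is taken from the notification body above, and the helper name `monitored` is hypothetical.

```python
import re

# Value of env.exclude_re from the [df*] section above.
EXCLUDE_RE = re.compile(r"^/run/user")

# Mount points taken from the notification body above.
mount_points = [
    "/", "/run", "/run/user/1000", "/dev/shm", "/run/lock",
    "/run/user/1004", "/run/user/1002", "/dev", "/sys/fs/cgroup",
]


def monitored(mounts, exclude_re=EXCLUDE_RE):
    """Keep only mount points that the exclusion regex does not match.

    Assumption: the df plugin applies exclude_re to the mount point. re.search
    is unanchored, so the leading '^' is what restricts the match to the start.
    """
    return [m for m in mounts if not exclude_re.search(m)]


print(monitored(mount_points))
# ['/', '/run', '/dev/shm', '/run/lock', '/dev', '/sys/fs/cgroup']
```

The pattern is anchored only at the start, so it would also exclude a hypothetical mount point such as /run/username. ^/run/user/ would be a slightly tighter alternative.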


To make contact.someuser.command useful out of the box, I suggest adding 
env.exclude_re ^/run/user to the default df configuration in the Debian 
munin-node package. If the out-of-the-box argument isn't enough, consider also 
that tracking down this email "spam" and disabling it takes real effort. It 
easily takes about an hour, and it is not obvious how to get rid of these 
emails. First you have to realize that the munin documentation does not tell 
the entire truth. Then you have to carefully diff the emails and/or the limits 
file before you even get an idea of how to tackle the issue.


My two cents about the relevance for Debian and upstream
========================================================

1st cent) I expect the contact.someuser.command setting to provide somewhat 
meaningful notifications out-of-the-box, on a Debian machine. Setting this 
should at least not make the notification system totally unusable (and this 
spam renders it unusable, in my opinion). That is why I submit this to the 
Debian package.

2nd cent) Furthermore, and this is maybe more relevant for upstream, the 
"status change detection" method should be better documented, or maybe even 
changed and better documented: from the documentation I did not expect at all 
that (dis)appearing metrics within a plugin would be considered as a status 
change. From my point of view, there should not have been a single mail, 
because the df plugin reported an overall "OK" state every single time, so 
there was no transition from "OK" to a different state. Some part of munin 
seems to disagree with this point of view, but where is this documented?



Thanks,


Jan-Philip Gehrcke





-- System Information:
Debian Release: 8.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages munin-node depends on:
ii  gawk                 1:4.1.1+dfsg-1
ii  init-system-helpers  1.22
ii  libnet-server-perl   2.008-1
ii  lsb-base             4.1+Debian13+nmu1
ii  munin-common         2.0.25-1
ii  munin-plugins-core   2.0.25-1
ii  perl                 5.20.2-3+deb8u1
ii  procps               2:3.3.9-9

Versions of packages munin-node recommends:
ii  libnet-snmp-perl     6.0.1-2
ii  munin-plugins-extra  2.0.25-1

Versions of packages munin-node suggests:
ii  acpi                              1.7-1
pn  ethtool                           <none>
ii  hdparm                            9.43-2
pn  libcrypt-ssleay-perl              <none>
pn  libdbd-pg-perl                    <none>
pn  liblwp-useragent-determined-perl  <none>
pn  libnet-irc-perl                   <none>
pn  libtext-csv-xs-perl               <none>
ii  libwww-perl                       6.08-1
pn  libxml-simple-perl                <none>
pn  logtail                           <none>
ii  munin                             2.0.25-1
pn  munin-plugins-java                <none>
ii  mysql-client-5.5 [mysql-client]   5.5.43-0+deb8u1
ii  net-tools                         1.60-26+b1
ii  python                            2.7.9-1
pn  ruby                              <none>
pn  smartmontools                     <none>

-- Configuration Files:
/etc/munin/munin-node.conf changed [not included]
/etc/munin/plugin-conf.d/munin-node [Errno 13] Permission denied: 
u'/etc/munin/plugin-conf.d/munin-node'

-- no debconf information

