Package: munin-node
Version: 2.0.25-1
Severity: important
Tags: upstream patch

Hello,

I am using Munin on Debian Jessie. The default configuration is just
fine for my purposes. On top of that, I just wanted to activate email
notification about critical states. So, I activated the following
config item in /etc/munin/munin.conf and provided my email address:

contact.someuser.command mail -s "Munin notification" u...@domain.tld

FYI: the documentation for this variable is: "Drop u...@domain.tld an
email everytime something changes (OK -> WARNING, CRITICAL -> OK, etc)".


The issue
=========

From then on, I received an email about the "Disk usage in percent"
being "OK" upon almost every munin-node run (every 5 minutes). These
emails clearly are false positive status changes, because, as the email
says, the status is "OK".

In order to understand why this email is actually sent, let's look at
the body of such an email:

localdomain :: localhost.localdomain :: Disk usage in percent
    OKs: / is 19.36, /run is 11.03, /run/user/1000 is 0.00, /dev/shm is 0.05, /run/lock is 0.00, /run/user/1004 is 0.00, /run/user/1002 is 0.00, /dev is 0.00, /sys/fs/cgroup is 0.00.

In the list of mount points and devices there are some mount points of
the pattern /run/user/*. It took me a while until I realized that
whenever I got such an email, a new /run/user/* entry had appeared, or
an old one had disappeared, compared to the previous email.

AFAIK, the /run/user/* mount points have existed only since systemd was
introduced. They are created by pam_systemd
(http://www.freedesktop.org/software/systemd/man/pam_systemd.html).
And their existence seems to fluctuate (I guess/hope this is not true
for my system only): these mount points disappear and re-appear for
different users on a time scale of minutes.

Whenever such a mount point disappears between munin-node invocations,
the df plugin loses a metric. When such a mount point re-appears, a
seemingly "new" metric is found by the df plugin.

I have looked at /var/lib/munin/limits and diffed it between two
munin-node invocations.
The output:

 localdomain;localhost.localdomain;df;_run_user_1000;state ok
 localdomain;localhost.localdomain;df;_run_user_1002;ok OK
 localdomain;localhost.localdomain;df;_run_user_1002;state ok
-localdomain;localhost.localdomain;df;_run_user_1004;ok OK
-localdomain;localhost.localdomain;df;_run_user_1004;state ok
 localdomain;localhost.localdomain;df;_sys_fs_cgroup;ok OK
 localdomain;localhost.localdomain;df;_sys_fs_cgroup;state ok
 localdomain;localhost.localdomain;df_inode;_dev;ok OK

Here, we observe how the _run_user_1004 metric just disappeared. *This*
triggers munin's status update detector, and makes munin send an email.


A working solution
==================

In /etc/munin/plugin-conf.d/munin-node, in the [df*] section, add:

env.exclude_re ^/run/user

This prevents the /run/user/* mount points from being monitored at all.

Test output without this setting:

# munin-run df
_dev_vda2.value 19.3639318372447
_dev.value 0
_run.value 11.0322915980351
_dev_shm.value 0.0499760368807535
_run_lock.value 0
_sys_fs_cgroup.value 0
_run_user_1002.value 0
_run_user_1000.value 0

With this setting:

# munin-run df
_dev_vda2.value 19.3639318372447
_dev.value 0
_run.value 11.0322915980351
_dev_shm.value 0.0499760368807535
_run_lock.value 0
_sys_fs_cgroup.value 0

In order to make the contact.someuser.command useful out-of-the-box, I
suggest adding

env.exclude_re ^/run/user

to the default df config in the Debian munin-node package.

If the "out-of-the-box" argument isn't enough, I want to add that
getting to the bottom of this email "spam" and disabling it really
requires some effort. It easily takes about an hour, and it is not
obvious how to get rid of these emails: one needs to come to the
insight that the munin documentation does not tell the entire truth,
and then carefully diff the emails and/or the limits file before even
getting an idea of how to tackle this issue.

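For anyone who wants to repeat the comparison of the limits file without
reading a raw diff, here is a minimal sketch in Python. It assumes that
each relevant entry has the form "group;host;plugin;field;attribute value",
as in the excerpt above, and that neither group nor host contains a
semicolon. The names metric_keys and changed_metrics are made up for this
illustration and are not part of Munin.

```python
def metric_keys(limits_text):
    """Collect the 'group;host;plugin;field' prefix of every entry."""
    keys = set()
    for line in limits_text.splitlines():
        parts = line.strip().split(";")
        if len(parts) < 5:
            continue  # assumption: anything shorter is not a per-field entry
        keys.add(";".join(parts[:4]))
    return keys


def changed_metrics(before_text, after_text):
    """Return (disappeared, appeared) metric keys between two snapshots."""
    before = metric_keys(before_text)
    after = metric_keys(after_text)
    return sorted(before - after), sorted(after - before)


# Two snapshots modelled on the diff shown in this report.
BEFORE = """\
localdomain;localhost.localdomain;df;_run_user_1000;state ok
localdomain;localhost.localdomain;df;_run_user_1002;ok OK
localdomain;localhost.localdomain;df;_run_user_1002;state ok
localdomain;localhost.localdomain;df;_run_user_1004;ok OK
localdomain;localhost.localdomain;df;_run_user_1004;state ok
localdomain;localhost.localdomain;df;_sys_fs_cgroup;ok OK
localdomain;localhost.localdomain;df;_sys_fs_cgroup;state ok
"""

AFTER = """\
localdomain;localhost.localdomain;df;_run_user_1000;state ok
localdomain;localhost.localdomain;df;_run_user_1002;ok OK
localdomain;localhost.localdomain;df;_run_user_1002;state ok
localdomain;localhost.localdomain;df;_sys_fs_cgroup;ok OK
localdomain;localhost.localdomain;df;_sys_fs_cgroup;state ok
"""

gone, new = changed_metrics(BEFORE, AFTER)
print("disappeared:", gone)
print("appeared:", new)
```

In practice one would copy /var/lib/munin/limits to a temporary file after
one munin run, wait for the next run, and feed both files' contents to
changed_metrics.
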

My two cents about the relevance for Debian and upstream
========================================================

1st cent) I expect the contact.someuser.command setting to provide
somewhat meaningful notifications out-of-the-box, on a Debian machine.
Setting this should at least not make the notification system totally
unusable (and this spam renders it unusable, in my opinion). That is
why I submit this to the Debian package.

2nd cent) Furthermore, and this is maybe more relevant for upstream,
the "status change detection" method should be better documented, or
maybe even changed and better documented: from the documentation I did
not expect at all that (dis)appearing metrics within a plugin would be
considered a status change. From my point of view, there should not
have been a single mail, because the df plugin reported an overall "OK"
state every single time, so there was no transition from "OK" to a
different state. Some part of munin seems to disagree with this point
of view, but where is this documented?

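To make the expectation from the 2nd cent concrete, here is a minimal
sketch in Python of the behaviour I had in mind. It is not how Munin
implements its detection; the names SEVERITY, overall_state and
should_notify, as well as the severity ordering, are assumptions made for
this illustration only.

```python
# Assumed ordering of the states named in the documentation quote.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def overall_state(field_states):
    """Worst state among all fields of a plugin; 'ok' if there are none."""
    if not field_states:
        return "ok"
    return max(field_states.values(), key=lambda state: SEVERITY[state])


def should_notify(previous, current):
    """Notify only when the plugin's overall state actually changes."""
    return overall_state(previous) != overall_state(current)


# A field disappears, but everything is still 'ok': no mail expected.
run_1 = {"_dev_vda2": "ok", "_run_user_1002": "ok", "_run_user_1004": "ok"}
run_2 = {"_dev_vda2": "ok", "_run_user_1002": "ok"}
print(should_notify(run_1, run_2))  # False

# A real transition from 'ok' to 'warning': a mail is expected.
run_3 = {"_dev_vda2": "warning", "_run_user_1002": "ok"}
print(should_notify(run_2, run_3))  # True
```

With such a rule, a /run/user/* mount point coming or going would not by
itself cause a notification, while a real OK -> WARNING transition still
would.
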
Thanks,

Jan-Philip Gehrcke


-- System Information:
Debian Release: 8.1
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages munin-node depends on:
ii  gawk                 1:4.1.1+dfsg-1
ii  init-system-helpers  1.22
ii  libnet-server-perl   2.008-1
ii  lsb-base             4.1+Debian13+nmu1
ii  munin-common         2.0.25-1
ii  munin-plugins-core   2.0.25-1
ii  perl                 5.20.2-3+deb8u1
ii  procps               2:3.3.9-9

Versions of packages munin-node recommends:
ii  libnet-snmp-perl     6.0.1-2
ii  munin-plugins-extra  2.0.25-1

Versions of packages munin-node suggests:
ii  acpi                              1.7-1
pn  ethtool                           <none>
ii  hdparm                            9.43-2
pn  libcrypt-ssleay-perl              <none>
pn  libdbd-pg-perl                    <none>
pn  liblwp-useragent-determined-perl  <none>
pn  libnet-irc-perl                   <none>
pn  libtext-csv-xs-perl               <none>
ii  libwww-perl                       6.08-1
pn  libxml-simple-perl                <none>
pn  logtail                           <none>
ii  munin                             2.0.25-1
pn  munin-plugins-java                <none>
ii  mysql-client-5.5 [mysql-client]   5.5.43-0+deb8u1
ii  net-tools                         1.60-26+b1
ii  python                            2.7.9-1
pn  ruby                              <none>
pn  smartmontools                     <none>

-- Configuration Files:
/etc/munin/munin-node.conf changed [not included]
/etc/munin/plugin-conf.d/munin-node [Errno 13] Permission denied: u'/etc/munin/plugin-conf.d/munin-node'

-- no debconf information