Package: icinga Version: 1.7.1-2 Severity: normal Tags: patch Hello,
upstream just released version 1.7.2, which fixes a highly annoying bug causing duplicate events | we learned by different sources (#2964, using mod_gearman, PNP rrdtool | illegal update time) that a performance increasing change in Icinga Core | 1.7.0 (#2182) was causing trouble on the event scheduling logic, causing | duplicated events when rescheduling those. This resulted in duplicated | (or even more) check events under various circumstances, as well as too | much performance data and more outputs. We’ve been testing various fixes | in our development branches (#2676) In my environment this is causing about 10k lines per hour of the sort 2012-08-27 21:04:34 [11428] [0] RRDs::update /var/lib/pnp4nagios/perfdata/dpi-04/_HOST__rta.rrd 1346101453:0.222 2012-08-27 21:04:34 [11428] [0] RRDs::update ERROR rrdcached: illegal attempt to update using time 1346101453.000000 when last update time is 1346101453.000000 (minimum one second step) I patched icinga with the two upstream commits locally (see attached patch) and it immediately solved my issue as well as reducing CPU usage by 30%. I'm not sure whether this could be a candidate for Wheezy at this time. -- System Information: Debian Release: wheezy/sid APT prefers testing APT policy: (500, 'testing') Architecture: amd64 (x86_64) Kernel: Linux 3.2.0-3-amd64 (SMP w/2 CPU cores) Locale: LANG=en_US.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Versions of packages icinga depends on: ii icinga-cgi 1.7.1-2 ii icinga-core 1.7.1-2 icinga recommends no packages. Versions of packages icinga suggests: pn nagios-nrpe-plugin <none> -- no debconf information
commit 47c5b92865a1003f8197ebc5c20a051e6da95dda Author: Bernhard Schmidt <bschm...@teleport-iabg.de> Date: Mon Aug 27 20:57:56 2012 +0000 cherry-pick upstream commits ec9c5e3 and 45225cd to fix duplicate events diff --git a/debian/patches/00list b/debian/patches/00list index 4b8d6cf..65ead31 100644 --- a/debian/patches/00list +++ b/debian/patches/00list @@ -4,3 +4,5 @@ 70_fix_eventhandler_paths.dpatch 80_fix_dbd_handling.dpatch 95_fix_db_upgrade.dpatch +99_fix_duplicate_events.dpatch +99_fix_duplicate_events2.dpatch diff --git a/debian/patches/99_fix_duplicate_events.dpatch b/debian/patches/99_fix_duplicate_events.dpatch new file mode 100755 index 0000000..8c095fb --- /dev/null +++ b/debian/patches/99_fix_duplicate_events.dpatch @@ -0,0 +1,192 @@ +#! /bin/sh /usr/share/dpatch/dpatch-run +## 99_fix_duplicate_events.dpatch by <bschm...@teleport-iabg.de> +## +## All lines beginning with `## DP:' are a description of the patch. +## DP: No description. + +@DPATCH@ +From: Michael Friedrich <michael.friedr...@univie.ac.at> +Date: Fri, 6 Jul 2012 11:21:41 +0000 (+0200) +Subject: * core: fix duplicated events on check scheduling logic for new events (Andreas Erics... +X-Git-Tag: v1.7.2~27 +X-Git-Url: https://git.icinga.org/?p=icinga-core.git;a=commitdiff_plain;h=379b71295b4846195590350ccb309b3ec79212da;hp=29fc8aec22c9a985cd6710a3ee1985c00618536b + +* core: fix duplicated events on check scheduling logic for new events (Andreas Ericsson) #2676 #2993 + +previously, the logic on scheduling a new event was changed using the +new_event attribute. the decision for actually scheduling a new event +now happens generalized after having decided to actually do so. +furthermore next_check_event is correctly assigned to that new event for +the host|service check (which is a bug in previous versions, causing +duplicate events under different circumstances). + +refs #2676 +refs #2993 + +Conflicts: + + Changelog +--- + +diff --git a/Changelog b/Changelog +index 3f1bc89..4da81b4 100644 +--- a/Changelog ++++ b/Changelog +@@ -4,6 +4,16 @@ Icinga 1.7.x Change Log + + Thanks to all contributers, testers and developers. Please read AUTHORS and THANKS for a detailed list :-) + ++1.7.2 - XX/XX/2012 ++ ++ENHANCEMENTS ++ ++FIXES ++* core: fix duplicated events on check scheduling logic for new events (Andreas Ericsson) #2676 #2993 - MF ++ ++CHANGES ++ ++ + 1.7.1 - 18/06/2012 + + FIXES +diff --git a/base/checks.c b/base/checks.c +index eedecb7..5d02110 100644 +--- a/base/checks.c ++++ b/base/checks.c +@@ -1881,15 +1881,6 @@ void schedule_service_check(service *svc, time_t check_time, int options) { + return; + } + +- /* allocate memory for a new event item */ +- new_event = (timed_event *)malloc(sizeof(timed_event)); +- if (new_event == NULL) { +- +- logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of service '%s' on host '%s'!\n", svc->description, svc->host_name); +- +- return; +- } +- + /* default is to use the new event */ + use_original_event = FALSE; + +@@ -1938,32 +1929,38 @@ void schedule_service_check(service *svc, time_t check_time, int options) { + log_debug_info(DEBUGL_CHECKS, 2, "New service check event occurs after the existing event, so we'll ignore it.\n"); + } + } ++ } ++ + +- /* the originally queued event won the battle, so keep it */ +- if (use_original_event == TRUE) { +- my_free(new_event); ++ /* ++ * we can't use the original event, ++ * so schedule a new event ++ */ ++ if (use_original_event == FALSE) { ++ ++ /* allocate memory for a new event item */ ++ new_event = (timed_event *)malloc(sizeof(timed_event)); ++ ++ if (new_event == NULL) { ++ logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of service '%s' on host '%s'!\n", svc->description, svc->host_name); ++ return; + } + +- /* else we're using the new event, so remove the old one */ +- else { ++ /* make sure we kill off the old event */ ++ if (temp_event) { + remove_event(temp_event, &event_list_low, &event_list_low_tail); +- /* save new event for later */ +- svc->next_check_event = new_event; + my_free(temp_event); + } +- } +- +- /* save check options for retention purposes */ +- svc->check_options = options; +- +- /* schedule a new event */ +- if (use_original_event == FALSE) { + + log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event.\n"); + +- /* set the next service check time */ ++ /* set the next service check event and time */ ++ svc->next_check_event = new_event; + svc->next_check = check_time; + ++ /* save check options for retention purposes */ ++ svc->check_options = options; ++ + /* place the new event in the event queue */ + new_event->event_type = EVENT_SERVICE_CHECK; + new_event->event_data = (void *)svc; +@@ -2360,14 +2357,6 @@ void schedule_host_check(host *hst, time_t check_time, int options) { + return; + } + +- /* allocate memory for a new event item */ +- if ((new_event = (timed_event *)malloc(sizeof(timed_event))) == NULL) { +- +- logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of host '%s'!\n", hst->name); +- +- return; +- } +- + /* default is to use the new event */ + use_original_event = FALSE; + +@@ -2416,32 +2405,34 @@ void schedule_host_check(host *hst, time_t check_time, int options) { + log_debug_info(DEBUGL_CHECKS, 2, "New host check event occurs after the existing event, so we'll ignore it.\n"); + } + } ++ } + +- /* the originally queued event won the battle, so keep it */ +- if (use_original_event == TRUE) { +- my_free(new_event); ++ /* ++ * we can't use the original event, ++ * so schedule a new event ++ */ ++ if (use_original_event == FALSE) { ++ ++ log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event.\n"); ++ ++ /* allocate memory for a new event item */ ++ if((new_event = (timed_event *)malloc(sizeof(timed_event))) == NULL) { ++ logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of host '%s'!\n", hst->name); ++ return; + } + +- /* else use the new event, so remove the old */ +- else { ++ if (temp_event) { + remove_event(temp_event, &event_list_low, &event_list_low_tail); +- /* save new event for later */ +- hst->next_check_event = new_event; + my_free(temp_event); + } +- } + +- /* save check options for retention purposes */ +- hst->check_options = options; +- +- /* use the new event */ +- if (use_original_event == FALSE) { +- +- log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event.\n"); +- +- /* set the next host check time */ ++ /* set the next host check event and time */ ++ hst->next_check_event = new_event; + hst->next_check = check_time; + ++ /* save check options for retention purposes */ ++ hst->check_options = options; ++ + /* place the new event in the event queue */ + new_event->event_type = EVENT_HOST_CHECK; + new_event->event_data = (void *)hst; diff --git a/debian/patches/99_fix_duplicate_events2.dpatch b/debian/patches/99_fix_duplicate_events2.dpatch new file mode 100755 index 0000000..0b11ec3 --- /dev/null +++ b/debian/patches/99_fix_duplicate_events2.dpatch @@ -0,0 +1,175 @@ +#! /bin/sh /usr/share/dpatch/dpatch-run +## 99_fix_duplicate_events2.dpatch by <bschm...@teleport-iabg.de> +## +## All lines beginning with `## DP:' are a description of the patch. +## DP: No description. + +@DPATCH@ +From: Michael Friedrich <michael.friedr...@gmail.com> +Date: Sun, 19 Aug 2012 17:09:21 +0000 (+0200) +Subject: core: avoid duplicate events when scheduling forced host|service check (Imri Zvik... +X-Git-Tag: v1.7.2~26 +X-Git-Url: https://git.icinga.org/?p=icinga-core.git;a=commitdiff_plain;h=f32fbf844589c4e1ea15f3da5175b4828de47773;hp=379b71295b4846195590350ccb309b3ec79212da + +core: avoid duplicate events when scheduling forced host|service check (Imri Zvik) #2993 + +previously, we had introduced a hash-like implementation of +host|service->next_check_event directly pointing to the next +scheduled event. this algorithm is being used within +schedule_host|service_check, determing wether to use the +already assigned event, or scheduling a new event. Since we +did not populate the event_data (host or service object) when +adding a new event to the scheduler, this became insame, always +rescheduling a new event, but not discarding the old one. + +This has been partly fixed in #2676 with refactoring that detection +and saving the next_check_event accordingly. But on already scheduled +events which were forced (overridden), another bug was unveiled. + +Now that we add the reverse pointer from the host|service event_data +back to the newly created event when forcing a check, we will make sure +that those events are checked correctly, and executed/discarded in the +first place, and not always creating a new event, seperated from the rest. + +basically, using the previous implementation (with and without the fix +from #2676) we've created an event bomb under various circumstances, +especially when future events would have been overridden (forced checks). +as events usually result in checks, which can result into perfdata, this +could possibly the root cause for #2924 as well, as other users reported +on the portal as well. + +http://www.monitoring-portal.org/wbb/index.php?page=Thread&threadID=26352 + +With the kind patch provided by Imri Zvik, this now works like expected. +Adapted the debug output a bit myself, so with debug_level=272 it will be +easy to see what happens on events scheduling - and not the non-telling +mess before. + +kudos to Imri Zvik for the patch. + +refs #2993 +refs #2676 +refs #2182 +refs #2924 + +Conflicts: + Changelog +--- + +diff --git a/Changelog b/Changelog +index 4da81b4..70772b0 100644 +--- a/Changelog ++++ b/Changelog +@@ -4,12 +4,13 @@ Icinga 1.7.x Change Log + + Thanks to all contributers, testers and developers. Please read AUTHORS and THANKS for a detailed list :-) + +-1.7.2 - XX/XX/2012 ++1.7.2 - 27/08/2012 + + ENHANCEMENTS + + FIXES + * core: fix duplicated events on check scheduling logic for new events (Andreas Ericsson) #2676 #2993 - MF ++* core: avoid duplicate events when scheduling forced host|service check (Imri Zvik) #2993 - MF + + CHANGES + +diff --git a/THANKS b/THANKS +index d11030f..6a8a243 100644 +--- a/THANKS ++++ b/THANKS +@@ -343,3 +343,4 @@ in various ways. If we missed your name, let us know. + * Michal Zimen + * Dennis van Zuijlekom + * Pawel Zuzelski ++* Imri Zvik +diff --git a/base/checks.c b/base/checks.c +index 5d02110..8644633 100644 +--- a/base/checks.c ++++ b/base/checks.c +@@ -1894,7 +1894,7 @@ void schedule_service_check(service *svc, time_t check_time, int options) { + */ + if (temp_event != NULL) { + +- log_debug_info(DEBUGL_CHECKS, 2, "Found another service check event for this service @ %s", ctime(&temp_event->run_time)); ++ log_debug_info(DEBUGL_CHECKS, 2, "Found another service check event for service '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&temp_event->run_time)); + + /* use the originally scheduled check unless we decide otherwise */ + use_original_event = TRUE; +@@ -1938,6 +1938,8 @@ void schedule_service_check(service *svc, time_t check_time, int options) { + */ + if (use_original_event == FALSE) { + ++ log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event for '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&check_time)); ++ + /* allocate memory for a new event item */ + new_event = (timed_event *)malloc(sizeof(timed_event)); + +@@ -1948,12 +1950,11 @@ void schedule_service_check(service *svc, time_t check_time, int options) { + + /* make sure we kill off the old event */ + if (temp_event) { ++ log_debug_info(DEBUGL_CHECKS, 2, "Removing service check event for service '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&temp_event->run_time)); + remove_event(temp_event, &event_list_low, &event_list_low_tail); + my_free(temp_event); + } + +- log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event.\n"); +- + /* set the next service check event and time */ + svc->next_check_event = new_event; + svc->next_check = check_time; +@@ -2370,7 +2371,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) { + */ + if (temp_event != NULL) { + +- log_debug_info(DEBUGL_CHECKS, 2, "Found another host check event for this host @ %s", ctime(&temp_event->run_time)); ++ log_debug_info(DEBUGL_CHECKS, 2, "Found another host check event for host '%s' @ %s", hst->name, ctime(&temp_event->run_time)); + + /* use the originally scheduled check unless we decide otherwise */ + use_original_event = TRUE; +@@ -2413,7 +2414,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) { + */ + if (use_original_event == FALSE) { + +- log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event.\n"); ++ log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event for '%s' @ %s", hst->name, ctime(&check_time)); + + /* allocate memory for a new event item */ + if((new_event = (timed_event *)malloc(sizeof(timed_event))) == NULL) { +@@ -2422,6 +2423,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) { + } + + if (temp_event) { ++ log_debug_info(DEBUGL_CHECKS, 2, "Removing host check event for host '%s' @ %s", hst->name, ctime(&temp_event->run_time)); + remove_event(temp_event, &event_list_low, &event_list_low_tail); + my_free(temp_event); + } +diff --git a/base/events.c b/base/events.c +index ce8666d..0cde885 100644 +--- a/base/events.c ++++ b/base/events.c +@@ -929,6 +929,22 @@ int schedule_new_event(int event_type, int high_priority, time_t run_time, int r + new_event->event_interval = event_interval; + new_event->timing_func = timing_func; + new_event->compensate_for_time_change = compensate_for_time_change; ++ /* ++ * we need to keep the reverse link from the (service|host *)event_data->next_check_event ++ * to the new_event in order to stay sane on schedule_host|service_check() checks ++ * later on, already having a new event assigned to host/service, not rescheduling a new event. ++ * see #2993 for deeper analysis ++ */ ++ if (event_type == EVENT_SERVICE_CHECK) { ++ service *temp_service = (service *)event_data; ++ temp_service->next_check_event = new_event; ++ log_debug_info(DEBUGL_CHECKS, 2, "Service '%s' on Host '%s' next_check_event populated\n", temp_service->description, temp_service->host_name); ++ } ++ if (event_type == EVENT_HOST_CHECK) { ++ host *temp_host = (host *)event_data; ++ temp_host->next_check_event = new_event; ++ log_debug_info(DEBUGL_CHECKS, 2, "Host '%s' next_check_event populated\n", temp_host->name); ++ } + } else + return ERROR; +