Package: icinga
Version: 1.7.1-2
Severity: normal
Tags: patch

Hello,

upstream just released version 1.7.2, which fixes a highly annoying bug
causing duplicate events

| we learned by different sources (#2964, using mod_gearman, PNP rrdtool
| illegal update time) that a performance increasing change in Icinga Core
| 1.7.0 (#2182) was causing trouble on the event scheduling logic, causing
| duplicated events when rescheduling those. This resulted in duplicated
| (or even more) check events under various circumstances,  as well as too
| much performance data and more outputs. We’ve been testing various fixes
| in our development branches (#2676)

In my environment this is causing about 10k lines per hour of the sort

2012-08-27 21:04:34 [11428] [0] RRDs::update
/var/lib/pnp4nagios/perfdata/dpi-04/_HOST__rta.rrd 1346101453:0.222
2012-08-27 21:04:34 [11428] [0] RRDs::update ERROR rrdcached: illegal
attempt to update using time 1346101453.000000 when last update time is
1346101453.000000 (minimum one second step)

I patched icinga with the two upstream commits locally (see attached
patch) and it immediately solved my issue as well as reducing CPU usage
by 30%.

I'm not sure whether this could be a candidate for Wheezy at this time.

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-3-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages icinga depends on:
ii  icinga-cgi   1.7.1-2
ii  icinga-core  1.7.1-2

icinga recommends no packages.

Versions of packages icinga suggests:
pn  nagios-nrpe-plugin  <none>

-- no debconf information
commit 47c5b92865a1003f8197ebc5c20a051e6da95dda
Author: Bernhard Schmidt <bschm...@teleport-iabg.de>
Date:   Mon Aug 27 20:57:56 2012 +0000

    cherry-pick upstream commits ec9c5e3 and 45225cd to fix duplicate events

diff --git a/debian/patches/00list b/debian/patches/00list
index 4b8d6cf..65ead31 100644
--- a/debian/patches/00list
+++ b/debian/patches/00list
@@ -4,3 +4,5 @@
 70_fix_eventhandler_paths.dpatch
 80_fix_dbd_handling.dpatch
 95_fix_db_upgrade.dpatch
+99_fix_duplicate_events.dpatch
+99_fix_duplicate_events2.dpatch
diff --git a/debian/patches/99_fix_duplicate_events.dpatch b/debian/patches/99_fix_duplicate_events.dpatch
new file mode 100755
index 0000000..8c095fb
--- /dev/null
+++ b/debian/patches/99_fix_duplicate_events.dpatch
@@ -0,0 +1,192 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+## 99_fix_duplicate_events.dpatch by  <bschm...@teleport-iabg.de>
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: No description.
+
+@DPATCH@
+From: Michael Friedrich <michael.friedr...@univie.ac.at>
+Date: Fri, 6 Jul 2012 11:21:41 +0000 (+0200)
+Subject: * core: fix duplicated events on check scheduling logic for new events (Andreas Erics... 
+X-Git-Tag: v1.7.2~27
+X-Git-Url: https://git.icinga.org/?p=icinga-core.git;a=commitdiff_plain;h=379b71295b4846195590350ccb309b3ec79212da;hp=29fc8aec22c9a985cd6710a3ee1985c00618536b
+
+* core: fix duplicated events on check scheduling logic for new events (Andreas Ericsson) #2676 #2993
+
+previously, the logic on scheduling a new event was changed using the
+new_event attribute. the decision for actually scheduling a new event
+now happens generalized after having decided to actually do so.
+furthermore next_check_event is correctly assigned to that new event for
+the host|service check (which is a bug in previous versions, causing
+duplicate events under different circumstances).
+
+refs #2676
+refs #2993
+
+Conflicts:
+
+	Changelog
+---
+
+diff --git a/Changelog b/Changelog
+index 3f1bc89..4da81b4 100644
+--- a/Changelog
++++ b/Changelog
+@@ -4,6 +4,16 @@ Icinga 1.7.x Change Log
+ 
+ Thanks to all contributers, testers and developers. Please read AUTHORS and THANKS for a detailed list :-)
+ 
++1.7.2 - XX/XX/2012
++
++ENHANCEMENTS
++
++FIXES
++* core: fix duplicated events on check scheduling logic for new events (Andreas Ericsson) #2676 #2993 - MF
++
++CHANGES
++
++
+ 1.7.1 - 18/06/2012
+ 
+ FIXES
+diff --git a/base/checks.c b/base/checks.c
+index eedecb7..5d02110 100644
+--- a/base/checks.c
++++ b/base/checks.c
+@@ -1881,15 +1881,6 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
+ 		return;
+ 	}
+ 
+-	/* allocate memory for a new event item */
+-	new_event = (timed_event *)malloc(sizeof(timed_event));
+-	if (new_event == NULL) {
+-
+-		logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of service '%s' on host '%s'!\n", svc->description, svc->host_name);
+-
+-		return;
+-	}
+-
+ 	/* default is to use the new event */
+ 	use_original_event = FALSE;
+ 
+@@ -1938,32 +1929,38 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
+ 				log_debug_info(DEBUGL_CHECKS, 2, "New service check event occurs after the existing event, so we'll ignore it.\n");
+ 			}
+ 		}
++	}
++
+ 
+-		/* the originally queued event won the battle, so keep it */
+-		if (use_original_event == TRUE) {
+-			my_free(new_event);
++	/*
++	 * we can't use the original event,
++	 * so schedule a new event
++	 */
++	if (use_original_event == FALSE) {
++
++		/* allocate memory for a new event item */
++		new_event = (timed_event *)malloc(sizeof(timed_event));
++
++		if (new_event == NULL) {
++			logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of service '%s' on host '%s'!\n", svc->description, svc->host_name);
++			return;
+ 		}
+ 
+-		/* else we're using the new event, so remove the old one */
+-		else {
++		/* make sure we kill off the old event */
++		if (temp_event) {
+ 			remove_event(temp_event, &event_list_low, &event_list_low_tail);
+-			/* save new event for later */
+-			svc->next_check_event = new_event;
+ 			my_free(temp_event);
+ 		}
+-	}
+-
+-	/* save check options for retention purposes */
+-	svc->check_options = options;
+-
+-	/* schedule a new event */
+-	if (use_original_event == FALSE) {
+ 
+ 		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event.\n");
+ 
+-		/* set the next service check time */
++		/* set the next service check event and time */
++		svc->next_check_event = new_event;
+ 		svc->next_check = check_time;
+ 
++		/* save check options for retention purposes */
++		svc->check_options = options;
++
+ 		/* place the new event in the event queue */
+ 		new_event->event_type = EVENT_SERVICE_CHECK;
+ 		new_event->event_data = (void *)svc;
+@@ -2360,14 +2357,6 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
+ 		return;
+ 	}
+ 
+-	/* allocate memory for a new event item */
+-	if ((new_event = (timed_event *)malloc(sizeof(timed_event))) == NULL) {
+-
+-		logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of host '%s'!\n", hst->name);
+-
+-		return;
+-	}
+-
+ 	/* default is to use the new event */
+ 	use_original_event = FALSE;
+ 
+@@ -2416,32 +2405,34 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
+ 				log_debug_info(DEBUGL_CHECKS, 2, "New host check event occurs after the existing event, so we'll ignore it.\n");
+ 			}
+ 		}
++	}
+ 
+-		/* the originally queued event won the battle, so keep it */
+-		if (use_original_event == TRUE) {
+-			my_free(new_event);
++	/*
++	 * we can't use the original event,
++	 * so schedule a new event
++	 */
++	if (use_original_event == FALSE) {
++
++		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event.\n");
++
++		/* allocate memory for a new event item */
++		if((new_event = (timed_event *)malloc(sizeof(timed_event))) == NULL) {
++			logit(NSLOG_RUNTIME_WARNING, TRUE, "Warning: Could not reschedule check of host '%s'!\n", hst->name);
++			return;
+ 		}
+ 
+-		/* else use the new event, so remove the old */
+-		else {
++		if (temp_event) {
+ 			remove_event(temp_event, &event_list_low, &event_list_low_tail);
+-			/* save new event for later */
+-			hst->next_check_event = new_event;
+ 			my_free(temp_event);
+ 		}
+-	}
+ 
+-	/* save check options for retention purposes */
+-	hst->check_options = options;
+-
+-	/* use the new event */
+-	if (use_original_event == FALSE) {
+-
+-		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event.\n");
+-
+-		/* set the next host check time */
++		/* set the next host check event and time */
++		hst->next_check_event = new_event;
+ 		hst->next_check = check_time;
+ 
++		/* save check options for retention purposes */
++		hst->check_options = options;
++
+ 		/* place the new event in the event queue */
+ 		new_event->event_type = EVENT_HOST_CHECK;
+ 		new_event->event_data = (void *)hst;
diff --git a/debian/patches/99_fix_duplicate_events2.dpatch b/debian/patches/99_fix_duplicate_events2.dpatch
new file mode 100755
index 0000000..0b11ec3
--- /dev/null
+++ b/debian/patches/99_fix_duplicate_events2.dpatch
@@ -0,0 +1,175 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+## 99_fix_duplicate_events2.dpatch by  <bschm...@teleport-iabg.de>
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: No description.
+
+@DPATCH@
+From: Michael Friedrich <michael.friedr...@gmail.com>
+Date: Sun, 19 Aug 2012 17:09:21 +0000 (+0200)
+Subject: core: avoid duplicate events when scheduling forced host|service check (Imri Zvik... 
+X-Git-Tag: v1.7.2~26
+X-Git-Url: https://git.icinga.org/?p=icinga-core.git;a=commitdiff_plain;h=f32fbf844589c4e1ea15f3da5175b4828de47773;hp=379b71295b4846195590350ccb309b3ec79212da
+
+core: avoid duplicate events when scheduling forced host|service check (Imri Zvik) #2993
+
+previously, we had introduced a hash-like implementation of
+host|service->next_check_event directly pointing to the next
+scheduled event. this algorithm is being used within
+schedule_host|service_check, determing wether to use the
+already assigned event, or scheduling a new event. Since we
+did not populate the event_data (host or service object) when
+adding a new event to the scheduler, this became insame, always
+rescheduling a new event, but not discarding the old one.
+
+This has been partly fixed in #2676 with refactoring that detection
+and saving the next_check_event accordingly. But on already scheduled
+events which were forced (overridden), another bug was unveiled.
+
+Now that we add the reverse pointer from the host|service event_data
+back to the newly created event when forcing a check, we will make sure
+that those events are checked correctly, and executed/discarded in the
+first place, and not always creating a new event, seperated from the rest.
+
+basically, using the previous implementation (with and without the fix
+from #2676) we've created an event bomb under various circumstances,
+especially when future events would have been overridden (forced checks).
+as events usually result in checks, which can result into perfdata, this
+could possibly the root cause for #2924 as well, as other users reported
+on the portal as well.
+
+http://www.monitoring-portal.org/wbb/index.php?page=Thread&threadID=26352
+
+With the kind patch provided by Imri Zvik, this now works like expected.
+Adapted the debug output a bit myself, so with debug_level=272 it will be
+easy to see what happens on events scheduling - and not the non-telling
+mess before.
+
+kudos to Imri Zvik for the patch.
+
+refs #2993
+refs #2676
+refs #2182
+refs #2924
+
+Conflicts:
+	Changelog
+---
+
+diff --git a/Changelog b/Changelog
+index 4da81b4..70772b0 100644
+--- a/Changelog
++++ b/Changelog
+@@ -4,12 +4,13 @@ Icinga 1.7.x Change Log
+ 
+ Thanks to all contributers, testers and developers. Please read AUTHORS and THANKS for a detailed list :-)
+ 
+-1.7.2 - XX/XX/2012
++1.7.2 - 27/08/2012
+ 
+ ENHANCEMENTS
+ 
+ FIXES
+ * core: fix duplicated events on check scheduling logic for new events (Andreas Ericsson) #2676 #2993 - MF
++* core: avoid duplicate events when scheduling forced host|service check (Imri Zvik) #2993 - MF
+ 
+ CHANGES
+ 
+diff --git a/THANKS b/THANKS
+index d11030f..6a8a243 100644
+--- a/THANKS
++++ b/THANKS
+@@ -343,3 +343,4 @@ in various ways.  If we missed your name, let us know.
+ * Michal Zimen
+ * Dennis van Zuijlekom
+ * Pawel Zuzelski
++* Imri Zvik
+diff --git a/base/checks.c b/base/checks.c
+index 5d02110..8644633 100644
+--- a/base/checks.c
++++ b/base/checks.c
+@@ -1894,7 +1894,7 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
+ 	 */
+ 	if (temp_event != NULL) {
+ 
+-		log_debug_info(DEBUGL_CHECKS, 2, "Found another service check event for this service @ %s", ctime(&temp_event->run_time));
++		log_debug_info(DEBUGL_CHECKS, 2, "Found another service check event for service '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&temp_event->run_time));
+ 
+ 		/* use the originally scheduled check unless we decide otherwise */
+ 		use_original_event = TRUE;
+@@ -1938,6 +1938,8 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
+ 	 */
+ 	if (use_original_event == FALSE) {
+ 
++		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event for '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&check_time));
++
+ 		/* allocate memory for a new event item */
+ 		new_event = (timed_event *)malloc(sizeof(timed_event));
+ 
+@@ -1948,12 +1950,11 @@ void schedule_service_check(service *svc, time_t check_time, int options) {
+ 
+ 		/* make sure we kill off the old event */
+ 		if (temp_event) {
++			log_debug_info(DEBUGL_CHECKS, 2, "Removing service check event for service '%s' on host '%s' @ %s", svc->description, svc->host_name, ctime(&temp_event->run_time));
+ 			remove_event(temp_event, &event_list_low, &event_list_low_tail);
+ 			my_free(temp_event);
+ 		}
+ 
+-		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new service check event.\n");
+-
+ 		/* set the next service check event and time */
+ 		svc->next_check_event = new_event;
+ 		svc->next_check = check_time;
+@@ -2370,7 +2371,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
+ 	 */
+ 	if (temp_event != NULL) {
+ 
+-		log_debug_info(DEBUGL_CHECKS, 2, "Found another host check event for this host @ %s", ctime(&temp_event->run_time));
++		log_debug_info(DEBUGL_CHECKS, 2, "Found another host check event for host '%s' @ %s", hst->name, ctime(&temp_event->run_time));
+ 
+ 		/* use the originally scheduled check unless we decide otherwise */
+ 		use_original_event = TRUE;
+@@ -2413,7 +2414,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
+ 	 */
+ 	if (use_original_event == FALSE) {
+ 
+-		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event.\n");
++		log_debug_info(DEBUGL_CHECKS, 2, "Scheduling new host check event for '%s' @ %s", hst->name, ctime(&check_time));
+ 
+ 		/* allocate memory for a new event item */
+ 		if((new_event = (timed_event *)malloc(sizeof(timed_event))) == NULL) {
+@@ -2422,6 +2423,7 @@ void schedule_host_check(host *hst, time_t check_time, int options) {
+ 		}
+ 
+ 		if (temp_event) {
++			log_debug_info(DEBUGL_CHECKS, 2, "Removing host check event for host '%s' @ %s", hst->name, ctime(&temp_event->run_time));
+ 			remove_event(temp_event, &event_list_low, &event_list_low_tail);
+ 			my_free(temp_event);
+ 		}
+diff --git a/base/events.c b/base/events.c
+index ce8666d..0cde885 100644
+--- a/base/events.c
++++ b/base/events.c
+@@ -929,6 +929,22 @@ int schedule_new_event(int event_type, int high_priority, time_t run_time, int r
+ 		new_event->event_interval = event_interval;
+ 		new_event->timing_func = timing_func;
+ 		new_event->compensate_for_time_change = compensate_for_time_change;
++		/*
++		 * we need to keep the reverse link from the (service|host *)event_data->next_check_event
++		 * to the new_event in order to stay sane on schedule_host|service_check() checks
++		 * later on, already having a new event assigned to host/service, not rescheduling a new event.
++		 * see #2993 for deeper analysis
++		 */
++		if (event_type == EVENT_SERVICE_CHECK) {
++			service *temp_service = (service *)event_data;
++			temp_service->next_check_event = new_event;
++			log_debug_info(DEBUGL_CHECKS, 2, "Service '%s' on Host '%s' next_check_event populated\n", temp_service->description, temp_service->host_name);
++		}
++		if (event_type == EVENT_HOST_CHECK) {
++			host *temp_host = (host *)event_data;
++			temp_host->next_check_event = new_event;
++			log_debug_info(DEBUGL_CHECKS, 2, "Host '%s' next_check_event populated\n", temp_host->name);
++		}
+ 	} else
+ 		return ERROR;
+ 

Reply via email to