hi Eugene,

On Mon, Nov 13, 2006 at 08:31:14AM -0800, Eugene Aleynikov wrote:
> Another one:
> 023650:20061110:062617 Got signal. Exiting ...
> 023649:20061110:062617 Got signal. Exiting ...
> 023648:20061110:062617 Got signal. Exiting ...
> 021462:20061110:062617 zabbix_agentd started. ZABBIX 1.1.3.
> 021462:20061110:062617 Cannot bind to port 10050. Error [Address already
> in use]
> . Another zabbix_agentd already running ?
> 023652:20061110:062617 Got signal. Exiting ...
> 022097:20061110:080959 zabbix_agentd started. ZABBIX 1.1.3.
>
> THis problem doesnt happen every night. Possibly need delay in init
> script after stopping zabbix and before starting it again.
> I consider this issue to be serious since i start my every morning with
> restarting of dead zabbix agents on 2-3 servers out of 100.

Hm, this is quite interesting now. Such problems occurred with 1.1.2 as
well: upstream did not correctly close all used sockets, so both agent and
server failed to restart if there were still open sockets/connections in
TIME_WAIT state. This was fixed by using SO_REUSEADDR (see #374758 for more
information). Looking at the code, this issue should be gone as of 1.1.3:

    /* Enable address reuse */
    /* This is to immediately use the address even if it is in TIME_WAIT state */
    /* http://www-128.ibm.com/developerworks/linux/library/l-sockpit/index.html */
    on = 1;
    if( -1 == setsockopt( sockfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on) ))
    {
        zabbix_log(LOG_LEVEL_WARNING, "Cannot setsockopt SO_REUSEADDR [%s]", strerror(errno));
    }

Does this problem also occur when doing a simple restart of the agent using
invoke-rc.d? Can you confirm that the zabbix agent has really shut down?

Adding a delay to the init script is a no-go. It would be possible to add
something like this to the logrotate scripts as well, but I really think
this is a problem in the application and should be fixed there.

bye,
 - michael
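
P.S. For reference, a minimal standalone sketch of the pattern discussed
above: set SO_REUSEADDR before bind() so that a restart is not blocked by
old connections in TIME_WAIT. The helper name listen_reuseaddr() is
hypothetical and is not taken from the zabbix source; it only illustrates
the order of the calls.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the zabbix source): open a TCP listening
 * socket on `port` (0 lets the kernel pick a free port), setting
 * SO_REUSEADDR before bind(). Returns the socket fd, or -1 on error. */
static int listen_reuseaddr(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    /* The option has to be set before bind() to affect the bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1)
        fprintf(stderr, "Cannot setsockopt SO_REUSEADDR [%s]\n", strerror(errno));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, SOMAXCONN) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that, as far as I know, on Linux SO_REUSEADDR only helps with
addresses left in TIME_WAIT. If the old process still holds a listening
socket on the port, bind() fails with "Address already in use" regardless,
which is why it matters whether the old agent has really exited.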