Edit report at https://bugs.php.net/bug.php?id=59909&edit=1

 ID:                 59909
 Updated by:         fel...@php.net
 Reported by:        mike at digitalstruct dot com
 Summary:            Gearman Workers Run Away causing gearmand cpu load
                     to 99%
-Status:             Open
+Status:             Assigned
 Type:               Bug
 Package:            gearman
 Operating System:   RHEL 5.6
 PHP Version:        5.3.5
 Assigned To:        hradtke
 Block user comment: N
 Private report:     N



Previous Comments:
------------------------------------------------------------------------
[2011-08-22 22:50:51] mike at digitalstruct dot com

Unfortunately, it is extremely hard to rule out either the 
extension or libgearman at this point.  I've gone back and 
forth between the two, but I cannot find a clear boundary 
that shows which side is breaking.

Any ideas to help diagnose?

------------------------------------------------------------------------
[2011-08-22 20:07:36] hrad...@php.net

Did we rule out libgearman?

------------------------------------------------------------------------
[2011-08-22 17:12:36] mike at digitalstruct dot com

Description:
------------
See the following bug report: 
https://bugs.launchpad.net/gearmand/+bug/802850

It seems to happen at random and only under heavier load.  
When the workers are killed, the server returns to normal, 
and after the worker is restarted it seems to function fine.

The main difference I've noticed between a working worker 
and a non-working worker (basically when things are fried) 
is this: the working ones block, while the non-working ones 
stop blocking and simply DDoS the gearmand process.


Expected result:
----------------
I would think that how long the worker has been running 
should not matter.  It should never stop blocking while it 
is sitting in its while loop.

The fact that it DDoSes the gearmand process suggests that 
it is likely losing some pointer to the connection and is 
unable to recover.  It does not give any useful error 
message but continues as described above until killed.

Actual result:
--------------
Working:
poll([{fd=13, events=POLLIN}], 1, 1000) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(13, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(13, "\0RES\0\0\0\n\0\0\0\0", 8192, 0, NULL, NULL) = 12
sendto(13, "\0REQ\0\0\0\4\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
poll([{fd=13, events=POLLIN}], 1, 1000 <unfinished ...>
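
To make the 12-byte packets in the trace above easier to read, here is a minimal Python sketch that splits a Gearman binary packet header into its three fields: a 4-byte magic, a 4-byte big-endian packet type, and a 4-byte big-endian payload size. The helper names (parse_gearman_header, describe, KNOWN_TYPES) are made up for this illustration and are not part of the extension or libgearman. Only the two type numbers whose names I am sure of are mapped; anything else is printed as a bare number.

```python
import struct

# Hypothetical lookup table for this sketch: only the type numbers
# seen in the trace whose names are certain are listed here.
KNOWN_TYPES = {4: "PRE_SLEEP", 10: "NO_JOB"}


def parse_gearman_header(packet):
    """Split a Gearman binary packet header into (magic, type, size)."""
    if len(packet) < 12:
        raise ValueError("a Gearman header is 12 bytes long")
    magic, ptype, size = struct.unpack(">4sII", packet[:12])
    if magic not in (b"\0REQ", b"\0RES"):
        raise ValueError("unexpected magic: %r" % (magic,))
    return magic[1:].decode("ascii"), ptype, size


def describe(packet):
    """Render a header as a short human-readable string."""
    magic, ptype, size = parse_gearman_header(packet)
    name = KNOWN_TYPES.get(ptype, "type %d" % ptype)
    return "%s %s (%d payload bytes)" % (magic, name, size)


# Packets copied from the strace output. strace prints byte 0x27 as "'"
# and byte 0x0a as "\n", so the packet types are 39, 10 and 4.
packets = [
    b"\0REQ\0\0\0'\0\0\0\0",
    b"\0RES\0\0\0\n\0\0\0\0",
    b"\0REQ\0\0\0\4\0\0\0\0",
]

for p in packets:
    print(describe(p))
```

Read this way, the working trace is a request of type 39, a NO_JOB reply from gearmand, and then PRE_SLEEP followed by a blocking poll().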

Non-Working:
getsockopt(13, SOL_SOCKET, SO_ERROR, [284636038280773632], [4]) = 0
sendto(13, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
sendto(13, "\0REQ\0\0\0\4\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
poll([{fd=13, events=POLLIN}], 1, 1000) = 1 ([{fd=13, revents=POLLIN}])
getsockopt(13, SOL_SOCKET, SO_ERROR, [284636038280773632], [4]) = 0
sendto(13, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
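
In the non-working trace above, poll() reports POLLIN but no recvfrom() follows before the next sendto(). As an illustration only, and not a diagnosis of this bug, the Python sketch below shows how a readiness check behaves in that situation: while pending bytes are left unread, or once the peer has closed the connection, the check returns immediately every time instead of blocking. It uses a local socket pair in place of gearmand and select.select() in place of poll(); the helper name is_readable and all variable names are assumptions made for the example.

```python
import select
import socket


def is_readable(sock, timeout_s):
    """Return True if the socket is reported readable within the timeout.

    select.select() stands in for the poll() call seen in the trace.
    """
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


# A connected pair of local sockets; "server" plays the role of gearmand.
worker, server = socket.socketpair()

# Nothing pending: the check blocks for the timeout and then gives up.
idle = is_readable(worker, 0.05)

# The server sends one 12-byte reply that the worker does not read.
server.sendall(b"\0RES\0\0\0\n\0\0\0\0")
spins = [is_readable(worker, 0.05) for _ in range(3)]

# Reading the pending bytes lets the next check block again.
data = worker.recv(8192)
after_read = is_readable(worker, 0.05)

# A closed peer also counts as readable; recv() then returns b"".
server.close()
closed = is_readable(worker, 0.05)
eof = worker.recv(8192)
worker.close()

print(idle, spins, len(data), after_read, closed, eof)
```

If something like this is happening in the worker, it would be worth checking whether the code path that handles POLLIN ever reaches a read on fd 13 in the broken state.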


------------------------------------------------------------------------


