Edit report at https://bugs.php.net/bug.php?id=59909&edit=1

 ID:                 59909
 Updated by:         fel...@php.net
 Reported by:        mike at digitalstruct dot com
 Summary:            Gearman Workers Run Away causing gearmand cpu load to 99%
-Status:             Open
+Status:             Assigned
 Type:               Bug
 Package:            gearman
 Operating System:   RHEL 5.6
 PHP Version:        5.3.5
 Assigned To:        hradtke
 Block user comment: N
 Private report:     N

Previous Comments:
------------------------------------------------------------------------
[2011-08-22 22:50:51] mike at digitalstruct dot com

Unfortunately it is extremely hard to rule out either the extension or
libgearman at this point. I've gone back and forth between the two, but I
cannot see a clear boundary that shows which one is breaking. Any ideas to
help diagnose?

------------------------------------------------------------------------
[2011-08-22 20:07:36] hrad...@php.net

Did we rule out libgearman?

------------------------------------------------------------------------
[2011-08-22 17:12:36] mike at digitalstruct dot com

Description:
------------
See the following bug report:
https://bugs.launchpad.net/gearmand/+bug/802850

It seems to happen at random and only under heavier load. When the workers
are killed the server returns to normal, and after the workers are
restarted they seem to function fine.

The main difference I've noticed between a working worker and a
non-working worker (basically when things are fried) is this: the working
ones block, while the non-working ones stop blocking and simply flood the
gearmand process with requests, in effect a DoS.

Expected result:
----------------
I would expect that how long the worker has been running should not
matter. The worker should never stop blocking while it is sitting in its
while loop. The fact that it floods the gearmand process suggests that it
is losing some pointer to the connection and is unable to recover. It does
not give any useful error message but continues in the way described above
until killed.

Actual result:
--------------
Working:

poll([{fd=13, events=POLLIN}], 1, 1000) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(13, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(13, "\0RES\0\0\0\n\0\0\0\0", 8192, 0, NULL, NULL) = 12
sendto(13, "\0REQ\0\0\0\4\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
poll([{fd=13, events=POLLIN}], 1, 1000 <unfinished ...>

Non-Working:

getsockopt(13, SOL_SOCKET, SO_ERROR, [284636038280773632], [4]) = 0
sendto(13, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
sendto(13, "\0REQ\0\0\0\4\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
poll([{fd=13, events=POLLIN}], 1, 1000) = 1 ([{fd=13, revents=POLLIN}])
getsockopt(13, SOL_SOCKET, SO_ERROR, [284636038280773632], [4]) = 0
sendto(13, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0 <unfinished ...>

------------------------------------------------------------------------
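
As a diagnostic aid, the 12-byte payloads in the strace output above can be decoded by hand. The Gearman binary protocol header is a 4-byte magic (\0REQ or \0RES), a 4-byte big-endian packet type, and a 4-byte big-endian payload size. The sketch below is not part of the original report; decode_header and the PACKET_TYPES table are hypothetical names, and the type names are quoted from memory of the Gearman protocol document, so they should be checked against the gearmand version in use.

```python
import struct

# Assumption: packet type numbers and names as recalled from the Gearman
# protocol document. Verify against the gearmand version you are running.
PACKET_TYPES = {
    4: "PRE_SLEEP",
    6: "NOOP",
    9: "GRAB_JOB",
    10: "NO_JOB",
    11: "JOB_ASSIGN",
    39: "GRAB_JOB_ALL",
}


def decode_header(raw: bytes):
    """Decode a 12-byte Gearman header into (direction, type name, size)."""
    if len(raw) < 12:
        raise ValueError("a Gearman header is 12 bytes long")
    # Big-endian: 4-byte magic, 32-bit packet type, 32-bit payload size.
    magic, ptype, size = struct.unpack(">4sII", raw[:12])
    if magic not in (b"\0REQ", b"\0RES"):
        raise ValueError("unexpected magic: %r" % magic)
    name = PACKET_TYPES.get(ptype, "UNKNOWN(%d)" % ptype)
    return magic[1:].decode("ascii"), name, size


if __name__ == "__main__":
    # The three distinct payloads that appear in the traces.
    for payload in (
        b"\0REQ\0\0\0'\0\0\0\0",
        b"\0RES\0\0\0\n\0\0\0\0",
        b"\0REQ\0\0\0\4\0\0\0\0",
    ):
        print(decode_header(payload))
```

Read this way, the working trace asks for a job, receives a "no job" reply, announces that it is about to sleep, and then blocks in poll(). The non-working trace sends the same two requests back to back with no recvfrom() in between, and poll() reports the socket readable immediately. One possible reading, offered only as a hypothesis, is that unread data is left on the socket, so every poll() returns at once and the loop spins.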