> We are having a problem when restarting our app, which runs under emperor
> mode. Sometimes, when we reload the config (an ini file), one or two
> workers will not die; they start consuming 100% of a CPU and then die off
> about 60 seconds later. This happens sporadically no matter how many
> workers we spawn.
>
> We are running Django under uWSGI (version 2.0.5) on Ubuntu 14.04 on
> Amazon EC2.
>
> Configs, logs and strace output (for one of the workers that hung) are
> below. Has anyone seen or experienced this problem before? My assumption
> is that the 60 seconds corresponds to the harakiri timeout, though I'm
> not 100% sure about that.
>
> Here's the emperor log when a worker was hung:
> Mon Jul 21 22:39:41 2014 - [emperor] reload the uwsgi instance <app>
> Mon Jul 21 22:40:44 2014 - [emperor] vassal <app> is ready to accept requests
>
> Here's our app ini config (some values removed, though every directive in
> the config is listed):
> [uwsgi]
> uid = <uid>
> gid = <gid>
> socket = 127.0.0.1:<port>
> listen = 16384
> workers = 4
> threads = 2
> thunder-lock = true
> max-requests = 20000
> harakiri = 60
> harakiri-verbose = true
> master = true
> single-interpreter = true
> virtualenv = <virtualenv>
> pythonpath = <pythonpath>
> env = DJANGO_SETTINGS_MODULE=<module>
> module = <wsgi_file>
> pidfile2 = <pidfile>
> logto2 = <logfile>
> logfile-chmod = 644
> stats = 127.0.0.1:<stats_port>
> post-buffering = 65536
> buffer-size = 32768
> disable-logging = true
> chdir = <dir>
>
> I was able to get an strace of one of the hung workers; this is what I
> got (starting from when it received the signal to reload):
> close(4)                                = 0
> futex(0x7f3a15c37000, FUTEX_LOCK_PI, 1) = ? ERESTARTNOINTR (To be restarted)
> --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=1660, si_uid=601} ---
> write(2, "Gracefully killing worker 6 (pid"..., 44) = -1 EPIPE (Broken pipe)
> open("/usr/lib/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4
> fstat(4, {st_mode=S_IFREG|0644, st_size=46184, ...}) = 0
> mmap(NULL, 46184, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7f3a15b3d000
> close(4)                                = 0
> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
> open("/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 4
> read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260*\0\0\0\0\0\0"..., 832) = 832
> fstat(4, {st_mode=S_IFREG|0644, st_size=90080, ...}) = 0
> mmap(NULL, 2185952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7f3a09947000
> mprotect(0x7f3a0995d000, 2093056, PROT_NONE) = 0
> mmap(0x7f3a09b5c000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x15000) = 0x7f3a09b5c000
> close(4)                                = 0
> munmap(0x7f3a15b3d000, 46184)           = 0
> tgkill(16665, 16668, SIGRTMIN)          = 0
> rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7f3a15826340}, {0x460790, [], SA_RESTORER, 0x7f3a15826340}, 8) = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a09907000
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> munmap(0x7f3a098c7000, 262144)          = 0
> mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
> = 0x7f3a098c7000
> +++ killed by SIGKILL +++
>
> Any help would be appreciated. If anyone wants any other info, just let me
> know and I'll supply it.
>
> Thanks,
> Andy
> _______________________________________________
> uWSGI mailing list
> [email protected]
> http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
>

The 60-second timeout is the --worker-reload-mercy value (default 60). It
is the maximum amount of time the master will wait for a worker to die;
after that, the master sends it SIGKILL (-9). Since a worker is free to
ignore the master's signals, this timeout is a safety measure that keeps
the master from hanging forever.

Unfortunately, your strace does not show anything that could explain why
the worker hung, but if 60 seconds is too long, just set the value lower.
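
A minimal sketch of that tuning, assuming the option is added under the existing [uwsgi] section of the vassal ini file quoted above; 15 seconds is an arbitrary example value, not a recommendation. This only limits how long a stuck worker can keep burning CPU during a reload; it does not address whatever makes the worker hang.

```ini
[uwsgi]
; Seconds the master waits for a worker to exit during a reload or
; shutdown before sending it SIGKILL (-9). The default is 60.
; 15 is only an example value; keep it longer than your slowest normal
; request so that in-flight requests still have a chance to finish.
worker-reload-mercy = 15
```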


-- 
Roberto De Ioris
http://unbit.it
