On Mon, Jun 24, 2013 at 8:50 PM, Roberto De Ioris <[email protected]> wrote:
> On Sun, Jun 16, 2013 at 1:08 AM, Igor Katson <[email protected]> wrote:
>
>> On Sat, Jun 15, 2013 at 10:02 PM, Roberto De Ioris <[email protected]> wrote:
>>
>>>> On Fri, Jun 14, 2013 at 9:27 PM, Roberto De Ioris <[email protected]> wrote:
>>>>
>>>>> I am not sure to understand what you mean; the stats server gives you
>>>>> the currently running connections. The problem is that with unix
>>>>> sockets you do not have the listen queue size, but it should not be a
>>>>> big problem.
>>>>
>>>> I meant that I cannot reproduce the problem artificially, so when it
>>>> happens next time I will dump the stats server output. And the other
>>>> one was that I cannot find, with netstat or any other command, which
>>>> process is connecting to the unix socket, only the one which is
>>>> listening. If it was TCP, "netstat -p" would show the pid of the
>>>> client socket as well as the server socket, but for unix sockets only
>>>> the listener pid is shown.
>>>
>>> I have to admit, albeit faster (and easier to secure), unix sockets are
>>> pretty hard to debug (and they do not have listen queue monitoring).
>>>
>>>>> If the harakiri is not triggering, well, the problem could be much
>>>>> more complex (like a db problem and so on). Are you sure all of the
>>>>> parts of the app are gevent-friendly? (Pay attention to the db
>>>>> adapter, as generally they are the weak point.)
>>>>
>>>> Well, I cannot be 100% sure, but as far as I know, yes. The db is
>>>> psycopg2 with "gevent_psycopg2" applied. Everything else involving the
>>>> network is pure Python. I believe gdb or strace will show if a problem
>>>> is inside some other C code.
>>>
>>> A common error I have seen is not calling monkey patching on top.
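To illustrate the "monkey patching on top" point: a minimal sketch of what the very top of a gevent WSGI script should look like (assuming gevent is installed; this is an illustrative sketch, not the actual file from this thread, and the final assert just verifies that the stdlib socket module really was replaced):

```python
# Minimal sketch: monkey patching must happen before anything else
# imports socket, ssl, threading, etc., so it goes at the very top.
from gevent import monkey
monkey.patch_all()

# Imports below this line get the cooperative, gevent-aware versions.
import socket
import gevent.socket

# Sanity check that the stdlib socket module really was replaced.
assert socket.socket is gevent.socket.socket
```

The --gevent-monkey-patch option mentioned below achieves the same thing from uWSGI itself, which removes the risk of some other module importing socket first.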
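And on the netstat point above: netstat indeed cannot show the client pid for unix sockets, but on Linux the listener itself can ask the kernel who is on the other end of a connected AF_UNIX socket, via the SO_PEERCRED socket option. A minimal sketch (Linux-only; the helper name is made up, and the demo uses a socketpair so both ends belong to the current process):

```python
# Sketch: identify the peer process of a connected AF_UNIX socket via
# SO_PEERCRED (Linux-specific), which netstat cannot show for unix sockets.
import os
import socket
import struct

# Fall back to the Linux value if this Python build lacks the constant.
SO_PEERCRED = getattr(socket, "SO_PEERCRED", 17)

def peer_credentials(conn):
    """Return (pid, uid, gid) of the process at the other end of `conn`."""
    data = conn.getsockopt(socket.SOL_SOCKET, SO_PEERCRED,
                           struct.calcsize("3i"))
    return struct.unpack("3i", data)

# Demo with a socketpair: both ends belong to this very process.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(a)
```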
>>> I do not know if it could be your case, but recent releases have the
>>> --gevent-monkey-patch option, ensuring monkey patching happens as soon
>>> as possible.
>>
>> Yes, I definitely have monkey patching enabled, though not via
>> --gevent-monkey-patch, but with Python code at the beginning of the wsgi
>> script.
>>
>>>> Can enabling the master help this? I.e. if the worker is stuck, the
>>>> master shouldn't be, right? I had no "master = true" line in the ini
>>>> file, but there are always 2 processes running for each ini file, so
>>>> I'm not sure if the master was running. I have now added "master =
>>>> true" just in case it may help.
>>>
>>> Yes, it was running. But as you said, background jobs are running, so
>>> your instance is not stuck; it only looks like all of your uwsgi cores
>>> are blocked waiting for something.
>>
>> You may have misunderstood me: the background jobs were NOT running,
>> though usually they should be. I have some cron-like functionality
>> running inside gevent, and it stopped producing any logging output.
>>
>> Your idea about the reason being a blocked gevent loop is still a very
>> good candidate for the right answer. Waiting for it to come again. The
>> issue only occurred twice in a month, so I'll post here if it happens
>> again.
>>
>> Thanks!
>>
>> So I got the problem again; attached is the uwsgi stats socket output.
>>
>> Absolutely nothing in the logs again; the backend threads (which should
>> be greenlets) just got interrupted in the middle of something, according
>> to the log messages.
>>
>> The master process's strace is:
>>
>> ....
>> wait4(-1, 0x7fffaa119cfc, WNOHANG, NULL) = 0
>> epoll_wait(7, {}, 1, 1000)              = 0
>> lseek(2, 0, SEEK_CUR)                   = 29599522
>> wait4(-1, 0x7fffaa119cfc, WNOHANG, NULL) = 0
>> epoll_wait(7, {}, 1, 1000)              = 0
>> lseek(2, 0, SEEK_CUR)
>> ....
>> And it constantly continues this way. It seems the master is fine.
>>
>> The worker's strace is stuck at this:
>>
>> strace -s 1000 -p 25618
>> Process 25618 attached - interrupt to quit
>> futex(0x1b69a80, FUTEX_WAIT_PRIVATE, 0, NULL
>
> It looks like you are using a Python C module that is not using the GIL
> in the right way (or is simply not compatible with gevent).
>
> Can you report the list of modules you are using?

Attached is the output of *pip freeze | sort*, though I doubt it will be of
much use. Sorry for the amount of stuff in there; I don't use all of these
modules directly, but most of them, yes.

> Currently I see this kind of problem with the zookeeper C implementation
> and some older releases of the memcached module based on libmemcached.

I noticed now that when I attach gdb to the running process, all of the
threads are blocked in the same "sem_wait" call, except for the main
thread, which runs gevent.

I have dynamic code in the django app which shows all greenlets' and
threads' stacktraces. It seems all of the threads (not greenlets) except
the main one are doing this:

# ThreadID: 140100927207168
File: "/usr/local/lib/python2.7/dist-packages/gevent/threadpool.py", line 182, in _worker
    task = task_queue.get()
File: "/usr/local/lib/python2.7/dist-packages/gevent/_threading.py", line 432, in get
    self.not_empty.wait()
File: "/usr/local/lib/python2.7/dist-packages/gevent/_threading.py", line 148, in wait
    waiter.acquire()

The number of threads matches the number shown in gdb, so it looks like
this is correct. I can't find any references to "threadpool" in my code,
so I'm not sure where these are coming from.

It's probably worth asking about this on the gevent mailing list; what do
you think, Roberto?

Thanks!

> --
> Roberto De Ioris
> http://unbit.it
> _______________________________________________
> uWSGI mailing list
> [email protected]
> http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
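For the record, a per-thread dump like the one quoted above can be produced with the stdlib alone; a minimal sketch of such a helper (the name dump_thread_stacks is made up here). As for where the threads come from: gevent 1.0's default DNS resolver runs lookups on a hidden threadpool, which may explain gevent/threadpool.py frames even if the application never touches the threadpool directly.

```python
# Sketch: format the current stack of every running OS thread, similar to
# the per-thread dump quoted above (helper name made up for illustration).
import sys
import traceback

def dump_thread_stacks():
    """Return a text report with one stack trace per OS thread."""
    chunks = []
    # sys._current_frames() maps each thread id to its topmost frame.
    for thread_id, frame in sys._current_frames().items():
        chunks.append("# ThreadID: %d\n" % thread_id)
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)

report = dump_thread_stacks()
```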
argparse==1.2.1
BeautifulSoup==3.2.1
boto==2.9.6
chardet==2.0.1
command-not-found==0.2.44
configobj==4.7.2
coverage==3.6
croniter==0.3.3
cssselect==0.8
decorator==3.3.2
distribute==0.6.24
Django==1.5.1
django-annoying==0.7.7
django-appconf==0.6
django-compressor==1.3
django-crispy-forms==1.2.8
django-debug-toolbar==0.9.4
django-extensions==1.1.1
django-geoip==0.3
django-hstore==1.1.1
django-jenkins==0.14.0
django-jsonfield==0.9.4
django-pagination==1.0.7
django-redis==3.2
django-reversion==1.7
django-shared==0.2.9
django-singletons==0.1.6-1
django-sso==0.0.7
django-storages==1.1.8
django-tastypie==0.9.15
django-video==0.0.3
gdata==2.0.17
gevent==1.0rc2
gevent-psycopg2==0.0.3
GnuPGInterface==0.3.2
greenlet==0.4.0
ipdb==0.7
ipython==0.12.1
Jinja2==2.6
language-selector==0.1
logilab-astng==0.24.3
logilab-common==0.59.1
lxml==2.3.2
M2Crypto==0.21.1
Markdown==2.3.1
MarkupSafe==0.15
mercurial==2.0.2
mimeparse==0.1.3
msgpack-python==0.1.10
multitask==0.2.0
oauthlib==0.4.2
p2p-sip==2.2.0
pexpect==2.3
phonenumbers==5.5b1
PIL==1.1.7
progressbar==2.3dev
psycopg2==2.4.5
pycrypto==2.4.1
pycurl==7.19.0
pylint==0.28.0
pyquery==1.2.4
pyst==0.4.43
python-apt==0.8.3ubuntu7.1
python-dateutil==2.1
python-debian==0.1.21ubuntu1
pytils==0.2.3
pytz==2013b
PyYAML==3.10
pyzmq==13.0.0
redis==2.7.5
requests==1.2.2
requests-oauthlib==0.3.2
rotater.py==0.0.2
selenium==2.32.0
simplegeneric==0.7
six==1.3.0
sorl-thumbnail==11.12
South==0.7.6
ufw==0.31.1-1
unattended-upgrades==0.1
unicodecsv==0.9.4
virtualenv==1.7.1.2
virtualenvwrapper==2.11.1
wsgiref==0.1.2
xlrd==0.9.2
zope.interface==4.0.5
