Thanks for the reply. Discussion below:

On Tue, Sep 10, 2013 at 2:38 AM, Roberto De Ioris <[email protected]> wrote:
> > Hi,
> >
> > I'm investigating using uwsgi to run Python code in the
> > FrameworkBenchmarks project <http://www.techempower.com/benchmarks/>,
> > which compares web frameworks, languages, platforms, web servers, and
> > more. I tried running another contributor's uwsgi command line, but I
> > can't get uwsgi to fully saturate all CPU cores when under load.
> >
> > uwsgi command line:
> >
> > --master -L --http :8080 --http-keepalive -p 2 -w hello --add-header
> > "Connection: keep-alive"
>
> In this way you are benchmarking a proxied setup, with an http router in
> front managing all of the requests and forwarding them to the workers.
>
> While in terms of performance it could be successful, in terms of core
> usage it could be suboptimal (even if core usage is a bit 'strange' for a
> benchmark, as the operating system scheduler chooses the process to give
> CPU to using really complex algorithms).

I ran htop and found that the http router process was at ~100% (thus using
most of one core). My guess is that the http router is CPU bound and can't
send enough work to the workers, so the worker processes are not fully
utilized; basically, the http router is the bottleneck. On my system this
setup produces about 6,000-7,000 requests/sec, whereas gunicorn can do
about 10,000 requests/sec while saturating all cores.

> The "right" command line would be:
>
> --master -L --http-socket :8080 -p 2 -w hello
>
> (keep-alive is useless as this is an in-process non-persistent parser)

I tried this (after increasing the system somaxconn, using uwsgi -l, and
removing -H 'Connection: keep-alive' from the wrk args), and only 326
requests completed, with 330 read socket errors and 1788 timeout socket
errors. I'm not sure what's going on; maybe it is a bug in wrk. At any
rate, my goal is to use HTTP keep-alive to get the most requests/sec, so
perhaps --http-socket isn't useful for this benchmark in the first place.
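For reference, the `hello` module in these command lines is just a minimal
WSGI application along these lines (a sketch; the benchmark's actual module
may differ in details, but this is the standard WSGI shape uwsgi's `-w hello`
expects):

```python
# hello.py - a minimal WSGI application of the kind these benchmarks run.
# (Sketch: the actual FrameworkBenchmarks module may differ in details.)

def application(environ, start_response):
    """Return a fixed plaintext body; the server calls this per request."""
    body = b"Hello, World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

uwsgi looks for a callable named `application` in the module given to `-w`,
so no extra configuration is needed beyond the module name.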
> If you want to test the http router (something a lot of users use in
> production) you may want to use --http-processes 2 (this time keepalives
> work).
>
> With this setup the http router too will use 2 processes, but again 'cpu
> cores' usage could be irrelevant.

I used `--master -L --http :8080 --http-processes 2 --http-keepalive -p 2
-w hello --add-header ...` and I was able to saturate all CPU cores. The
htop CPU usage was about ~65% for each http router process and ~35% for the
worker processes. The result was ~8,500 requests/sec, an improvement, but
still not close to gunicorn. These results suggest that the original
problem was indeed that the single http router process is CPU bound and is
the bottleneck.

> If you really want to map processes to cpus you may want to use
> --cpu-affinity, but as you are testing on a virtual system, the option
> could be useless.

So far I haven't seen any data to suggest that this is an affinity problem
or that affinity could help, so I haven't bothered with --cpu-affinity.
Virtualization also doesn't seem to be an issue: my physical machine is
otherwise idle and has two cores (with hyperthreading), and gunicorn and
nginx+uwsgi have no saturation issues on it.

> By the way, are you aware that uWSGI allows fine-grained tuning of
> basically everything? If you publish some benchmark about it, be prepared
> for someone suggesting the most "exotic" configuration options :)

Yes, I have been through some of uWSGI's documentation. My favorite read is
the discussion about --thunder-lock. :-) I could spend hours reading and
trying options, but I wanted to start with the mailing list to make sure
that I had the right overall approach. I'm open to exotic config options as
long as they make some sense.
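For what it's worth, pinning a process to a core (which is roughly the
effect --cpu-affinity arranges per worker, as I understand it) can be done
from Python on Linux like this; `pin_to_cpu` is just an illustrative helper
name, not anything uWSGI exposes:

```python
import os

def pin_to_cpu(pid, cpu):
    """Restrict a process (pid 0 = the calling process) to one CPU core.
    Illustrative sketch of per-worker affinity; Linux-only, since it
    relies on the sched_setaffinity(2) syscall."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Given the htop numbers above, though, the router looks CPU bound rather
than badly scheduled, so I don't expect pinning to change the picture.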
To summarize the performance numbers from above:

  gunicorn+meinheld:         10,000 requests/sec
  uwsgi with 1 http router:  6,000-7,000 requests/sec
  uwsgi --http-socket:       mostly socket errors
  uwsgi with 2 http routers: 8,500 requests/sec
  (nginx+uwsgi: I haven't tried this recently, so I don't have a real
  number, but I think it is 10,000+)

After doing this research (with your help), my analysis is that the
single-process uwsgi http router becomes CPU bound and is the limiting
factor. Thus, to increase performance, one must distribute the load among
more than one http router (--http-processes 2), or perhaps use a different
'router' such as nginx speaking the uwsgi protocol. What do you think? Is
my thinking/analysis/approach wrong? I'm open to suggestions.

Is there a way to use multiple worker processes without a router? That is,
is there a mode where the process that does the accept()/epoll()/read()
from the network also executes the Python code in the same process? That
seems like it might be the fastest, because it would eliminate the dispatch
from the router process to the worker process. I have a feeling that
gunicorn+meinheld might be doing this, but I haven't read the code to
verify.

Thanks.
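The "no router" model I'm asking about could be sketched as follows: the
parent binds one listening socket, forks workers, and each worker does its
own accept()/read()/respond with no intermediate dispatch. (Assumption on
my part: this is the pre-fork pattern gunicorn-style servers use; all names
here are illustrative, not uWSGI or meinheld internals, and real servers
parse HTTP properly rather than skipping it.)

```python
# Sketch of a pre-fork server: workers share one inherited listening
# socket and handle each connection entirely in-process.
import os
import socket

def handle(conn, body=b"Hello, World!"):
    """Read one HTTP request and answer it in the same process."""
    conn.recv(65536)  # read the raw request (parsing elided for brevity)
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"Connection: close\r\n"
        b"\r\n" + body
    )
    conn.close()

def prefork_serve(address, workers=2):
    """Bind once in the parent, then fork workers that all accept() on
    the same inherited socket; the kernel load-balances the accepts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(address)
    listener.listen(128)
    for _ in range(workers):
        if os.fork() == 0:          # child: accept and handle forever
            while True:
                conn, _ = listener.accept()
                handle(conn)
    return listener                 # parent keeps the listener open
```

Since every worker accepts and serves in one process, there is no router
hop to become a bottleneck, which is why I suspect this is closer to what
gunicorn+meinheld is doing.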
_______________________________________________
uWSGI mailing list
[email protected]
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
