Thanks for the reply. Discussion below:

On Tue, Sep 10, 2013 at 2:38 AM, Roberto De Ioris <[email protected]> wrote:
> > Hi,
> >
> > I'm investigating using uwsgi to run Python code in the
> > FrameworkBenchmarks project <http://www.techempower.com/benchmarks/>,
> > which compares web frameworks, languages, platforms, web servers, and
> > more. I tried running another contributor's uwsgi command line, but I
> > can't get uwsgi to fully saturate all CPU cores when under load.
> >
> > uwsgi command line:
> >
> > --master -L --http :8080 --http-keepalive -p 2 -w hello --add-header
> > "Connection: keep-alive"
>
> In this way you are benchmarking a proxied setup, with an http router in
> front managing all of the requests and forwarding them to the workers.
>
> While in terms of performance it could be successful, in terms of core
> usage it could be suboptimal (even if core usage is a bit 'strange' for a
> benchmark, as the operating system scheduler chooses the process to give
> CPU to using really complex algorithms).

I ran htop and found that the http router process was at ~100% (thus using
most of one core). My guess is that the http router is CPU bound and can't
send enough work to the workers, so the worker processes are not fully
utilized; basically, the http router is the bottleneck. On my system this
setup produces about 6,000-7,000 requests/sec, whereas gunicorn can do
about 10,000 requests/sec while saturating all cores.

> The "right" command line would be:
>
> --master -L --http-socket :8080 -p 2 -w hello
>
> (keep-alive is useless as this is an in-process non-persistent parser)

I tried this (after increasing the system somaxconn, using uwsgi -l, and
removing -H 'Connection: keep-alive' from the wrk args), and only 326
requests completed, with 330 read socket errors and 1788 timeout socket
errors. I'm not sure what's going on; maybe it is a bug in wrk. At any
rate, my goal is to use HTTP keep-alive to get the most requests/sec, so
perhaps --http-socket isn't useful for this benchmark in the first place.
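For reference, the `hello` module in these command lines is just a minimal
WSGI application along these lines (a sketch; the benchmark's actual module
may differ in details, but this is the standard WSGI shape uwsgi's `-w hello`
expects):

```python
# hello.py - a minimal WSGI application of the kind these benchmarks run.
# (Sketch: the actual FrameworkBenchmarks module may differ in details.)

def application(environ, start_response):
    """Return a fixed plaintext body; the server calls this per request."""
    body = b"Hello, World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

uwsgi looks for a callable named `application` in the module given to `-w`,
so no extra configuration is needed beyond the module name.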
> If you want to test the http router (something a lot of users use in
> production) you may want to use --http-processes 2 (this time keepalives
> work).
>
> With this setup the http router too will use 2 processes, but again 'cpu
> cores' usage could be irrelevant.

I used `--master -L --http :8080 --http-processes 2 --http-keepalive -p 2
-w hello --add-header ...` and I was able to saturate all CPU cores. The
htop CPU usage was about ~65% for each http router process and ~35% for the
worker processes. The result was ~8,500 requests/sec, an improvement, but
still not close to gunicorn. These results suggest that the original
problem was indeed that the single http router process is CPU bound and is
the bottleneck.

> If you really want to map processes to cpus you may want to use
> --cpu-affinity, but as you are testing on a virtual system, the option
> could be useless.

So far I haven't seen any data to suggest that this is an affinity problem
or that affinity could help, so I haven't bothered with --cpu-affinity.
Virtualization also doesn't seem to be an issue: my physical machine is
otherwise idle and has two cores (with hyperthreading), and gunicorn and
nginx+uwsgi have no saturation issues on it.

> By the way, are you aware that uWSGI allows fine-grained tuning of
> basically everything? If you publish some benchmark about it, be prepared
> for someone suggesting the most "exotic" configuration options :)

Yes, I have been through some of uWSGI's documentation. My favorite read is
the discussion about --thunder-lock. :-) I could spend hours reading and
trying options, but I wanted to start with the mailing list to make sure
that I had the right overall approach. I'm open to exotic config options as
long as they make some sense.
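For what it's worth, pinning a process to a core (which is roughly the
effect --cpu-affinity arranges per worker, as I understand it) can be done
from Python on Linux like this; `pin_to_cpu` is just an illustrative helper
name, not anything uWSGI exposes:

```python
import os

def pin_to_cpu(pid, cpu):
    """Restrict a process (pid 0 = the calling process) to one CPU core.
    Illustrative sketch of per-worker affinity; Linux-only, since it
    relies on the sched_setaffinity(2) syscall."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Given the htop numbers above, though, the router looks CPU bound rather
than badly scheduled, so I don't expect pinning to change the picture.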
To summarize the performance numbers from above:

  gunicorn+meinheld:         10,000 requests/sec
  uwsgi with 1 http router:  6,000-7,000 requests/sec
  uwsgi --http-socket:       mostly socket errors
  uwsgi with 2 http routers: 8,500 requests/sec
  (nginx+uwsgi: I haven't tried this recently, so I don't have a real
  number, but I think it is 10,000+)

After doing this research (with your help), my analysis is that the
single-process uwsgi http router becomes CPU bound and is the limiting
factor. Thus, to increase performance, one must distribute the load among
more than one http router (--http-processes 2), or perhaps use a different
'router' such as nginx speaking the uwsgi protocol. What do you think? Is
my thinking/analysis/approach wrong? I'm open to suggestions.

Is there a way to use multiple worker processes without a router? That is,
is there a mode where the process that does the accept()/epoll()/read()
from the network also executes the Python code in the same process? That
seems like it might be the fastest, because it would eliminate the dispatch
from the router process to the worker process. I have a feeling that
gunicorn+meinheld might be doing this, but I haven't read the code to
verify.

Thanks.
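The "no router" model I'm asking about could be sketched as follows: the
parent binds one listening socket, forks workers, and each worker does its
own accept()/read()/respond with no intermediate dispatch. (Assumption on
my part: this is the pre-fork pattern gunicorn-style servers use; all names
here are illustrative, not uWSGI or meinheld internals, and real servers
parse HTTP properly rather than skipping it.)

```python
# Sketch of a pre-fork server: workers share one inherited listening
# socket and handle each connection entirely in-process.
import os
import socket

def handle(conn, body=b"Hello, World!"):
    """Read one HTTP request and answer it in the same process."""
    conn.recv(65536)  # read the raw request (parsing elided for brevity)
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"Connection: close\r\n"
        b"\r\n" + body
    )
    conn.close()

def prefork_serve(address, workers=2):
    """Bind once in the parent, then fork workers that all accept() on
    the same inherited socket; the kernel load-balances the accepts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(address)
    listener.listen(128)
    for _ in range(workers):
        if os.fork() == 0:          # child: accept and handle forever
            while True:
                conn, _ = listener.accept()
                handle(conn)
    return listener                 # parent keeps the listener open
```

Since every worker accepts and serves in one process, there is no router
hop to become a bottleneck, which is why I suspect this is closer to what
gunicorn+meinheld is doing.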
_______________________________________________
uWSGI mailing list
[email protected]
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
