> uwsgi-1.9.15 - Wed Sep 11 00:32:08 2013 - req: 63709 - lq: 0 - tx: 7.0M
> node: machinename - cwd: /home/ubuntu/FrameworkBenchmarks/uwsgi - uid: 1000 - gid: 1000 - masterpid: 6948
>
> WID  %     PID   REQ    EXC  SIG  STATUS  AVG  RSS  VSZ  TX    RunT
> 2    51.7  6950  32947  0    0    idle    0ms  0    0    3.0M  1252
> 1    48.3  6949  30762  0    0    idle    0ms  0    0    3.0M  1157
>
>> > So far I haven't seen any data to suggest that this is an
>> > affinitization problem or that affinity could help, so I haven't
>> > bothered with --cpu-affinity. So far virtualization doesn't seem to
>> > be an issue since my physical machine is otherwise idle and has two
>> > cores (with hyperthreading).
>>
>> On virtualized systems cpu-affinity simply does not work, because of
>> the way CPUs are abstracted by the hypervisor. Even if your kernel
>> shows the right distribution, internally you do not know which CPU is
>> effectively used.
>>
>> But this is not your problem. I have run some tests with a concurrency
>> of 90 (so no need to tune the listen queue), and --http-socket was
>> 1-2% faster, while httprouter + uwsgi was 3-4% slower (as expected,
>> since you have the IPC overhead, something you will always have in
>> production environments).
>>
>> > After doing this research (with your help), my analysis is that the
>> > (single-process) uwsgi httprouter becomes CPU bound and becomes the
>> > limiting factor.
>>
>> (Always supposing you are using a 1.9.x version.)
>>
>> The httprouter becomes CPU bound only at higher levels of concurrency
>> (unless you are using a pre-1.9 version, where there are blocking
>> parts).
>>
>> Workers are heavier in terms of "things to do"; the fact that they are
>> low in CPU usage suggests a communication problem (again, it could be
>> the listen queue). The httprouter (like nginx) does not need a tuned
>> listen queue, as it constantly accept()s and waits again, reducing the
>> need for a queue.
>> Workers (instead) have the heavy part after the accept(), and
>> connections arriving while they are in the "heavy part" are enqueued
>> (and saturating a 100-entry listen queue with 256 concurrent
>> connections and 2 workers is pretty easy, especially because
>> --http-socket expects a 4-second timeout on protocol traffic).
>
> Your theory makes sense, but so far I don't think I've seen any data
> suggesting that's what is going on. I'm open to ideas.
>
>> > Thus, to increase the performance, one must distribute the load
>> > amongst more than one httprouter (--http-processes 2), or perhaps
>> > use a different 'router' such as nginx using the uwsgi protocol.
>> > What do you think? Is my thinking/analysis/approach wrong? I'm open
>> > to suggestions.
>>
>> The httprouter passes requests to uWSGI workers via the uwsgi
>> protocol. In terms of performance it should map 1:1 with nginx (it is
>> way simpler than nginx, though the parser of the latter is better for
>> sure).
>
> I tried nginx+uwsgi and I got ~12,300 requests/sec, the best result
> I've gotten so far. The uwsgi command line:
>
> --master -L -l 5000 --socket /tmp/uwsgi.sock --chmod-socket=666 -p 2
> -w hello --pidfile /tmp/uwsgi.pid
>
> The nginx.conf is here: https://gist.github.com/MalcolmEvershed/6520477
>
> Isn't it odd that nginx+uwsgi is the best-performing combination,
> beating gunicorn+meinheld and the uwsgi httprouter? I'm really not sure
> what to make of this. Am I doing something wrong?
>
>> > Is there a way to use multiple worker processes without a router?
>> > Basically, is there a way that does the accept()/epoll()/read() from
>> > the network and then in the same process executes the Python code?
>> > That seems like it might be the fastest because it would eliminate
>> > the dispatch from the router process to the worker process. I have a
>> > feeling that gunicorn+meinheld might be doing this, but I haven't
>> > read the code to verify.
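The listen-queue arithmetic in the reply above can be made concrete with a quick back-of-the-envelope sketch (the backlog of 100 matches uWSGI's default listen queue size):

```python
# Back-of-the-envelope sketch of the listen-queue saturation described
# above: with 2 workers busy in the "heavy part" of a request, every
# other concurrent connection has to wait in the kernel listen queue.
concurrent_connections = 256
workers = 2
listen_queue = 100                            # uWSGI's default backlog

queued = concurrent_connections - workers     # connections left waiting
overflow = max(0, queued - listen_queue)      # connections that don't fit

print(queued, overflow)                       # -> 254 154
```

With 154 connections that cannot even fit in the backlog, dropped or stalled requests are expected, which is why tuning the queue (or adding workers) matters at this concurrency.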
>>
>> I do not follow you here; it is the standard way uWSGI works. Even
>> with the httprouter, the backend workers share the socket. It is the
>> reason why --thunder-lock is needed in high-load scenarios.
>
> Maybe I'm misunderstanding. I thought that when an httprouter is used
> it works like this:
>
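The shared-socket model described above can be sketched in a few lines. This is a hedged illustration, not uWSGI code: real uWSGI forks worker processes that inherit one listening file descriptor, while threads are used here only to keep the sketch self-contained and runnable.

```python
import socket
import threading

# Sketch: N workers all blocking in accept() on the SAME listening
# socket. The kernel wakeups this causes under load are what
# --thunder-lock serializes (the "thundering herd" problem).
def worker(lsock, handled):
    conn, _ = lsock.accept()          # all workers wait on the same fd
    conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nHello World")
    conn.close()
    handled.append(threading.current_thread().name)

lsock = socket.socket()
lsock.bind(("127.0.0.1", 0))
lsock.listen(100)                     # the listen queue discussed above
handled = []
workers = [threading.Thread(target=worker, args=(lsock, handled),
                            name="w%d" % i) for i in range(2)]
for w in workers:
    w.start()

responses = []
for _ in range(2):                    # two clients, served by the pool
    c = socket.create_connection(lsock.getsockname())
    data = b""
    while True:                       # read until the worker closes
        chunk = c.recv(4096)
        if not chunk:
            break
        data += chunk
    c.close()
    responses.append(data)
for w in workers:
    w.join()
print(len(handled))                   # -> 2
```

Both connections are handled without any router process in the middle, which is the point of the answer: accept() and application code already live in the same worker.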
It took me a bit to fully understand what is going on. Finally I decided
to invest a bit of time in 'wrk' to check how it works. Well, while I am
not a big fan of "hello world" benchmarks, the one you made resulted in
really interesting (and funny) numbers.

Regarding --http-socket: add --add-header 'Connection: close' and you
should be able to complete the test (it seems wrk does not manage
implicit non-keepalive connections well). Results will be pretty near
the --http one. So nothing funny here.

Regarding meinheld: for this kind of test keep-alive definitely helps. I
would never have bet a cent on it, but effectively if you add
-H 'Connection: close' to wrk, uWSGI will start winning again (10% more
requests compared to meinheld, and up to 40% compared to plain
gunicorn). [Note: please do not blame gunicorn; hello world tests tend
to favour C implementations, and things change heavily with real
applications.]

The funny part: I suppose you are using UNIX sockets for nginx. Again,
this test is based on micro-optimizations. Let's sum up (the numbers are
relative to my machine):

uwsgi http router + uwsgi tcp               -> 66.000-67.000
uwsgi http-socket (no proxy)                -> 67.000-68.000
uwsgi http router + uwsgi unix              -> 110.000
nginx + uwsgi tcp                           -> 83.000-84.000
nginx + uwsgi unix                          -> 145.000-160.000 (!!!)
nginx + uwsgi --http-socket tcp             -> 69.000-71.000
nginx + uwsgi --http-socket unix            -> 108.000-110.000
gunicorn+meinheld (no proxy, keepalive)     -> 125.000-127.000
gunicorn+meinheld (no proxy, conn. close)   -> 48.000-55.000
gunicorn (no proxy)                         -> 22.000-27.000

Why does nginx + uwsgi win?
- nginx has a better keepalive parser than uwsgi
- the nginx and meinheld HTTP parsers are the same
- the uwsgi protocol (under nginx) performs a lot better than the HTTP
  one
- the uWSGI WSGI plugin is way faster than the gunicorn one (but only
  because of the 'hello world' test; real-world tests with more impact
  on the Python side have different results)
- UNIX sockets always win as a micro-optimization

So even with a proxy in the middle, nginx makes a difference with
keep-alive connections, and using the uwsgi protocol combined with the
WSGI plugin results in better numbers.

Side note: adding --thunder-lock to uWSGI gives a boost of 5.000 to
8.000 requests; there are other tunings available, but you will gain no
more than a couple hundred requests.

Again: these values are for a hello world, where the "python part" is
less than 10% of the whole uWSGI request time, so do not give them too
much emphasis. I think the same situation applies to the Ruby and Perl
plugins too.

-- 
Roberto De Ioris
http://unbit.it

_______________________________________________
uWSGI mailing list
[email protected]
http://lists.unbit.it/cgi-bin/mailman/listinfo/uwsgi
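A footnote on the 'Connection: close' advice in the thread: the underlying issue is response framing. Without a Content-Length header and without 'Connection: close', a keep-alive client has no way to know the body has ended and keeps waiting. A minimal sketch of close-delimited framing (a hypothetical one-shot server, not uWSGI itself):

```python
import socket
import threading

# The server advertises 'Connection: close' (what uWSGI's --add-header
# injects) and closes the socket after the body, so a read-until-EOF
# client can finish. Without Content-Length or 'Connection: close', a
# keep-alive client would wait indefinitely for more body bytes.
def one_shot_server(lsock):
    conn, _ = lsock.accept()
    conn.recv(1024)                              # consume the request
    conn.sendall(b"HTTP/1.1 200 OK\r\n"
                 b"Connection: close\r\n"
                 b"\r\n"
                 b"Hello World")
    conn.close()                                 # EOF marks end of body

lsock = socket.socket()
lsock.bind(("127.0.0.1", 0))
lsock.listen(1)
t = threading.Thread(target=one_shot_server, args=(lsock,))
t.start()

client = socket.create_connection(lsock.getsockname())
client.sendall(b"GET / HTTP/1.1\r\nHost: test\r\n\r\n")
body = b""
while True:                                      # read until EOF
    chunk = client.recv(1024)
    if not chunk:
        break
    body += chunk
client.close()
t.join()
print(body.endswith(b"Hello World"))             # -> True
```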
