Hi, just to follow up on this topic, in case it's useful to others: it turned out that the bottleneck was only in minor part related to the setting of inbound_poll_rate itself; the real problem was that I was calling zmq_poll() before each zmq_msg_recv() (to implement a timeout on the recv()). I fixed it by implementing slightly smarter polling logic: I first try to receive with the ZMQ_DONTWAIT flag. If 0 messages are received, then the next time a poll() with my desired timeout is performed before attempting the recv(). If 1 message is received instead, then the next time the recv() (again with ZMQ_DONTWAIT) is repeated directly.
In this way I found that I could greatly reduce the number of poll operations (and thus improve performance when a lot of messages are incoming) while keeping my desired behaviour of a timeout-enabled recv().

HTH,
Francesco

2017-10-24 19:48 GMT+02:00 Francesco <[email protected]>:
> Hi Luca,
> Ok, that seems like a good explanation :)
> I will investigate if changing that value allows me to decrease poll rate
> and improves performances and if so I will create a PR.
>
> thanks,
> F
>
>
> 2017-10-24 16:36 GMT+02:00 Luca Boccassi <[email protected]>:
>
>> On Tue, 2017-10-24 at 15:43 +0200, Francesco wrote:
>> > Hi all,
>> > I'm running a process that creates a ZMQ_SUB socket and is subscribed
>> > to several publishers (over TCP transport).
>> >
>> > Now I measured that this process saturates with CPU at 100% and
>> > slows down the publisher (I'm using ZMQ_XPUB_NODROP=1) when subscribed
>> > to more than 600 kPPS / 1.6 Gbps of traffic.
>> >
>> > The "strange" thing is that when it's subscribed to much lower traffic
>> > (e.g., in my case to around 4 kPPS / 140 Mbps) the CPU still stays at
>> > 100%.
>> > If I strace the process I find out that:
>> >
>> > # strace -cp 48520
>> >
>> > strace: Process 48520 attached
>> > ^Cstrace: Process 48520 detached
>> > % time     seconds  usecs/call     calls    errors syscall
>> > ------ ----------- ----------- --------- --------- ----------------
>> >  93.29    0.634666           2    327545           poll
>> >   6.70    0.045613           2     23735           read
>> >   0.01    0.000051           4        12           write
>> >   0.00    0.000001           1         1           restart_syscall
>> > ------ ----------- ----------- --------- --------- ----------------
>> > 100.00    0.680331                351293           total
>> >
>> > The 93% of the time is spent inside poll(), which happens to be
>> > called with this stack trace:
>> >
>> > #0 poll () at ../sysdeps/unix/syscall-template.S:84
>> > #1 0x00007f98da959a1a in zmq::signaler_t::wait(int) ()
>> > #2 0x00007f98da937f75 in zmq::mailbox_t::recv(zmq::command_t*, int) ()
>> > #3 0x00007f98da95a3a7 in zmq::socket_base_t::process_commands(int, bool) [clone .constprop.148] ()
>> > #4 0x00007f98da95c6ca in zmq::socket_base_t::recv(zmq::msg_t*, int) ()
>> > #5 0x00007f98da97e8c0 in zmq_msg_recv ()
>> >
>> > Maybe I'm missing something but by looking at the code it looks to me
>> > that this is happening because of the config setting
>> > 'inbound_poll_rate=100'; that is, every 100 packets received
>> > zmq_msg_recv() will do an extra poll.
>> >
>> > Now my question is: is there any reason to have this
>> > inbound_poll_rate setting hardcoded and not configurable (e.g., via
>> > context option)?
>>
>> It might be as simple as nobody has needed it beforehand. I guess one
>> issue could be that by changing those values it's extremely easy to
>> shoot oneself in the foot.
>> But if you need it feel free to send a PR to implement the option.
>>
>> --
>> Kind regards,
>> Luca Boccassi
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
