Hi Doron,

On Mon, 12 Aug 2019 at 12:13, Doron Somech <[email protected]> wrote:
>
> It is not waiting to batch up.
> The background IO thread dequeue messages from internal queue of messages
> waiting to be sent.
> Zeromq dequeue messages until that queue is empty or the buffer is full, so
> not waiting for anything.
Right, OK, I didn't mean to say that it is literally waiting in a sleep();
my naive reasoning was that the ZMQ background IO thread should always have
its queue full of messages to send over TCP, so that message batching up to
8 kB should be happening all the time... but then my question (why I don't
get a flat curve up to 8 kB message sizes) still applies :)

I did some further investigation and I found that, in the 10 Gbps environment
I benchmarked (http://zeromq.org/results:10gbe-tests-v432), the performance is
bounded by the remote_thr side when sending 64 B frames. Here is what
"perf top" reports on the 2 worker threads of the remote_thr app:

main remote_thr thread:
  23,33%  libzmq.so.5.2.3  [.] zmq::ypipe_t<zmq::msg_t, 256>::flush
  22,86%  libc-2.17.so     [.] malloc
  20,00%  libc-2.17.so     [.] _int_malloc
  11,51%  libzmq.so.5.2.3  [.] zmq::pipe_t::write
   4,35%  libzmq.so.5.2.3  [.] zmq::ypipe_t<zmq::msg_t, 256>::write
   2,38%  libzmq.so.5.2.3  [.] zmq::socket_base_t::send
   1,81%  libzmq.so.5.2.3  [.] zmq::lb_t::sendpipe
   1,36%  libzmq.so.5.2.3  [.] zmq::msg_t::init_size
   1,33%  libzmq.so.5.2.3  [.] zmq::pipe_t::flush

zmq bg IO remote_thr thread:
  38,35%  libc-2.17.so     [.] _int_free
  13,61%  libzmq.so.5.2.3  [.] zmq::pipe_t::read
   9,24%  libc-2.17.so     [.] __memcpy_ssse3_back
   8,99%  libzmq.so.5.2.3  [.] zmq::msg_t::size
   3,22%  libzmq.so.5.2.3  [.] zmq::encoder_base_t<zmq::v2_encoder_t>::encode
   2,34%  [kernel]         [k] sysret_check
   2,20%  libzmq.so.5.2.3  [.] zmq::ypipe_t<zmq::msg_t, 256>::check_read
   2,15%  libzmq.so.5.2.3  [.] zmq::ypipe_t<zmq::msg_t, 256>::read
   1,32%  libc-2.17.so     [.] free

So my feeling is that, even if message batching is happening, right now it is
the zmq_msg_init_size() call (i.e. one malloc per message) that is actually
limiting the performance. This is the same problem I ran into in a more
complex context and described in this email thread:
https://lists.zeromq.org/pipermail/zeromq-dev/2019-July/033012.html

> If we would support the zerocopy we can make the buffer larger than 8kb, and
> when the buffer is full we would use the zerocopy flag.

Right. However, before we can get any benefit from the new kernel zero-copy
flag, I think we should first allow libzmq users to plug in some kind of
memory pooling, otherwise my feeling is that the performance benefit would be
negligible... what do you think? (I've put a couple of rough sketches below
my signature to illustrate what I mean.)

Thanks,
Francesco
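
P.S. To make the perf numbers above more concrete: if I read
perf/remote_thr.cpp correctly (I'm paraphrasing from memory, so treat the
exact names as approximate), the hot loop is essentially the following, i.e.
one heap allocation per 64-byte frame on the application thread, with the
matching free() happening later on the background IO thread once the frame
has been encoded, which is consistent with the malloc/_int_free samples above:

    /* Paraphrased sketch of the remote_thr hot loop: one
       zmq_msg_init_size() -- and thus one malloc() for a 64 B payload --
       per frame; zmq_msg_send() takes ownership on success and the
       buffer is released later by the background IO thread. */
    zmq_msg_t msg;
    for (size_t i = 0; i != message_count; i++) {
        if (zmq_msg_init_size (&msg, message_size) != 0)
            return -1;
        if (zmq_msg_send (&msg, s, 0) < 0)
            return -1;
    }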
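P.P.S. And this is roughly what I mean by "letting the user do memory
pooling" with the API we already have: zmq_msg_init_data() with a free
callback that returns the buffer to a user-owned pool instead of calling
free(). Everything below except zmq_msg_init_data()/zmq_msg_send() is made up
for illustration (the pool is just a mutex-protected free list); note that
the callback fires on the background IO thread, so a real pool has to be
thread-safe. Also, if I remember the code correctly, zmq_msg_init_data()
still does a small internal malloc for the reference-counting metadata when a
free function is supplied, so a proper pooling hook inside libzmq would still
buy us something on top of this:

    #include <zmq.h>
    #include <pthread.h>
    #include <stdlib.h>

    #define FRAME_SIZE 64   /* must stay >= sizeof (void *) for the free list */

    typedef struct node_t { struct node_t *next; } node_t;

    typedef struct {
        pthread_mutex_t lock;   /* free callback runs on the zmq IO thread */
        node_t *head;           /* free list of FRAME_SIZE-byte buffers */
    } msg_pool_t;

    static void *pool_get (msg_pool_t *p)
    {
        pthread_mutex_lock (&p->lock);
        node_t *n = p->head;
        if (n)
            p->head = n->next;
        pthread_mutex_unlock (&p->lock);
        return n ? (void *) n : malloc (FRAME_SIZE);   /* grow lazily */
    }

    static void pool_put (msg_pool_t *p, void *buf)
    {
        node_t *n = (node_t *) buf;
        pthread_mutex_lock (&p->lock);
        n->next = p->head;
        p->head = n;
        pthread_mutex_unlock (&p->lock);
    }

    /* Called by libzmq (possibly from the IO thread) once it is done with
       the buffer: return it to the pool instead of free()ing it. */
    static void return_to_pool (void *data, void *hint)
    {
        pool_put ((msg_pool_t *) hint, data);
    }

    static int send_pooled (void *socket, msg_pool_t *pool)
    {
        void *buf = pool_get (pool);   /* no malloc on the steady-state path */
        /* ... fill buf with FRAME_SIZE bytes of payload ... */
        zmq_msg_t msg;
        if (zmq_msg_init_data (&msg, buf, FRAME_SIZE, return_to_pool, pool) != 0)
            return -1;
        return zmq_msg_send (&msg, socket, 0);   /* libzmq owns buf on success */
    }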
