Hi Brett,

Thank you for your thoughts. 
Looking at some benchmarking code has been very helpful, although I should have 
been more specific about my setup: I'm benchmarking the middleware, not raw ZMQ 
PUB/SUB connections; I'm just trying to use the fact that there's an underlying 
ZMQ queue to optimize my message sending pattern. 
Obviously, your general comments about benchmarking are still applicable and I 
will incorporate them into my setup.
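
For reference, this is roughly the send pattern I'm experimenting with:
evenly spaced batches with a sleep in between, with an outer harness (not
shown) doing a binary search over the rate.  It's only a sketch against the
raw libzmq C API rather than against the middleware itself; RATE_HZ, BATCH
and the endpoint are placeholders:

  // Sketch only: paced publisher.  RATE_HZ and BATCH are placeholders;
  // an outer harness would binary-search RATE_HZ for the maximum
  // sustainable rate.
  #include <zmq.h>
  #include <chrono>
  #include <thread>
  #include <vector>

  int main ()
  {
      const double RATE_HZ = 1000.0;  // attempted messages per second
      const int BATCH = 10;           // messages sent per wakeup
      const int TOTAL = 100000;       // messages per benchmark run

      void *ctx = zmq_ctx_new ();
      void *pub = zmq_socket (ctx, ZMQ_PUB);
      zmq_bind (pub, "tcp://*:5555"); // placeholder endpoint

      std::vector<char> buf (1024, 'x');
      const auto period = std::chrono::microseconds (
          (long) (1e6 * BATCH / RATE_HZ));
      auto next = std::chrono::steady_clock::now ();
      for (int sent = 0; sent < TOTAL; ) {
          for (int i = 0; i < BATCH && sent < TOTAL; ++i, ++sent)
              zmq_send (pub, buf.data (), buf.size (), 0);
          next += period;
          std::this_thread::sleep_until (next);  // even spacing
      }
      zmq_close (pub);
      zmq_ctx_term (ctx);
      return 0;
  }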

Cheers,
Julius

-----Original Message-----
From: Brett Viren <[email protected]> 
Sent: Friday, 24 April 2020 16:51
To: J.S. Lischeid <[email protected]>
Cc: [email protected]
Subject: Re: [zeromq-dev] Measuring PUB/SUB performance on resource-constrained 
devices

Hi Julius,

Some input from an interested ZeroMQ user:

There are some 10 and 100 GbE latency and throughput results in the wiki.  They 
focus on REQ/REP (lat) and PUSH/PULL (thr).  The benchmark and plotting code is 
in libzmq/perf/.  The "thr" test uses PUSH/PULL, and its code might be a good
basis for a PUB/SUB variant.

For PUB/SUB I think the biggest feature to add would be to track the
dropped-message rate during the test.  A PUB/SUB test will be very sensitive
to whether the sender or the receiver is on faster hardware.  Here, an IoT
sender publishing to a receiver on a workstation is a helpful asymmetry.
Reversing the direction, the workstation may easily send faster than an RPi
or similar device can receive.
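
A simple way to get at the drop rate (a sketch only, untested, with a made-up
endpoint; it assumes the publisher writes a uint64_t sequence number as the
message body and both ends share byte order) is to count sequence gaps on the
SUB side:

  // Sketch: count drops via sequence numbers stamped by the publisher.
  #include <zmq.h>
  #include <cstdint>
  #include <cstdio>

  int main ()
  {
      void *ctx = zmq_ctx_new ();
      void *sub = zmq_socket (ctx, ZMQ_SUB);
      zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);  // subscribe to all
      zmq_connect (sub, "tcp://sender-host:5555"); // placeholder

      uint64_t expected = 0, received = 0, dropped = 0;
      for (;;) {  // a real test needs a message count or end marker
          uint64_t seq;
          if (zmq_recv (sub, &seq, sizeof (seq), 0) < 0)
              break;
          if (received == 0)
              expected = seq;         // ignore drops before we joined
          dropped += seq - expected;  // gap size = messages missed
          expected = seq + 1;
          ++received;
      }
      printf ("received %llu, dropped at least %llu\n",
              (unsigned long long) received,
              (unsigned long long) dropped);
      zmq_close (sub);
      zmq_ctx_term (ctx);
      return 0;
  }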

Secondary would be to add something to handle or account for "slow subscriber
syndrome" (as per the zguide).

Another problem I've had in my benchmarks and real apps is making sure that
the sender stays alive after a stream of sends is done, in order to give time
for the local send and remote recv buffers to be flushed and the final timing
measurements to be taken.  It's best if the protocol assures this (eg,
credit-based flow control), but with PUB/SUB that requires some additional
socket patterns.  A simple approach is a "long enough" sleep just before
sender termination, and then doing all the benchmark measurements on the
receiver.
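
In code, the simple approach can be as little as this fragment (pub and ctx
are the publisher socket and context from the send loop; the sleep length is
a guess you tune until the receiver's counts stop changing):

  // Sketch: publisher end-of-run.  Give the I/O thread time to flush
  // the send queue before tearing down; the receiver does the measuring.
  std::this_thread::sleep_for (std::chrono::seconds (2));  // "long enough"

  int linger = -1;  // -1 (the libzmq default) waits for the queue to drain
  zmq_setsockopt (pub, ZMQ_LINGER, &linger, sizeof (linger));
  zmq_close (pub);
  zmq_ctx_term (ctx);  // blocks while pending messages flush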

You may also want to perform benchmark measurements as a function of the number 
of PUBs and/or number of SUBs in the topology.

I suggest plotting throughput and loss rates as a function of the full
parameter space, as exhaustively as you have the patience for.  Searching for
the max alone is, imo, not the whole story.  Rather, a developer typically has
an idea of the range of rates an application may require, or may want to know
what's "safe" and design to that.  Seeing the big picture is very helpful.


You can take a look at some of the benchmark code I've messed with.
These programs are encumbered with various layers that may complicate using
them directly, but they may at least be helpful to look at.

I have a cppzmq-based PUB/SUB benchmark which is "encumbered" with my own
"ZIO" library layers:

  https://brettviren.github.io/zio/ex-distribution.html
  https://github.com/brettviren/zio/tree/master/test (check-pubsub* files)

And I wrote a CZMQ-based benchmark program/library:

  https://github.com/brettviren/zperfmq

It should work with a variety of sockets including PUB/SUB.

But, one note: the CZMQ layer will add some small overhead compared to raw
libzmq.  That overhead is small compared to what my own apps' layers add, even
when I'm trying to make my layer fast/efficient.  Playing with the
libzmq/perf/ tests was very valuable for getting a feeling for just how fast
libzmq can be.

Making a simple PUB/SUB equivalent to the "thr" tests would be very useful and 
I think libzmq would benefit from having it.

And now I see that I missed libzmq/perf/proxy_thr.cpp, which is already there.
It says it tests NODROP using XPUB/XSUB.  It could be a useful starting point
for adding a measure of message drops so that PUB/SUB can be tested.
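
For the PUB/SUB case, the receiver-side measurement could look roughly like
this (a sketch with a placeholder endpoint; it assumes the sequence-stamping
publisher from above, with message_count matching what the sender uses):

  // Sketch: receiver-side throughput and loss for a PUB/SUB variant of
  // the "thr" test.  Assumes publisher-stamped uint64_t sequence numbers.
  #include <zmq.h>
  #include <chrono>
  #include <cstdint>
  #include <cstdio>

  int main ()
  {
      const uint64_t message_count = 100000;  // must match the sender

      void *ctx = zmq_ctx_new ();
      void *sub = zmq_socket (ctx, ZMQ_SUB);
      zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);
      zmq_connect (sub, "tcp://sender-host:5555"); // placeholder

      uint64_t received = 0, last_seq = 0;
      auto t0 = std::chrono::steady_clock::now ();
      // NB: if the final message is dropped this loop never ends; a
      // real test needs a recv timeout or an explicit end marker.
      while (last_seq + 1 < message_count) {
          uint64_t seq;
          if (zmq_recv (sub, &seq, sizeof (seq), 0) < 0)
              break;
          if (received == 0)
              t0 = std::chrono::steady_clock::now ();  // first message
          last_seq = seq;
          ++received;
      }
      double secs = std::chrono::duration<double> (
          std::chrono::steady_clock::now () - t0).count ();
      printf ("throughput: %.0f msg/s, loss: %.2f%%\n",
              received / secs,
              100.0 * (1.0 - (double) received / message_count));
      zmq_close (sub);
      zmq_ctx_term (ctx);
      return 0;
  }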

Whatever you end up doing, please report the results you find!

Cheers,
-Brett.

"J.S. Lischeid" <[email protected]> writes:

> Dear ZeroMQ community,
>
> Are there any established throughput benchmarking practices for 
> PUB/SUB on resource-constrained devices that are possibly bottlenecked 
> by CPU/memory consumption instead of network bandwidth?
>
> I'm asking because I'm trying to benchmark an IoT messaging middleware 
> that uses ZMQ PUB/SUB queues under the hood. More specifically, I'm 
> trying to find the maximum theoretical throughput for given hardware 
> configurations (e.g. 256MB/512MB/1GB RAM, different CPU speeds, 
> network interfaces).
>
> These are my thoughts so far:
> - Ideally, you'd want to keep the publisher-side ZMQ-internal message 
> queue filled with a low number of messages throughout the benchmarking 
> interval. There's not enough memory on the devices to keep it filled 
> with a high number of larger messages (KB-MB range) but you'd also 
> want to avoid having an empty queue at any time since you're missing 
> out on send operations you could do in the meantime (for small 
> messages, there also might be batching advantages when having > 1 
> message in the queue).
> - ZMQ does not expose the internal queue fill level.
> - But just spamming a PUB socket with a low high water mark also
> distorts measurements, because it introduces middleware overhead for
> messages that will eventually be dropped anyway (probably especially
> important on uniprocessors).
> - My currently favoured approach is performing a (binary) search for 
> the maximum number of messages that can be transferred in a given time 
> frame by evenly spacing out (small batches of) messages and sending 
> the producer thread to sleep in between.
>
> Do you have any thoughts on this or has someone here encountered a similar 
> problem in the past?
>
> Thanks in advance!
>
> Julius
_______________________________________________
zeromq-dev mailing list
[email protected]
https://lists.zeromq.org/mailman/listinfo/zeromq-dev
