It is expected, because DEALER doesn't have any concept of when a worker in this pattern is "ready": it will round-robin outgoing messages among all connected peers.
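To make the timing argument concrete, here is a minimal discrete-event simulation in Python. It is an idealized model, not ZeroMQ code. It assumes zero network latency, fixed per-worker service times, and clients that resend as soon as a reply arrives. The "round_robin" policy approximates DEALER's behavior: each request goes to the next peer in turn and waits behind that worker's current job if it is busy. The "ready" policy is a hypothetical stand-in for a ROUTER-based broker that only hands work to idle workers. The function name `simulate` and its parameters are illustrative only.

```python
import heapq
from collections import deque


def simulate(service, policy, horizon, clients=2, first_worker=1):
    """Simulate closed-loop clients against workers with fixed service times.

    service      -- seconds each worker needs per request, e.g. [1, 2]
    policy       -- "round_robin": send each request to the next worker in
                    turn, busy or not, queueing it behind that worker's work;
                    "ready": send it to the longest-idle worker, or hold it
                    in a shared queue until some worker becomes idle
    horizon      -- stop once the simulated clock passes this time
    first_worker -- index of the worker that receives the first request

    Returns (client, worker, latency) tuples, 1-based, in completion order.
    """
    n = len(service)
    events = []  # min-heap of (time, seq, kind, payload)
    seq = 0

    def push(t, kind, payload):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, payload))
        seq += 1  # tie-breaker: same-time events run in scheduling order

    queues = [deque() for _ in range(n)]  # per-worker backlog (round_robin)
    busy = [False] * n
    pending = deque()  # shared backlog (ready)
    idle = deque((first_worker + i) % n for i in range(n))
    rr = first_worker
    results = []

    def start(w, t, req):
        busy[w] = True
        push(t + service[w], "done", (w, req))

    # All clients send at t=0; the highest-numbered client wins the race,
    # mirroring the scenario in which client 2's request arrives first.
    for c in reversed(range(clients)):
        push(0, "send", c)

    while events:
        t, _, kind, payload = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "send":
            req = (payload, t)  # (client index, send time)
            if policy == "round_robin":
                w, rr = rr, (rr + 1) % n
                if busy[w]:
                    queues[w].append(req)
                else:
                    start(w, t, req)
            elif idle:
                start(idle.popleft(), t, req)
            else:
                pending.append(req)
        else:  # "done": worker w finished req
            w, (client, sent) = payload
            results.append((client + 1, w + 1, t - sent))
            busy[w] = False
            if policy == "round_robin":
                if queues[w]:
                    start(w, t, queues[w].popleft())
            elif pending:
                start(w, t, pending.popleft())
            else:
                idle.append(w)
            push(t, "send", client)  # the client sends its next request at once
    return results


for policy in ("round_robin", "ready"):
    print(policy, simulate([1, 2], policy, horizon=8))
```

With service times [1, 2] and a horizon of 8 seconds, the round-robin run reports worker 2 latencies of 2, 3, 3, 3 seconds (only the first is 2, and both clients see a 3-second request), while the "ready" run never exceeds 2 seconds. With [2, 2], both policies report 2 seconds throughout, consistent with the second test in the quoted message.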
Imagine that the state of the world is this: clients 1 and 2 have no outstanding requests, and the dealer socket is primed to forward its next message to worker 2, which will take two seconds to produce a reply. (Worker 1, of course, will take only one second to reply.) Now both clients send a request at about the same instant. Client 2's request wins the race inside the router socket, and it gets forwarded to worker 2. Immediately after, client 1's request is forwarded to worker 1. Importantly, because the dealer round-robins, its next message will go to worker 2 again, whether or not worker 2 is busy. One second passes. Worker 1 sends a reply, causing client 1 to send its next request. This is routed to worker 2, which will still be chewing on client 2's request for another full second! Client 1 has to wait one second for client 2's request to complete, then two more seconds for worker 2 to process its own request, for a total of three seconds, which is exactly what you observed.

----

As this exercise demonstrates, the simple round-robin behavior baked into REQ, DEALER, and PUSH sockets can produce suboptimal schedules when workloads aren't homogeneous. A lot of the time, that's acceptable. If it's not, you can build more sophisticated load-balancing algorithms yourself on top of a ROUTER socket, though doing so involves a bit of protocol design.

On Fri, Apr 10, 2020 at 1:31 PM Jasper Jaspers <[email protected]> wrote:
>
> I'm testing the REQ->REP pattern with multiple reply workers to test
> concurrent behavior.
>
> I have 3 applications, 2 clients and 1 server, running on the same node.
>
> Each client, which is essentially the client from the zmq guide, has a REQ
> socket that connects to the server (tcp://127.0.0.1:nnnn). They simply loop
> sending messages, waiting for the reply, and timing how long it takes to get
> the reply.
>
> The server, which is essentially the mtserver from the zmq guide, has an
> external ROUTER that binds to (tcp://127.0.0.1:*).
> Internally it has a DEALER with n workers, where each worker is on its own
> thread and has a REP socket that connects locally to the DEALER. The ROUTER
> and DEALER use zmq_proxy to map external messages to internal messages. Each
> REP worker receives a message, sleeps for some number of seconds to simulate
> work time, and then sends a reply.
>
> In my test I have two REP workers configured on the server. I figured one
> for each client to get concurrent behavior. Worker1 sleeps for 1 sec and
> Worker2 sleeps for 2 seconds.
>
> Based on this I would expect concurrent behavior and the clients to show that
> messages take either 1 or 2 seconds to complete. When I start the first
> client I see messages take either 1 or 2 seconds based on which worker
> processed the message, which I expect. Then when I start the 2nd client I
> see, on both clients, that some messages take 3 seconds to complete. Looking
> further, all of the messages that take 3 sec to complete come from Worker2.
> It looks like only the first message processed by Worker2 after the second
> client starts completes in 2 seconds. My logs show that when Worker2 takes 3
> seconds to complete, it's receiving the client's message 1 sec after the
> client sent it. This accounts for the additional time, but I don't understand
> why this is happening. Is this the correct behavior?
>
> I also re-ran the test where each Worker sleeps for 2 secs. In this case the
> clients showed that all work, from each Worker, completed in 2 secs, which I
> expected.
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
