Dear all,

I am having difficulty using multicast with the Pub/Sub pattern. I first posted this issue on the libzmq GitHub repository, but as there has not been any response so far, I am asking for help here.

Issue description

When using PGM (pub/sub pattern) with messages above a certain size (from hundreds of KB up to several MB), the following problems occur.

1. From some point on, "zmq_msg_recv" no longer returns any messages, even though Wireshark shows that the packets of those messages keep arriving.
2. Each message is composed of 2 parts. The first part has a fixed size of 2 bytes; the last part has a variable size. Sometimes part of a message is lost and only part of it is received.

In the normal case, the receiver side of my program prints output like this:

    49: first message part of size 2 received, second message part of size 196604 received
    50: first message part of size 2 received, second message part of size 199747 received
    51: first message part of size 2 received, second message part of size 110503 received

But in some abnormal cases it prints:

    49: first message part of size 2 received, second message part of size 196604 received
    50: first message part of size 2 received, second message part of size 2 received
    51: first message part of size 110503 received, second message part of size 2 received

Environment

4 PCs with Intel CPUs, each with different specs (CPU, RAM, GPU). Each PC is connected to a 1 Gbps switch with a Category 6 cable.

- libzmq version (commit hash if unreleased): 4.25, cppzmq
- OS: Linux Ubuntu 18.04

Minimal test code / Steps to reproduce the issue

I attach the test project: https://github.com/zeromq/libzmq/files/4008149/TestGroupMessaging.zip

You can reproduce the problems by following the instructions below. The test program runs in two modes, sender and receiver. In both modes, a time argument (in nanoseconds) controls the sending/receiving rate.

For sender mode, the arguments given to the program are as follows:
"TestBasicPublishGroupMessaging(program name) y(indicating sender mode), 100(total sending count), 1000000(sending rate in nanoseconds)" For receiver mode, the arguments given to the program execution is the following. "TestBasicPublishGroupMessaging(program name) n(indicating receiver mode), 1000(receiving rate in nanoseconds)" I have tweaked the relevant setting values such as ZMQ_RCV/SNDHWM, ZMQ_RATE, ZMQ_RCV/SNDBUF into the maximum values. What's the actual result? (include assertion message & call stack if applicable) For 100kb ~ 300kb message size, when receiver rate is per 1,000 nanoseconds and the sender rate is per 1,000,000 nanoseconds, the first problem happens so frequently. For 100kb ~ 300kb message size, when receiver rate is per 1,000 nanoseconds and the sender rate is per 10,000,000 nanoseconds, the first problem happens so rarely. For 1MB ~ 3MB, when receiver rate is per 1,000 nanoseconds and the sender rate is per 1,000,000 or 10,000,000 nanoseconds, the first problem happens so frequently. I debugged zeromq code and it was seen that "pgm_recvmsgv" in "receive" method of "pgm_socket_t" does not get all packets of a message normally even if all packets had been monitored to be received normally in Wireshark. For the second problem, it happens in irregular and rare pattern. So it is difficult to reproduce this problem. But when I run in debug mode with the above message sizes, I found that sparsely the data receipt is done in burst pattern: I added a breakpoint on send method("networkDistribution.getPublishService().tryPublish()" in "TestBasicPublishGroupMessaging.cpp") in sender side and repeated to resume the code with the breakpoint in sender side and watch what happens in receiver side step by step. Sometimes, after a message is sent in a sender side, "receive" method of "pgm_socket_t" does not get all packets with 1428 byte unit size(only some parts received) composing a message in receiver side. 

The parts of the previous message that were not yet received are retrieved later, together with the following messages, in the "receive" method of "pgm_socket_t". The second problem is frequently seen in this case.

How to build the project

1. Set the ZMQ_BASE_DIR and ZMQ_USED_VER variables in CMakeLists.txt according to your environment.
2. Move to the build/Debug or build/Release directory.
3. Execute "cmake -DCMAKE_BUILD_TYPE=Debug ../.." (or "-DCMAKE_BUILD_TYPE=Release").
4. The binary files are produced in bin/Debug or bin/Release.
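
To make the second problem (misaligned message parts) easier to spot on the receiver side, one option is to group received frames by their "more" flag instead of assuming that frames always arrive in pairs. The sketch below is a standalone illustration, not the code of the test project: Frame, Message, and group_frames are hypothetical names, and the "more" field stands for the value that zmq_msg_more() (or message_t::more() in cppzmq) would report for each received frame. Whether the test receiver already checks that flag is not stated in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; not part of libzmq or cppzmq.
struct Frame {
    std::size_t size;  // size of the received frame in bytes
    bool more;         // true if further frames of the same message follow
};

struct Message {
    std::vector<std::size_t> part_sizes;
    bool well_formed;  // exactly two parts, and the first has header_size bytes
};

// Groups a stream of frames into messages using the "more" flag and marks
// each message as well formed or not. Frames of a trailing message whose
// final frame has not arrived yet are not emitted.
inline std::vector<Message> group_frames(const std::vector<Frame>& frames,
                                         std::size_t header_size = 2) {
    assert(header_size > 0);
    std::vector<Message> out;
    std::vector<std::size_t> current;
    for (const Frame& f : frames) {
        current.push_back(f.size);
        if (!f.more) {  // last frame of this message
            const bool ok = current.size() == 2 && current[0] == header_size;
            out.push_back(Message{current, ok});
            current.clear();
        }
    }
    return out;
}
```

With frames {2, more} and {196604, last}, the result is one well-formed message. A stream such as {2, last}, {110503, more}, {2, last} yields two messages, both flagged as malformed, which would let the receiver log the exact point where the framing goes out of step.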
