Hi Adam,
Try to set an RX timeout (zmq.RCVTIMEO) on the SUB socket so that your tight
RX loop can release the CPU back to the OS while waiting for a message to arrive.
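
A minimal sketch of what I mean, assuming pyzmq (I bind to a tcp endpoint
with a wildcard port here just so the sketch is self-contained; zmq.RCVTIMEO
works the same on your ipc:// socket):

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.setsockopt_string(zmq.SUBSCRIBE, "")
# Block for at most 100 ms per recv; on timeout recv raises zmq.Again,
# so the loop yields the CPU to the OS instead of spinning while idle.
sock.setsockopt(zmq.RCVTIMEO, 100)
sock.bind("tcp://127.0.0.1:*")  # wildcard tcp port, purely for the sketch

def rx_loop(should_run):
    """Receive loop that tolerates idle periods without blocking forever."""
    while should_run():
        try:
            parts = sock.recv_multipart(copy=False, track=False)
        except zmq.Again:
            continue  # nothing arrived within 100 ms; re-check the flag
        # ... process parts ...
```

In your receiver you would call rx_loop(lambda: self.running), so the loop
can notice self.running going False within 100 ms instead of sitting in
recv_multipart() indefinitely.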

HTH,
Francesco

On Wed, 10 Jul 2024 at 23:21, Adam Cécile <[email protected]> wrote:

> Hello,
>
>
> I'm trying to create an application with one central server gathering
> frames from multiple other processes, each following an H264 stream and
> yielding frames to the central server.
>
> However, I'm struggling with high CPU usage on the receiving ZMQ part,
> with only a dozen 25 fps streams, see top below:
>
>      PID USER      PR  NI    VIRT    RES SHR S  %CPU  %MEM     TIME+
> COMMAND
>   210591 usernam   20   0 5605252  63780  21056 S  42.9   0.1 0:06.37
> streams-manager
>   210632 usernam   20   0 7508320 297676 104584 S  17.5   0.5 0:02.57
> stream-cam13039
>   210637 usernam   20   0 7508192 297072 105496 S  16.5   0.5 0:02.65
> stream-cam4004
>   210635 usernam   20   0 7508192 298248 105416 S  16.2   0.5 0:02.56
> stream-cam13041
>   210628 usernam   20   0 7582068 295124 104408 S  15.8   0.4 0:02.60
> stream-cam13035
>   210653 usernam   20   0 7508320 294120 105640 S  15.8   0.4 0:02.55
> stream-cam13072
>   210640 usernam   20   0 7505236 236364 105708 S  10.6   0.4 0:01.91
> stream-cam200
>   210629 usernam   20   0 7505188 236892 105000 S   9.6   0.4 0:01.98
> stream-cam147
>   210642 usernam   20   0 7505112 234796 105128 S   8.9   0.4 0:01.75
> stream-cam204
>   210650 usernam   20   0 7505236 234980 105084 S   8.9   0.4 0:01.59
> stream-cam231
>   210644 usernam   20   0 7578860 234228 104480 S   8.6   0.4 0:01.80
> stream-cam214
>   210646 usernam   20   0 7505112 235932 105384 S   8.6   0.4 0:01.53
> stream-cam215
>   210652 usernam   20   0 7505112 234988 104972 S   8.6   0.4 0:01.62
> stream-cam233
>   161809 nm-open+  20   0   63492  14264  12000 R   7.9   0.0 5:24.89
> openconnect
>   210648 usernam   20   0 7505112 234688 104692 S   6.3   0.4 0:01.68
> stream-cam218
>   210638 usernam   20   0 7503484 214092 105244 S   4.6   0.3 0:01.38
> stream-cam167
>
>
> The sending part is doing fine, and the code actually publishing the
> frames is the following:
>
> In init:
>
> self._zmq_socket = cast(zmq.Socket, self._zmq_context.socket(zmq.PUB))
> self._zmq_socket.setsockopt(zmq.LINGER, 5000)  # Allow up to 5 seconds
> to flush messages before closing
> self._zmq_socket.setsockopt(zmq.HEARTBEAT_IVL, 1000)
> self._zmq_socket.setsockopt(zmq.HEARTBEAT_TIMEOUT, 5000)
> self._zmq_socket.setsockopt(zmq.HEARTBEAT_TTL, 5000)
> self._zmq_socket.setsockopt(zmq.RECONNECT_IVL, 10000)
> self._zmq_socket.connect(self._zmq_url)
>
> For each frame:
>
> height, width, channels = frame.shape
> payload = [int(110).to_bytes(2), height.to_bytes(2), width.to_bytes(2),
> channels.to_bytes(2), bytes() if frame is None else frame.tobytes()]
>
> self._zmq_socket.send_multipart(payload, flags=zmq.NOBLOCK, copy=False,
> track=False)
>
>
> The receiving part, which is causing the issue, is the following:
>
> zmq_context = zmq.Context()
> zmq_socket = zmq_context.socket(zmq.SUB)
> zmq_socket.setsockopt_string(zmq.SUBSCRIBE, "")
> zmq_socket.bind(self.socket_url)
>
> while self.running:
>      parts = cast(Tuple[zmq.Frame, zmq.Frame, zmq.Frame, zmq.Frame,
> zmq.Frame], zmq_socket.recv_multipart(copy=False, track=False))
>
>
> Both communicate over a Unix socket (ipc://).
>
> What I already tried:
>
> - Use tcp socket instead of ipc
>
> - Send one single bytes message instead of multipart
>
> - Switch to push/pull instead of pub/sub
>
> Sadly, nothing really makes any difference. The only thing that reduces
> "streams-manager" CPU usage to close to zero is to reduce the size of the
> messages sent by the stream consumer processes:
>
> E.g: Changing: payload = [int(110).to_bytes(2), height.to_bytes(2),
> width.to_bytes(2), channels.to_bytes(2), bytes() if frame is None else
> frame.tobytes()]
>
> To: payload = [int(110).to_bytes(2), height.to_bytes(2),
> width.to_bytes(2), channels.to_bytes(2), bytes() if frame is None else
> frame.tobytes()[:10]]
>
> To keep only 10 bytes of the actual video frame instead of the full one.
>
>
> Am I trying to do something stupid, or did I miss something obvious?
> As libzmq is written in C and this is basically only I/O, I assumed
> receiving raw bytes in one central process would not be a bottleneck...
>
>
> Thanks a lot in advance,
>
> Best regards, Adam.
>
>
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
>