Hello,
I'm trying to build an application in which one central server gathers
frames from multiple other processes, each of which follows an H264 stream
and yields its frames to the central server.
However, I'm struggling with high CPU usage on the receiving ZeroMQ side,
with only about a dozen 25 fps streams; see the top output below:
   PID USER     PR NI    VIRT    RES    SHR S %CPU %MEM   TIME+ COMMAND
210591 usernam  20  0 5605252  63780  21056 S 42.9  0.1 0:06.37 streams-manager
210632 usernam  20  0 7508320 297676 104584 S 17.5  0.5 0:02.57 stream-cam13039
210637 usernam  20  0 7508192 297072 105496 S 16.5  0.5 0:02.65 stream-cam4004
210635 usernam  20  0 7508192 298248 105416 S 16.2  0.5 0:02.56 stream-cam13041
210628 usernam  20  0 7582068 295124 104408 S 15.8  0.4 0:02.60 stream-cam13035
210653 usernam  20  0 7508320 294120 105640 S 15.8  0.4 0:02.55 stream-cam13072
210640 usernam  20  0 7505236 236364 105708 S 10.6  0.4 0:01.91 stream-cam200
210629 usernam  20  0 7505188 236892 105000 S  9.6  0.4 0:01.98 stream-cam147
210642 usernam  20  0 7505112 234796 105128 S  8.9  0.4 0:01.75 stream-cam204
210650 usernam  20  0 7505236 234980 105084 S  8.9  0.4 0:01.59 stream-cam231
210644 usernam  20  0 7578860 234228 104480 S  8.6  0.4 0:01.80 stream-cam214
210646 usernam  20  0 7505112 235932 105384 S  8.6  0.4 0:01.53 stream-cam215
210652 usernam  20  0 7505112 234988 104972 S  8.6  0.4 0:01.62 stream-cam233
161809 nm-open+ 20  0   63492  14264  12000 R  7.9  0.0 5:24.89 openconnect
210648 usernam  20  0 7505112 234688 104692 S  6.3  0.4 0:01.68 stream-cam218
210638 usernam  20  0 7503484 214092 105244 S  4.6  0.3 0:01.38 stream-cam167
The sending part is doing fine. The code that actually publishes the frames
is the following:

In init:

    self._zmq_socket = cast(zmq.Socket, self._zmq_context.socket(zmq.PUB))
    # Allow up to 5 seconds to flush messages before closing
    self._zmq_socket.setsockopt(zmq.LINGER, 5000)
    self._zmq_socket.setsockopt(zmq.HEARTBEAT_IVL, 1000)
    self._zmq_socket.setsockopt(zmq.HEARTBEAT_TIMEOUT, 5000)
    self._zmq_socket.setsockopt(zmq.HEARTBEAT_TTL, 5000)
    self._zmq_socket.setsockopt(zmq.RECONNECT_IVL, 10000)
    self._zmq_socket.connect(self._zmq_url)

For each frame:

    height, width, channels = frame.shape
    payload = [int(110).to_bytes(2), height.to_bytes(2), width.to_bytes(2),
               channels.to_bytes(2), bytes() if frame is None else frame.tobytes()]
    self._zmq_socket.send_multipart(payload, flags=zmq.NOBLOCK, copy=False,
                                    track=False)
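
To make the wire layout above explicit, here is a minimal sketch of the five-part message (type, height, width, channels, raw pixels) using only the standard library. The names MESSAGE_TYPE_FRAME, encode_frame_message and decode_frame_message are hypothetical and not part of the real code. The byte order is written out as "big", which is what to_bytes(2) defaults to on Python 3.11 and later.

```python
from typing import List, Sequence, Tuple

# Hypothetical name for the constant 110 used as the first message part.
MESSAGE_TYPE_FRAME = 110


def encode_frame_message(height: int, width: int, channels: int,
                         pixels: bytes) -> List[bytes]:
    """Build the five parts: four 2-byte big-endian fields, then the pixels."""
    header = [value.to_bytes(2, "big")
              for value in (MESSAGE_TYPE_FRAME, height, width, channels)]
    return header + [pixels]


def decode_frame_message(parts: Sequence[bytes]) -> Tuple[int, int, int, int, memoryview]:
    """Parse the four header fields and return a view of the pixel data."""
    if len(parts) != 5:
        raise ValueError(f"expected 5 parts, got {len(parts)}")
    msg_type, height, width, channels = (
        int.from_bytes(bytes(part), "big") for part in parts[:4]
    )
    # memoryview avoids copying the (potentially large) pixel buffer.
    return msg_type, height, width, channels, memoryview(parts[4])


if __name__ == "__main__":
    # A tiny 2x3 frame with 3 channels, one byte per channel (assumed uint8).
    parts = encode_frame_message(2, 3, 3, bytes(2 * 3 * 3))
    print(decode_frame_message(parts)[:4])
```

Each 2-byte field can hold values up to 65535, so the header itself adds only 8 bytes per message; nearly all of the message is the pixel buffer.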
The receiving part, which is causing the issue, is the following:

    zmq_context = zmq.Context()
    zmq_socket = zmq_context.socket(zmq.SUB)
    zmq_socket.setsockopt_string(zmq.SUBSCRIBE, "")
    zmq_socket.bind(self.socket_url)
    while self.running:
        parts = cast(Tuple[zmq.Frame, zmq.Frame, zmq.Frame, zmq.Frame,
                           zmq.Frame],
                     zmq_socket.recv_multipart(copy=False, track=False))

Both sides communicate over a Unix socket (ipc://).

What I have already tried:
- Using a TCP socket instead of ipc
- Sending one single bytes message instead of a multipart message
- Switching to push/pull instead of pub/sub

Sadly, none of these makes any real difference. The only thing that reduces
the "streams-manager" CPU usage to close to zero is reducing the size of
the messages sent by the stream consumer processes:
E.g. changing:

    payload = [int(110).to_bytes(2), height.to_bytes(2), width.to_bytes(2),
               channels.to_bytes(2), bytes() if frame is None else frame.tobytes()]

to:

    payload = [int(110).to_bytes(2), height.to_bytes(2), width.to_bytes(2),
               channels.to_bytes(2), bytes() if frame is None else frame.tobytes()[:10]]

so that only 10 bytes of the actual video frame are kept instead of the full one.
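
For a sense of scale, here is a rough back-of-the-envelope sketch of the data volume in both cases. The 1920x1080, 3-channel, one-byte-per-channel frame size is purely an assumption for illustration (the real resolutions are not stated here), and the function name aggregate_rate is hypothetical. The 25 fps and 12 streams come from the description above.

```python
# Four 2-byte header parts precede the pixel data in every message.
HEADER_BYTES = 4 * 2


def aggregate_rate(payload_bytes: int, fps: int, streams: int) -> int:
    """Bytes per second arriving at the central process across all streams."""
    return (HEADER_BYTES + payload_bytes) * fps * streams


if __name__ == "__main__":
    # ASSUMPTION: 1920x1080 frames, 3 channels, 1 byte per channel.
    full_frame = 1080 * 1920 * 3
    full = aggregate_rate(full_frame, fps=25, streams=12)
    truncated = aggregate_rate(10, fps=25, streams=12)
    print(f"full frames:      {full / 1e9:.2f} GB/s")
    print(f"10-byte payloads: {truncated} B/s")
```

Under that assumed resolution, the full frames add up to roughly 1.87 GB/s of raw data through the receiver, against 5400 B/s with the truncated payload.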

Am I trying to do something stupid, or did I miss something obvious?
As libzmq is written in C and this is basically only I/O, I assumed
receiving raw bytes on one central process would not be a bottleneck...
Thanks a lot in advance,
Best regards, Adam.