Sorry, I meant to get back to you sooner, but it’s been a crazy week. You didn’t say what version you’re running, but there have been some fairly recent changes in that area. Check these out and see if they help:
https://github.com/zeromq/libzmq/pull/3831
https://github.com/zeromq/libzmq/pull/3960
https://github.com/zeromq/libzmq/pull/4053

Good luck.

Bill

> On May 20, 2021, at 10:26 AM, James Harvey <[email protected]> wrote:
>
> Hi,
>
> I will try and simplify my previous long email.
>
> If a stream gets into a protocol error state (e.g. tcp SUB connect to REQ)
>
> Should the information (connection is terminated) be passed somehow back to
> the parent socket so if connect() is called again it attempts to connect
> rather than a no-op.
>
> OR
>
> Should we add a protocol error event to socket monitor so the calling process
> can handle it by calling disconnect/connect
>
> Just want some clarification so I work on the correct code.
>
> Thanks
>
> James
>
> On Thu, May 13, 2021 at 4:48 PM James Harvey <[email protected]> wrote:
> Hi,
>
> I have a rare/random bug that causes my ZMQ_SUB socket to fail for a certain
> endpoint with no way to track/notify. Yes it's because a SUB connects to a
> REQ socket but once you start to use zeromq for lots of transient systems in
> a large company this kind of thing will happen occasionally.
>
> The process happens like this:
>
> - ZMQ_PUB binds on 1.2.3.4:44444 (ephemeral)
> - ZMQ_SUB connects to 1.2.3.4:44444 (data flows)
> - ZMQ_PUB goes down
> - Unrelated process (ZMQ_REQ) comes up and grabs the same 1.2.3.4:44444
>   as its ephemeral
> - ZMQ_SUB has not yet been told to disconnect so it reconnects to the
>   ZMQ_REQ
> - protocol error happens and the connection is terminated in the
>   session/engine
> - Now a good ZMQ_PUB comes up and binds on 1.2.3.4:44444
> - ZMQ_SUB gets new instruction to connect()
> - connect() just returns noop.
> - The socket_base thinks it still has a valid endpoint and SUB only
>   connects once to each endpoint.
> - At this point there are no errors and no data flowing.
>
> My question is, should the protocol_error in the session propagate up to
> remove the endpoint from the socket?
>
> If yes I can look at adding that, if no do you have any suggestions?
>
> Thanks for your time
>
> James
>
> Some links to the code:
>
> If socket is SUB and the endpoint is present dont connect.
> https://github.com/zeromq/libzmq/blob/master/src/socket_base.cpp#L901
>
> terminate with no reconnect on protocol_error
> https://github.com/zeromq/libzmq/blob/master/src/session_base.cpp#L486
> _______________________________________________
> zeromq-dev mailing list
> [email protected]
> https://lists.zeromq.org/mailman/listinfo/zeromq-dev
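
For anyone following the thread, the sequence James describes and the two fixes he proposes can be sketched with a toy model. This is not libzmq code: `ToySubSocket`, `propagate_errors`, `protocol_error` and `run_scenario` are hypothetical names invented for illustration, and the model only mimics the endpoint bookkeeping described in the quoted email (connect() is a no-op if the endpoint is already recorded; a protocol error terminates the session without reconnecting).

```python
# Toy model of the endpoint bookkeeping described in the thread.
# NOT libzmq code: every class, method and flag here is hypothetical.

EP = "tcp://1.2.3.4:44444"


class ToySubSocket:
    def __init__(self, propagate_errors=False):
        # endpoint -> session state: "active" or "terminated"
        self.endpoints = {}
        # Assumed option 1: a protocol error removes the endpoint.
        self.propagate_errors = propagate_errors
        # Assumed option 2: events the application could read from a monitor.
        self.events = []

    def connect(self, endpoint):
        # As described in the thread: SUB connects once per endpoint,
        # so a second connect() to a known endpoint does nothing.
        if endpoint in self.endpoints:
            return False
        self.endpoints[endpoint] = "active"
        return True

    def disconnect(self, endpoint):
        # Forget the endpoint so a later connect() is no longer a no-op.
        return self.endpoints.pop(endpoint, None) is not None

    def protocol_error(self, endpoint):
        # Session is terminated with no reconnect.
        if endpoint not in self.endpoints:
            return
        self.events.append(("protocol_error", endpoint))
        if self.propagate_errors:
            del self.endpoints[endpoint]
        else:
            self.endpoints[endpoint] = "terminated"

    def is_receiving(self, endpoint):
        return self.endpoints.get(endpoint) == "active"


def run_scenario(sock, handle_events=False):
    sock.connect(EP)         # SUB connects to the original PUB
    sock.protocol_error(EP)  # PUB replaced by a REQ; reconnect fails
    if handle_events:
        # Option 2: the application reacts to the event itself.
        for name, endpoint in sock.events:
            if name == "protocol_error":
                sock.disconnect(endpoint)
    sock.connect(EP)         # a good PUB is back; application reconnects
    return sock.is_receiving(EP)


if __name__ == "__main__":
    print("current behaviour:", run_scenario(ToySubSocket()))
    print("propagate error:  ", run_scenario(ToySubSocket(propagate_errors=True)))
    print("monitor event:    ", run_scenario(ToySubSocket(), handle_events=True))
```

In this model either option restores data flow. The difference is who does the cleanup: the socket itself in the first case, the calling process (via disconnect/connect) in the second.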
