Sounds right to me.
On Mon, Apr 18, 2016 at 1:29 PM, Kevin Sapper <[email protected]> wrote:
> Okay, seems I was a little bit too quick :(. Great analysis btw :)
>
> You're correct, the client cannot recover from the disconnected state.
> The heartbeat event has been overridden so that the client itself stops
> sending heartbeats to the server. But this results in the client
> ignoring any heartbeats from the revived server. This is definitely a
> bug! Instead of ignoring the heartbeat events we need to stop the client
> heartbeat timer and restart it upon reconnect.
>
> @hintjens please correct me if I'm wrong.
>
> 2016-04-18 13:13 GMT+02:00 Kevin Sapper <[email protected]>:
>>
>> Hi Alena,
>>
>> in the mlm_client.xml there is a state named "defaults" which is
>> inherited by many others including "disconnecting". When the client is
>> in "disconnecting" state and the server reconnects it will send a
>> heartbeat which the client will answer with a connection ping, and upon
>> connection pong from the server the client will move from
>> "disconnecting" state into "connected" state.
>>
>> //Kevin
>>
>> 2016-04-18 8:47 GMT+02:00 Alena Chernikava <[email protected]>:
>>>
>>> Hi,
>>>
>>> I would like to ask some questions and point out some problems in the
>>> Malamute broker.
>>>
>>> I am facing a problem with the client reconnect procedure in Malamute.
>>> Usually a formal description allows me to better understand the
>>> problem, which is why I started my investigation by creating a
>>> visualization of the state machine for the Malamute client. I would
>>> say it helped me a lot :) Right away I found some "strange behavior".
>>> I would like to ask some questions to make things clearer for me
>>> (maybe it was done intentionally) before I try to "experiment" with
>>> fixes.
>>>
>>> In the attachment you can find my hand-made visualization of the state
>>> machine (I was doing it for myself, so it has my thoughts written
>>> down). (GREEN - states, RED - events, BLUE - actions).
>>> It is not complete, but it has already helped me to spot some
>>> potential and real problems. Here I describe some issues I found
>>> (numbering is the same as in the picture).
>>>
>>> 1. Re-connection problem. It is actually the main problem I want to
>>> discuss.
>>>
>>> Situation:
>>> The client sends 3 PINGs and does not receive any PONGs back. After
>>> this the client ends up in the "disconnected" state. I would say that
>>> it is a black hole state, as the client cannot normally recover from
>>> it (to the "connected" state) or at least move somewhere.
>>>
>>> Analysis:
>>> * We can destroy the client. We will move out of the "disconnected"
>>> state, but we destroyed the client. :) End of work, nothing to do.
>>> Everything is fine.
>>> * We can move to the "connected" state if the client receives "PONG"
>>> from the server, or we can move to the "HAVE ERROR" state if the
>>> client receives "ERROR" from the server. In order to receive some
>>> response from the server, we need to send something to the server.
>>> And here we are: the client does not send anything to the server :(
>>> PINGs are disabled in "mlm_client.xml" from the very beginning.
>>>
>>> Questions:
>>> * Why was PING disabled in the "disconnected" state?
>>> * What was the basic idea for the "re connect" implementation?
>>>
>>> Proposal:
>>> Enable PINGs. When the server receives a PING from an "unknown client"
>>> it will send "ERROR" back, which will trigger the "re connection"
>>> procedure. I am still not sure whether the client would reconnect
>>> correctly, but at least we can give it a chance to do so, because now
>>> the client has no chance to reconnect (if the server is off for a
>>> longer period).
>>>
>>> 2. Take a look at the right corner of the picture.
>>>
>>> in the mlm_client.xml:
>>>
>>> <state name = "connecting" inherit = "defaults">
>>>     <event name = "OK" next = "connected">
>>>         <action name = "signal success" />
>>>         <action name = "client is connected" />
>>>     </event>
>>>
>>> This can cause the following code to pass (and I actually saw such
>>> behavior a couple of times):
>>>
>>> int rv = mlm_client_connect ();
>>> assert (rv == 0);
>>> assert (mlm_client_connected () == false);
>>>
>>> Proposal: do "signal success" after "client is connected".
>>> Question: is there any reason to leave the order as it is?
>>>
>>> 3+4. There is one point I didn't understand from the code. When is the
>>> client supposed to start heartbeating?
>>> I thought that it should happen after the client gets the "OK"
>>> response from the server, but from the state machine I see that
>>> heartbeating starts in the "connecting" state (while waiting for the
>>> response from the server). Is this a bug or was it done intentionally?
>>>
>>> 5. It is just a bug, I will fix it later. If mlm_client_connect didn't
>>> work the first time, the client should remain in the "start" state.
>>>
>>> 6. It is a potential problem. If "PONG" comes before the "OK" message
>>> from the server, mlm_client_set_producer/consumer/worker will not end
>>> correctly and potentially will never return. I propose: return to the
>>> "confirming" state and wait for the "OK" response from the server. Do
>>> you think this could break anything?
>>>
>>> Thank you for reading this, looking forward to your reply.
>>> Alena Chernikava
>>> _______________________________________________
>>> zeromq-dev mailing list
>>> [email protected]
>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
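
The fix Kevin describes in this thread (stop the client's heartbeat timer on disconnect instead of overriding the heartbeat event, then restart the timer on reconnect) could be sketched roughly as below. This is a simplified, hypothetical model in plain C, not the generated mlm_client state machine: client_t, the STATE_* and EVENT_* names, and the functions client_handle, client_timer_tick and simulate_outage_and_recovery are all invented for illustration, and no Malamute API is called.

```c
#include <assert.h>
#include <stdbool.h>

//  Hypothetical, simplified model of the client (assumption: two states
//  and three events are enough to show the idea).
typedef enum { STATE_CONNECTED, STATE_DISCONNECTED } client_state_t;
typedef enum {
    EVENT_HEARTBEAT,            //  Heartbeat (own timer or revived server)
    EVENT_PONG,                 //  CONNECTION PONG received from server
    EVENT_CONNECTION_DEAD       //  Too many PINGs went unanswered
} client_event_t;

typedef struct {
    client_state_t state;
    bool timer_running;         //  Is the client's own heartbeat timer armed?
    int pings_sent;             //  Number of PINGs sent to the server
} client_t;

//  Handle one event. The heartbeat event is never ignored; what changes
//  between states is only whether the client's own timer is running.
static void
client_handle (client_t *self, client_event_t event)
{
    switch (self->state) {
        case STATE_CONNECTED:
            if (event == EVENT_HEARTBEAT)
                self->pings_sent++;
            else
            if (event == EVENT_CONNECTION_DEAD) {
                self->timer_running = false;    //  Stop timer on disconnect
                self->state = STATE_DISCONNECTED;
            }
            break;
        case STATE_DISCONNECTED:
            if (event == EVENT_HEARTBEAT)
                self->pings_sent++;             //  Answer revived server
            else
            if (event == EVENT_PONG) {
                self->timer_running = true;     //  Restart timer on reconnect
                self->state = STATE_CONNECTED;
            }
            break;
    }
}

//  The client's own timer only produces heartbeat events while armed.
static void
client_timer_tick (client_t *self)
{
    if (self->timer_running)
        client_handle (self, EVENT_HEARTBEAT);
}

//  Replay the scenario from the thread: server dies, then comes back.
static client_t
simulate_outage_and_recovery (void)
{
    client_t client = { STATE_CONNECTED, true, 0 };
    client_handle (&client, EVENT_CONNECTION_DEAD);
    client_timer_tick (&client);                //  Timer stopped: no PING
    assert (client.pings_sent == 0);
    client_handle (&client, EVENT_HEARTBEAT);   //  Revived server heartbeat
    client_handle (&client, EVENT_PONG);        //  Server answers the PING
    return client;
}
```

In this model the client sends nothing while the server is down, yet still answers the first heartbeat from the revived server and ends up back in the connected state with its timer running. Whether the real engine can tell its own timer events apart from server heartbeats in the same way is an assumption of the sketch.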
