Hi all,

I am trying to debug a weird state between mlm clients and the broker. I was
not sure how up to date the PDF is with the recent state definitions, so I
have hacked a zproto_dot.gsl (as usual, procrastinating is better than the
real work ;-)), so that the state diagrams are generated automatically by zproto.
You can see the result here:
https://github.com/vyskocilm/malamute-core/blob/master/src/mlm_client.svg
https://github.com/vyskocilm/malamute-core/blob/master/src/mlm_server.svg

Bye
Michal

On Wed, Apr 20, 2016 at 6:57 PM, Pieter Hintjens <[email protected]> wrote:
> Sounds right to me.
>
> On Mon, Apr 18, 2016 at 1:29 PM, Kevin Sapper <[email protected]> wrote:
>> Okay, seems I was a little bit too quick :(. Great analysis btw :)
>>
>> You're correct, the client cannot recover from the disconnected state. The
>> heartbeat event has been overridden so the client itself will stop sending
>> heartbeats to the server. But this results in the client ignoring any
>> heartbeats from the revived server. This is definitely a bug! Instead of
>> ignoring the heartbeat events we need to stop the client heartbeat timer and
>> restart it upon reconnect.
>>
>> @hintjens please correct me if I'm wrong.
>>
>> 2016-04-18 13:13 GMT+02:00 Kevin Sapper <[email protected]>:
>>>
>>> Hi Alena,
>>>
>>> in the mlm_client.xml there is a state named "defaults" which is inherited
>>> by many others including "disconnecting". When the client is in
>>> "disconnecting" state and the server reconnects it will send a heartbeat
>>> which the client will answer with a connection ping and upon connection pong
>>> from the server the client will move from "disconnecting" state into
>>> "connected" state.
>>>
>>> //Kevin
>>>
>>> 2016-04-18 8:47 GMT+02:00 Alena Chernikava <[email protected]>:
>>>>
>>>> Hi,
>>>>
>>>> I would like to ask some questions and point out some problems in the
>>>> Malamute broker.
>>>>
>>>> I am facing a problem with the client reconnect procedure in malamute.
>>>> Usually a formal description allows me to better understand the problem,
>>>> which is why I started the investigation by creating a visualization of the
>>>> state machine for the malamute client. I would say it helped me a lot :)
>>>> Right away I found some "strange behaviors". I would like to ask some
>>>> questions to make it clearer for me (maybe it was done intentionally)
>>>> before I try to "experiment" with fixes.
>>>>
>>>> In the attachment you can find my hand-made visualization of the state
>>>> machine (I was doing it for myself, so it has my thoughts written down).
>>>> (GREEN - states, RED - events, BLUE - actions). It is not complete, but it
>>>> has already helped me to spot some potential and real problems. Here I
>>>> describe some issues I found (numbering is the same as on the picture).
>>>>
>>>> 1. Re-connection problem. It is actually the main problem I want to
>>>> discuss.
>>>>
>>>> Situation:
>>>> The client sends 3 PINGs and does not receive any PONGs back. After this
>>>> the client will end up in the "disconnected" state. I would say that it is
>>>> a black hole state, as the client cannot normally recover from it (to the
>>>> "connected" state) or at least move somewhere.
>>>>
>>>> Analysis:
>>>> * We can destroy the client. We will move out of the "disconnected" state,
>>>> but we destroyed the client. :) End of work, nothing to do. Everything is
>>>> fine.
>>>> * We can move to the "connected" state if the client receives "PONG"
>>>> from the server, or we can move to the "HAVE ERROR" state if the client
>>>> receives "ERROR" from the server. In order to receive some response from
>>>> the server, we need to send something to the server. And here we are: the
>>>> client does not send anything to the server :( PINGs are disabled in the
>>>> "mlm_client.xml" from the very beginning.
>>>>
>>>> Questions:
>>>> * Why was PING disabled in the "disconnected" state?
>>>> * What was the basic idea for the "re connect" implementation?
>>>>
>>>> Proposal:
>>>> Enable PINGs. When the server receives a PING from an "unknown client" it
>>>> will send "ERROR" back, which will trigger the "re connection" procedure.
>>>> Still, I am not sure if the client would reconnect correctly, but at least
>>>> we can give it a chance to do so, because now the client has no chance to
>>>> reconnect (if the server is off for a longer period).
>>>>
>>>> 2. Take a look at the right corner of the picture.
>>>>
>>>> in the mlm_client.xml:
>>>>
>>>> <state name = "connecting" inherit = "defaults">
>>>>     <event name = "OK" next = "connected">
>>>>         <action name = "signal success" />
>>>>         <action name = "client is connected" />
>>>>     </event>
>>>>
>>>> This means that the following code can pass (and I actually saw such
>>>> behavior a couple of times):
>>>>
>>>> int rv = mlm_client_connect();
>>>> assert (rv == 0);
>>>> assert (mlm_client_connected () == false);
>>>>
>>>> Proposal: do "signal success" after "client is connected".
>>>> Question: is there any reason to leave the order as it is?
>>>>
>>>> 3+4. There is one point I didn't understand from the code. When is the
>>>> client supposed to start heartbeating?
>>>> I thought that it should happen after the client got the "OK" response
>>>> from the server, but from the state machine I see that heartbeating
>>>> already starts in the "connecting" state (while waiting for the response
>>>> from the server). Is this a bug or was it done intentionally?
>>>>
>>>> 5. It is just a bug, I will fix it later. If mlm_client_connect didn't
>>>> work the first time, the client should remain in the "start" state.
>>>>
>>>> 6. It is a potential problem. If "PONG" comes before the "OK" message
>>>> from the server, mlm_client_set_producer/consumer/worker will not end
>>>> correctly and potentially will never "return". I propose: return to the
>>>> "confirming" state and wait for the "OK" response from the server. Do you
>>>> think it will not break anything?
>>>>
>>>> Thank you for reading this, looking forward to your reply.
>>>> Alena Chernikava
>>>> _______________________________________________
>>>> zeromq-dev mailing list
>>>> [email protected]
>>>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev

--
best regards
Michal Vyskocil
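
A minimal sketch of point 2 from Alena's message (the order of "signal success" and "client is connected" on the OK event). This is a toy model in plain C, not the code zproto generates for mlm_client: the struct, the function names and the assumption that the caller reads the connected flag at the very moment it is signalled are all hypothetical, chosen only to show why the action order can matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in Malamute or zproto. */
typedef struct {
    bool connected;       /* what a "connected?" query would report */
    bool signalled;       /* whether the waiting caller has been woken up */
    bool seen_by_caller;  /* value of `connected` at the moment of the signal */
} toy_client_t;

typedef void (*toy_action_fn) (toy_client_t *self);

/* Models "signal success": the caller wakes up and (worst case, by
   assumption) immediately looks at the connected flag. */
static void
signal_success (toy_client_t *self)
{
    self->signalled = true;
    self->seen_by_caller = self->connected;
}

/* Models "client is connected": the flag is finally set. */
static void
client_is_connected (toy_client_t *self)
{
    self->connected = true;
}

/* Runs the actions of one OK event in the given order and returns what
   the caller observed when it was signalled. */
static bool
run_ok_event (const toy_action_fn *actions, size_t count)
{
    toy_client_t client = { false, false, false };
    for (size_t i = 0; i < count; i++)
        actions [i] (&client);
    /* Both orders end in the same final state; only the view differs. */
    assert (client.signalled && client.connected);
    return client.seen_by_caller;
}

/* Order as quoted from mlm_client.xml in the thread. */
bool
caller_sees_connected_current_order (void)
{
    const toy_action_fn order [] = { signal_success, client_is_connected };
    return run_ok_event (order, 2);
}

/* Order proposed in the thread. */
bool
caller_sees_connected_proposed_order (void)
{
    const toy_action_fn order [] = { client_is_connected, signal_success };
    return run_ok_event (order, 2);
}
```

In this model the quoted order lets the caller observe "not connected" right after a successful connect, while the proposed order does not. Whether the real client can hit that interleaving depends on how the generated engine and the API thread are scheduled, which the sketch does not model.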
