Hi Uri,

As with all distributed applications, debugging is problematic. IMO printf works best if you're chasing a bug. But if you want to reproduce a run of a distributed program, you need the causal relationship of all events. The only solution I know of that offers this is Mattern's vector clocks.
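To make the idea concrete, here is a minimal Python sketch of a dynamic vector clock. It is not the zlogger code, just an illustration: the names VectorClock, happened_before and concurrent are made up, and I assume every peer has a unique string id. A dict is used instead of a fixed-size vector so that entries for newly seen peers appear on the fly.

```python
from dataclasses import dataclass, field


@dataclass
class VectorClock:
    """Dynamic vector clock; the dict grows as new peers are seen."""
    owner: str                      # hypothetical unique peer id
    counters: dict = field(default_factory=dict)

    def tick(self):
        """Local event (e.g. a log call): advance our own entry."""
        self.counters[self.owner] = self.counters.get(self.owner, 0) + 1
        return dict(self.counters)  # snapshot to attach to a log line

    def send(self):
        """Sending counts as an event; the snapshot travels with the message."""
        return self.tick()

    def receive(self, remote):
        """Merge by element-wise maximum, then count the receive as an event."""
        for peer, count in remote.items():
            self.counters[peer] = max(self.counters.get(peer, 0), count)
        return self.tick()


def happened_before(a, b):
    """True if snapshot a causally precedes snapshot b."""
    peers = set(a) | set(b)
    no_greater = all(a.get(p, 0) <= b.get(p, 0) for p in peers)
    strictly_less = any(a.get(p, 0) < b.get(p, 0) for p in peers)
    return no_greater and strictly_less


def concurrent(a, b):
    """True if the snapshots differ and neither causally precedes the other."""
    peers = set(a) | set(b)
    differ = any(a.get(p, 0) != b.get(p, 0) for p in peers)
    return differ and not happened_before(a, b) and not happened_before(b, a)


alice, bob = VectorClock("alice"), VectorClock("bob")
e1 = alice.tick()          # local event on alice
msg = alice.send()         # alice sends, clock is piggybacked
e2 = bob.tick()            # independent local event on bob
e3 = bob.receive(msg)      # bob receives and merges

print(happened_before(e1, e3))  # True: e1 is in the causal past of e3
print(concurrent(e1, e2))       # True: no message links e1 and e2
```

Missing entries are treated as zero, which is what lets a peer join without everyone agreeing on the vector size first. Removing an entry is the hard part, since all peers have to agree that the entry is no longer needed.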
During a student project last semester we implemented a dynamic vector clock for Zyre to order log messages according to their causal relationship. Once we had assembled all peer logs, we were able to generate a global log and a space-time diagram to see the event flow, including the log messages. It handles joining peers well, but for leaving peers you'll need global consensus, through Paxos for example. The drawback of vector clocks is, of course, that they do not scale: the more peers join, the larger the vector gets. You can have a look at the project here: https://zenon.cs.hs-rm.de/causality-logger/zlogger/

//Kevin

On 04.10.2016 at 20:28, "Uri Moszkowicz" <[email protected]> wrote:

> Hi Per,
> Thanks for the links. I should have mentioned that the compiler is the
> tool we're developing, not one we're using. It is also a non-traditional
> compiler. It doesn't take a program as input or produce an executable as
> output. We're trying to make it look more like traditional compilers in
> that it can be compiled in pieces and assembled at the end.
>
> Uri
>
> On Tue, Oct 4, 2016 at 1:15 PM, Per Sandberg <[email protected]>
> wrote:
>
>> Sounds like you are reinventing.
>>
>> the old distcc https://github.com/distcc/distcc
>>
>> only for c-family only code.
>>
>> or
>>
>> the modern gprbuild https://github.com/AdaCore/gprbuild.
>>
>> For "almost any" compiled language.
>>
>> /Per
>>
>>
>> On 2016-10-04 at 20:07, Uri Moszkowicz wrote:
>>
>> Hi,
>> My team is looking at ZeroMQ for taking a big monolithic non-traditional
>> compiler and distributing it. The biggest problem that comes to mind is
>> debug, how do you take a piece of your program and reproduce it in a
>> debugger after a crash?
>>
>> It seems to me that we need to checkpoint and order/log messages for this
>> to work but that seems very difficult to implement. There's plenty on the
>> topic in distributed systems literature but not much written about it in
>> practice. Did I miss it in the manual? What have you all done to solve this
>> problem?
>>
>> Thanks,
>> Uri
>>
>>
>> _______________________________________________
>> zeromq-dev mailing list
>> [email protected]
>> http://lists.zeromq.org/mailman/listinfo/zeromq-dev
