The reason this is so rare is that map/reduce tasks are, by nature, orthogonal: word count, batch image recognition, terasort -- all the things Hadoop is famous for -- are largely independent tasks. It's much rarer (I think) to see people using Hadoop for traffic simulations or protein folding, because those tasks require continuous signal integration between nodes.
1) First, consider rewriting the job so that all communication is replaced by state variables in a reducer, and choose your keys wisely, so that all "communication" between machines is obviated by the fact that a single reducer receives all of the information relevant to its task.

2) If a small amount of state needs to be preserved or cached in real time, to optimize the situation where two machines shouldn't redo the same task (i.e. invoking a web service to fetch a piece of data, or some other operation that needs to be rate limited and not duplicated), then you can use a fast key-value store (like you suggested), such as the ones provided by Basho (http://basho.com/) or Amazon (Dynamo).

3) If you really need a lot of message passing, then you might be better off using an inherently more integrated tool like GridGain, which allows for sophisticated message passing between asynchronously running processes, e.g. http://gridgaintech.wordpress.com/2011/01/26/distributed-actors-in-gridgain/. There may not be a reliable way to implement a sophisticated message passing architecture in Hadoop itself: the system is highly dynamic and built for rapid streaming reads/writes, which significant communication overhead would stifle.
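To make point 1 concrete, here's a minimal Hadoop Streaming-style reducer sketch in Python (the word-count reducer is my illustrative choice, not from the question). Because the framework sorts by key before the reduce phase, all values for a key arrive contiguously at one reducer, so the "communication" is just a couple of local state variables:

```python
import sys

def reduce_stream(lines):
    """Streaming-style reducer: input lines are 'key\\tvalue', sorted by key,
    so all values for a key arrive contiguously at this one process and
    per-key state lives in plain local variables -- no cross-machine chatter."""
    current_key, count = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield (current_key, count)   # key changed: emit the finished total
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        yield (current_key, count)           # flush the last key

if __name__ == "__main__":
    # Hadoop Streaming feeds sorted mapper output on stdin.
    for key, total in reduce_stream(sys.stdin):
        print(f"{key}\t{total}")
```

The design point is that key choice *is* the communication plan: anything that must be combined goes under the same key.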
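For point 2, the pattern is "check the shared store before doing the expensive work". The sketch below is a rough illustration only: a plain dict stands in for the remote key-value store (a real Riak/Dynamo client would have its own get/put API), and `fetch` stands in for the rate-limited web-service call you don't want duplicated:

```python
import threading

class DedupCache:
    """Check-before-compute cache: look a key up in the (shared) store first,
    and only invoke the expensive fetch on a miss. The dict is a local
    stand-in for a networked key-value store."""

    def __init__(self, fetch):
        self._store = {}              # stand-in for the remote KV store
        self._lock = threading.Lock()
        self._fetch = fetch           # the expensive, rate-limited call
        self.misses = 0               # how many times we actually called fetch

    def get(self, key):
        with self._lock:
            if key in self._store:
                return self._store[key]
        value = self._fetch(key)      # cache miss: do the real work once
        self.misses += 1
        with self._lock:
            # setdefault keeps the first writer's value if we raced another task
            self._store.setdefault(key, value)
            return self._store[key]
```

With a real distributed store you'd get the same shape but with a network round trip instead of a dict lookup, and you'd likely add a TTL so stale entries expire.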
