On 11/25/20 12:28 PM, Lux, Jim (US 7140) via Beowulf wrote:
It’s kind of a cluster, but not exactly HPC.

What I have is 3 rPi computers, A,B, and C, and what I’d like to do is keep the desktop and some data directories on all of them synchronized. So if on node A, I add something to A:~/Desktop, it (in short order) winds up in the ~/Desktop directory on the other 2 machines.

It’s easy if I’m doing it from another computer – pdsh and similar do a great job turning a single command into 3 (or N).

But what if it’s on the node itself? I thought about something like rsync running every second, or every 10 seconds, or whatever.

But maybe there’s a clever-er way. The network connection isn’t perfect, so a “map it to a shared network drive” approach doesn’t work, and there’s no guarantee that the state is the same on all machines (i.e. one might drop off and reset, and be way behind the other two). And the changes might come from any source (i.e. I can’t run all file changes through some single entry point that does a pdsh-like “write 3 times”).

You're getting at the meat of what network-attached storage has to solve all the time -- client consistency and coherency. What your requirements are for that drives which network-attached filesystem you use. In the case of 3 Raspberry Pi computers, solving the network problem and just using NFS is by far the simplest and cheapest solution. One of the rpis can even be your NFS server if you like, and a 1GbE switch costs in the low double digits in US dollars.
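To give a feel for how little is involved (just a sketch -- the hostname rpi-a, the subnet, and the paths below are placeholders for whatever you actually use), the server and client sides on Raspberry Pi OS would look roughly like:

    # On whichever rpi acts as the NFS server:
    sudo apt install nfs-kernel-server
    echo '/srv/shared 192.168.1.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
    sudo exportfs -ra

    # On each client rpi (add an fstab entry if you want it mounted at boot):
    sudo apt install nfs-common
    sudo mount -t nfs rpi-a:/srv/shared /mnt/shared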

Nevertheless, if you really insist on eventually consistent behavior, know that you're entering the realm of the unknown. What happens if rpiA writes to a file while it's offline, and then rpiB reads that file some time later? What does the application reading it expect to see at that point? What if whole directories are deleted? Or recreated with the same name? How do hard links behave when the rpis disagree about what they should point to? The list goes on and on.

If your datasets are truly discrete, then don't bother replicating to the other rpis; just replicate to discrete folders on some shared NFS mount, using one-way rsync, whenever networking is available again. That way there are no coherency/consistency issues.
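A one-way push like that is only a few lines of shell on each rpi, run from cron every few minutes (again just a sketch; the mount point and folder layout here are made up):

    #!/bin/sh
    # Push this node's Desktop one-way into its own folder on the shared
    # NFS mount, so the per-node trees stay discrete.
    DEST=/mnt/shared/$(hostname)
    mkdir -p "$DEST"
    rsync -a --delete "$HOME/Desktop/" "$DEST/Desktop/"

Since each rpi only ever writes under its own hostname, two nodes never fight over the same files.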

The last thing I'll say, if you really want some kind of clumsy solution to this, is to look at ownCloud and its offline-available directories. You will still have consistency/coherency problems, but at least there will be some other framework managing them for you, with a centralized store acting as the source of truth for your data.

In industrial control scenarios, changes are made at some central location and then distributed throughout the system simultaneously. Even IoT works this way -- the nodes don't replicate data sent by their peers, even though the data might flow through them. The data and controls are really sent from, and stored at, a single central device (such as the openHAB rpi in my house that talks to my various IoT devices).

Best,

ellis