On 11/25/20 12:28 PM, Lux, Jim (US 7140) via Beowulf wrote:
It’s kind of a cluster, but not exactly HPC.

What I have is 3 rPi computers, A,B, and C, and what I’d like to do is keep the desktop and some data directories on all of them synchronized. So if on node A, I add something to A:~/Desktop, it (in short order) winds up in the ~/Desktop directory on the other 2 machines.

It’s easy if I’m doing it from another computer – pdsh and similar do a great job turning a single command into 3 (or N).

But what if it’s on the node itself? I thought about something like rsync running every second, or every 10 seconds, or whatever.

But maybe there’s a clever-er way. The network connection isn’t perfect, so a “map it to a shared network drive” approach doesn’t work, and there’s no guarantee that the state is the same on all machines (i.e. one might drop off and reset, and be way behind the other two). And the changes might come from any source (i.e. I can’t run all file changes through some single entry point that does a pdsh-like “write 3 times”).

You're getting at the meat of what network-attached storage has to solve all the time -- client consistency and coherency. What your requirements are for that drives which network-attached filesystem you use. In the case of 3 Raspberry Pi computers, solving the network problem and just using NFS is by far the simplest and cheapest solution. One of the rpis can even be your NFS server if you like, and a 1GbE switch costs in the low double digits in US dollars.
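To give a feel for how little is involved (just a sketch -- the hostname rpi-a, the subnet, and the paths below are placeholders for whatever you actually use), the server and client sides on Raspberry Pi OS would look roughly like:

    # On whichever rpi acts as the NFS server:
    sudo apt install nfs-kernel-server
    echo '/srv/shared 192.168.1.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
    sudo exportfs -ra

    # On each client rpi (add an fstab entry if you want it mounted at boot):
    sudo apt install nfs-common
    sudo mount -t nfs rpi-a:/srv/shared /mnt/shared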

Nevertheless, if you really insist on eventually consistent behavior, know that you're entering the realm of the unknown. What happens if rpiA writes to a file while it's offline, and then rpiB reads that file some time later? What does the application reading it expect to see at that point? What if whole directories are deleted? Or recreated with the same name? How do hard links behave when the rpis disagree about what they should point to? The list goes on and on.

If your datasets are truly discrete, then don't bother replicating to the other rpis; just replicate to discrete folders on some shared NFS mount, using one-way rsync, whenever networking is available again. That way there are no coherency/consistency issues.
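A one-way push like that is only a few lines of shell on each rpi, run from cron every few minutes (again just a sketch; the mount point and folder layout here are made up):

    #!/bin/sh
    # Push this node's Desktop one-way into its own folder on the shared
    # NFS mount, so the per-node trees stay discrete.
    DEST=/mnt/shared/$(hostname)
    mkdir -p "$DEST"
    rsync -a --delete "$HOME/Desktop/" "$DEST/Desktop/"

Since each rpi only ever writes under its own hostname, two nodes never fight over the same files.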

The last thing I'll say, if you really want some kind of clumsy solution to this, is to look at ownCloud and its offline-available directories. You will still have consistency/coherency problems, but at least there will be some other framework managing them for you, with a centralized store acting as the source of truth for your data.

In industrial control scenarios, changes are made at some central location and then distributed throughout the system simultaneously. Even IoT works this way -- the nodes don't replicate data sent by their peers, even though the data might flow through them. The data and controls are really sent from, and stored at, a single central device (such as the openHAB rpi in my house that talks to my various IoT devices).

Best,

ellis