Thanks George. Much appreciated. On Thu, Jul 16, 2020 at 11:21 PM George Neuner <[email protected]> wrote:
> > Hi David, > > On 7/16/2020 11:44 AM, David Storrs wrote: > > On Thu, Jul 16, 2020 at 10:09 AM George Neuner <[email protected]> > wrote: > >> >> The problem seems under-specified. Can you say more about the real >> purpose? >> > > Basic version: It's a peer-to-peer encrypted swarmed file sharing system > that presents like Dropbox on the front end (i.e. "make a change to the > filesystem on peer A and peers B-Z will replicate that change") and works > something like Bittorrent on the back end in that files are sent in chunks > but it offers functionality that Bittorrent does not, such as encrypted > transfer, WoT authentication, etc. > > > Interesting. So I'm guessing your problem is to (compactly) represent the > state of the shared space. > > Do you plan on having index servers, or are you aiming for a fully > distributed solution? And, if distributed, do you want each node to > maintain its own state picture of the shared space, or were you thinking > that nodes could just snoop admin broadcasts looking for mention of data > they don't currently have? [Your question about how to pair / collapse > messages suggests you might be considering a snoopy solution.] > > Asking because keeping a state picture has scalability issues, a snoopy > solution has complexity issues, and (depending on latency) both have issues > with performing unnecessary work. In any event, I have some suggestions. > > > > Snoopy is the more interesting case. You start with a queue of file > operations to be done as gleaned from the admin messages - mkdir, rmdir, > fetch a file, delete a file, etc. - in whatever order the messages were > received. > > Separately, you maintain a (hash table) mapping from pathnames to a list > of queue nodes that operate on that object. The map should use weak > references so that nodes can safely be removed from the queue and discarded > without also needing to update the map. If queue processing gets to some > operation first, any map reference to it will dissolve (be replaced by > #f). > > When a message is received, you lookup the pathname in the map, and if a > complementary operation is found in the queue, you remove and discard it. > [You can also remove references in the map or just let them dissolve > depending on your handling.] Then simply discard the message. > > Otherwise you queue whatever operation the message indicates and add a > reference to the queue node under the object's pathname in the map. > > Extra complexity comes in having to notice that map entries (pathnames) > have no operations left in the queue. Weak references don't just disappear > - they are changed to #f when the referenced object is no longer reachable > - however AFAICT there is no hashtable variant that permits weak reference > values, so you have to use weak-boxes and those continue to exist even if > the objects they reference are gone. Useless map entries will need to be > identified and removed somehow. > > > Modeling the filesystem can be done rather simply with a trie in which > folders are represented by mutable hash tables and files by structures. > You can combine this with the operation queue above, but in this case > lookups can be done in the trie and queue references kept in the trie > nodes. And the trie provides a snapshot of the current state which may be > useful for other purposes. > > > The trick in either case is processing latency: you don't want to wait too > long, but if you really want to avoid unnecessary work you need to delay > performing file operations long enough that complementary messages are > likely to be received. > > > > What if messages are lost permanently, e.g., due to hardware crash? >> > >> What it you receive a create but a corresponding delete or update is >> lost - then your information / picture of the file system state is wrong. >> >> What if you receive a file delete without a corresponding create? In the >> absence of other information, can you even assume there *was* a create? >> If these messages are sent in response to user actions, can they ever be >> sent mistakenly? >> >> > The ultimate answer to these questions is "If things get out of sync in a > way that the system cannot resolve, it will be flagged for a human to > resolve." There are things we do that mitigate them -- for example, a > write-ahead log for messages received from peers -- but we acknowledge that > we cannot resolve 100% of situations automatically. Neither can any other > file replication service. (Dropbox, Box.com, etc) > > Also relevantly, differences are reconciled across multiple peers. If > there's 5 peers in your replication set and the other 4 agree that there > should be a file at path P but you don't have one then it's safe to assume > that you missed a File-Create message. And yes, that comes with issues of > its own (Q: What if it was deleted on your machine and none of the others > got your File-Delete because you crashed before sending it? A: Worst case, > the file gets recreated and the user deletes it again. Also, move files to > a Trash folder in response to a File-Delete, don't actually delete them for > a certain period of time) but again we fall back to human resolution. > >> >> > Are you are considering eavesdropping on multicast? That may work fine on > a LAN ... but for wide area the variability in UDP complicates maintaining > a consistent global state picture. For a real replication system I think > you really will want a reliable, causal delivery mechanism. And if large > scalability is an issue you will want an adaptive topology that limits the > number of connections. > > > YMMV. Hope this sparks a good idea. > George > > -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/CAE8gKocamy%3DdV7X6EBPP0F68OmP16X4UQFZnoOU3-HkRGs-s%3Dw%40mail.gmail.com.

