Re: [racket-users] Best data structure for ordered data set with insertion and reordering?

David Storrs Fri, 17 Jul 2020 08:14:25 -0700

Thanks George.  Much appreciated.

On Thu, Jul 16, 2020 at 11:21 PM George Neuner <[email protected]> wrote:


>
> Hi David,
>
> On 7/16/2020 11:44 AM, David Storrs wrote:
>
> On Thu, Jul 16, 2020 at 10:09 AM George Neuner <[email protected]>
> wrote:
>
>>
>> The problem seems under-specified.  Can you say more about the real
>> purpose?
>>
>
> Basic version:  It's a peer-to-peer encrypted swarmed file sharing system
> that presents like Dropbox on the front end (i.e. "make a change to the
> filesystem on peer A and peers B-Z will replicate that change") and works
> something like Bittorrent on the back end in that files are sent in chunks
> but it offers functionality that Bittorrent does not, such as encrypted
> transfer, WoT authentication, etc.
>
>
> Interesting.  So I'm guessing your problem is to (compactly) represent the
> state of the shared space.
>
> Do you plan on having index servers, or are you aiming for a fully
> distributed solution?  And, if distributed, do you want each node to
> maintain its own state picture of the shared space, or were you thinking
> that nodes could just snoop admin broadcasts looking for mention of data
> they don't currently have?  [Your question about how to pair / collapse
> messages suggests you might be considering a snoopy solution.]
>
> Asking because keeping a state picture has scalability issues, a snoopy
> solution has complexity issues, and (depending on latency) both have issues
> with performing unnecessary work.  In any event, I have some suggestions.
>
>
>
> Snoopy is the more interesting case.  You start with a queue of file
> operations to be done as gleaned from the admin messages - mkdir, rmdir,
> fetch a file, delete a file, etc. - in whatever order the messages were
> received.
>
> Separately, you maintain a (hash table) mapping from pathnames to a list
> of queue nodes that operate on that object.  The map should use weak
> references so that nodes can safely be removed from the queue and discarded
> without also needing to update the map.  If queue processing gets to some
> operation first, any map reference to it will dissolve (be replaced by
> #f).
>
> When a message is received, you lookup the pathname in the map, and if a
> complementary operation is found in the queue, you remove and discard it.
> [You can also remove references in the map or just let them dissolve
> depending on your handling.]   Then simply discard the message.
>
> Otherwise you queue whatever operation the message indicates and add a
> reference to the queue node under the object's pathname in the map.
>
> Extra complexity comes in having to notice that map entries (pathnames)
> have no operations left in the queue.  Weak references don't just disappear
> - they are changed to #f when the referenced object is no longer reachable
> - however AFAICT there is no hashtable variant that permits weak reference
> values, so you have to use weak-boxes and those continue to exist even if
> the objects they reference are gone.   Useless map entries will need to be
> identified and removed somehow.
>
>
> Modeling the filesystem can be done rather simply with a trie in which
> folders are represented by mutable hash tables and files by structures.
> You can combine this with the operation queue above, but in this case
> lookups can be done in the trie and queue references kept in the trie
> nodes.  And the trie provides a snapshot of the current state which may be
> useful for other purposes.
>
>
> The trick in either case is processing latency: you don't want to wait too
> long, but if you really want to avoid unnecessary work you need to delay
> performing file operations long enough that complementary messages are
> likely to be received.
>
>
>
> What if messages are lost permanently, e.g., due to hardware crash?
>>
>
>> What it you receive a create but a corresponding delete or update is
>> lost - then your information / picture of the file system state is wrong.
>>
>> What if you receive a file delete without a corresponding create? In the
>> absence of other information, can you even assume there *was* a create?
>> If these messages are sent in response to user actions, can they ever be
>> sent mistakenly?
>>
>>
> The ultimate answer to these questions is "If things get out of sync in a
> way that the system cannot resolve, it will be flagged for a human to
> resolve." There are things we do that mitigate them -- for example, a
> write-ahead log for messages received from peers -- but we acknowledge that
> we cannot resolve 100% of situations automatically.  Neither can any other
> file replication service.  (Dropbox, Box.com, etc)
>
> Also relevantly, differences are reconciled across multiple peers.  If
> there's 5 peers in your replication set and the other 4 agree that there
> should be a file at path P but you don't have one then it's safe to assume
> that you missed a File-Create message.  And yes, that comes with issues of
> its own (Q: What if it was deleted on your machine and none of the others
> got your File-Delete because you crashed before sending it? A: Worst case,
> the file gets recreated and the user deletes it again.  Also, move files to
> a Trash folder in response to a File-Delete, don't actually delete them for
> a certain period of time) but again we fall back to human resolution.
>
>>
>>
> Are you are considering eavesdropping on multicast?  That may work fine on
> a LAN ... but for wide area the variability in UDP complicates maintaining
> a consistent global state picture.  For a real replication system I think
> you really will want a reliable, causal delivery mechanism.  And if large
> scalability is an issue you will want an adaptive topology that limits the
> number of connections.
>
>
> YMMV.  Hope this sparks a good idea.
> George
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/CAE8gKocamy%3DdV7X6EBPP0F68OmP16X4UQFZnoOU3-HkRGs-s%3Dw%40mail.gmail.com.

Re: [racket-users] Best data structure for ordered data set with insertion and reordering?

Reply via email to