On 7/16/2020 4:29 AM, David Storrs wrote:
tl;dr
Can anyone recommend a data structure that is ordered and supports efficient reordering, insertion at arbitrary location, and deletion?

Long form:

I'm working on an operation-log reconciliation problem, where each operation is one of:

  File-Create    P H
  File-Update   P H
  File-Delete    P H
  Folder-Create P
  Folder-Delete P

P = path
H = hash of the file (e.g. md5)

Operation messages travel over the network, meaning that they could arrive out of order.  (They could also be missed entirely, but that's a separate problem that I'm not working on yet.)

Shouldn't be a problem:  "at-least-once" and "at-most-once" semantics both are pretty easy.  It's the "exactly-once" that is the problem.  Hopefully you have no need for that.


Specifically, I want to be able to take a series of messages and collapse them where appropriate.  For example:

  File-Update P H1 -> H2
  File-Create P1 H1
Result after collapse:
  '(File-Create P1 H2)

  File-Create P H1
  File-Delete P H1
Result after collapse:
  '()

  File-Delete P X
  File-Create P X
Result after collapse:
  '()
  File-Delete P1 H1
  File-Create P2 H2
  File-Create P1 H1
Result after collapse:
  '(File-Create P2 H2)

I've been mulling over various ways to handle all of this and digging around to find examples of other people doing it, but I'm wondering if there's a data structure or existing algorithm that will handle it cleanly.  Does anyone know of such a thing?

What if messages are lost permanently, e.g., due to hardware crash?

What it you receive a create but a corresponding delete or update is lost - then your information / picture of the file system state is wrong.

What if you receive a file delete without a corresponding create? In the absence of other information, can you even assume there *was* a create?  If these messages are sent in response to user actions, can they ever be sent mistakenly?

The problem seems under-specified.  Can you say more about the real purpose?


I can think of some ways to keep an index of the file system and track changes made to it ... but at best that could provide a snapshot in time rather than an ongoing audit log.  And it seems like a log would not be terribly useful without all the information.

George

--
You received this message because you are subscribed to the Google Groups "Racket 
Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/racket-users/e5bfead8-20cc-47b5-3f10-6a568aa04ba9%40comcast.net.

Reply via email to