adamyi commented on PR #1997:
URL: https://github.com/apache/zookeeper/pull/1997#issuecomment-1583944146
Unfortunately, there are some bugs with this patch:
Bug 1
-----
Consider the following sequence of events:
1. Serialize ACL (up to 5)
2. Create /a1 with new ACL 6
3. Create /a2 with ACL 6
4. Delete /a1
Without this feature, we would create ACL 6 with a refcount of 1 at step 2
and
leave it unchanged at step 3 because ACL 6 already exists. Deleting /a1 at
step
4 would decrement the refcount and consequently garbage-collect the ACL
entry,
leaving /a2 pointing to a non-existent ACL entry.
Bug 2
-----
Consider the following sequence of events:
0. ACL is up to 5
1. Create /a0 with ACL 6
2. Remove /a0
3. Serialize ACL (up to 5)
4. Create /a1 with new ACL 7
5. Create /a2 with new ACL 8
6. Create /a3 with new ACL 7
When replaying, we would create ACL 6 at step 4; 7 at step 5. When replaying
step 6, we would treat ACL 7 as already created with the value from step 5.
That
causes /a3 to have the ACL of /a2. This is wrong.
Fixes
---
I'm proposing three approaches to fix these bugs. I don't feel strongly
among these solutions and am happy to implement any of them. Do Zookeeper
reviewers have a preference?
1. Make the ACL cache skip indices for nodes that were deserialized even if
these entries weren't there originally, as if we created and then removed them
(ACL cache is discrete due to GC). We can do this by also bumping aclIndex in
ReferenceCountedACLCache.addUsage. For bug 1 scenario, we'll move /a1 and /a2
both to ACL 7, making it easier to
accurately ref-count them. Deleting /a1 will not affect /a2. For bug 2
scenario, we'll set /a1 and /a3 to point to ACL 9 and /a2 to point to ACL 10.
2. Directly change ACL to -2 during node deserialization if ACL cache does
not exist. When creating node, we overwrite ACL if and only if it's -2. This is
essentially the same as skipping these indices as we're making (ACLs originally
missing) and (recreated ACLs during replay) two disjoint spaces. It might feel
like a more local fix than solution 1 as the code change only involves DataTree
but it's not. It still relies on the fact that -2 is not a valid ACL index in
ReferenceCountedACLCache (ACLCache already uses -1 as special value).
3. An even more radical variation of solution 2 would be to directly delete
the node during node deserialization if ACL cache does not exist. The argument
is that we must re-create it during transaction replay; otherwise the node
would not be accessible due to missing ACL MarshallError anyway.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]