kezhuw commented on code in PR #1997:
URL: https://github.com/apache/zookeeper/pull/1997#discussion_r1223904959
##########
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java:
##########
@@ -478,6 +482,7 @@ public void createNode(final String path, byte[] data,
List<ACL> acl, long ephem
parent.stat.setCversion(parentCVersion);
parent.stat.setPzxid(zxid);
}
+ Long acls = aclCache.convertAcls(acl);
Review Comment:
Given this patch, how bug#1 could happen ?
> We would create ACL 6 with a refcount of 1 at step 2 and leave it
unchanged at step 3 because ACL 6 already exists. Deleting /a1 at step 4 would
decrement the refcount and consequently garbage-collect the ACL entry, leaving
/a2 pointing to a non-existent ACL entry.
step#3 creates a new node with existing ACL, so that ACL will get ref count
incremented. step#4 will not delete ACL 6.
##########
zookeeper-server/src/main/java/org/apache/zookeeper/server/DataTree.java:
##########
@@ -446,21 +446,25 @@ public void createNode(final String path, byte[] data,
List<ACL> acl, long ephem
throw new NoNodeException();
}
synchronized (parent) {
- // Add the ACL to ACL cache first, to avoid the ACL not being
- // created race condition during fuzzy snapshot sync.
- //
- // This is the simplest fix, which may add ACL reference count
- // again if it's already counted in the ACL map of fuzzy
- // snapshot, which might also happen for deleteNode txn, but
- // at least it won't cause the ACL not exist issue.
- //
- // Later we can audit and delete all non-referenced ACLs from
- // ACL map when loading the snapshot/txns from disk, like what
- // we did for the global sessions.
- Long acls = aclCache.convertAcls(acl);
-
Set<String> children = parent.getChildren();
if (children.contains(childName)) {
+ DataNode child = nodes.get(path);
+ if (child == null) {
+ LOG.error("Parent claims child exists but it does not: " +
path);
+ } else {
+ synchronized (child) {
+ // Due to fuzzy snapshot, it's possible that a node
+ // exists but its ACL is missing in the cache. This is
+ // because they serialized ACL cache before node is
+ // created but serialized nodes after creation. In this
+ // case, we'll replay node creation transaction. We
+ // should add it to cache and reset the ACL if the
+ // previous longval does not exist.
+ if (!aclCache.isLongCached(child.acl)) {
Review Comment:
Is this the cause to bug#2 in this patch ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]