Hello,

A refcount assertion failure we have been getting is "refcount
detected use-after-free!", from the call to refcounts_ref inside
_ports_bucket_class_iterate, reached for instance from the periodic sync.
I'm thinking: what about this scenario:

- thread A calls diskfs_nput()
- that calls refcounts_deref_weak(), which drops both the hard and weak
  counts to 0
- thread B calls diskfs_sync_everything, which calls
  ports_bucket_iterate, which calls _ports_bucket_class_iterate
- _ports_bucket_class_iterate iterates over the nodes and calls
  refcounts_ref on the node dereferenced above, which then trips the
  assertion.

So the situation is that diskfs_nput is releasing a node while
diskfs_sync_everything is trying to sync it.
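To make the interleaving concrete, here is a minimal single-threaded
model of it; model_ref, model_deref_weak and the plain struct below are
simplified stand-ins I made up for this mail, not the real <refcount.h>
API, and only the order of operations and the assertion pattern matter:

#include <assert.h>
#include <stdio.h>

struct references { unsigned hard; unsigned weak; };

/* Stand-in for refcounts_ref: take a hard reference and assert that
   the object was still alive beforehand, i.e. that we did not just
   resurrect a (0, 0) object.  */
static void
model_ref (struct references *r)
{
  r->hard++;
  assert (!(r->hard == 1 && r->weak == 0)
          || !"refcount detected use-after-free!");
}

/* Stand-in for the refcounts_deref_weak call made on behalf of
   diskfs_nput.  */
static void
model_deref_weak (struct references *r)
{
  r->weak--;
}

int
main (void)
{
  /* The node's hard count is already 0; one weak reference is left.  */
  struct references node = { .hard = 0, .weak = 1 };

  /* Thread A: diskfs_nput drops the last weak reference -> (0, 0).  */
  model_deref_weak (&node);

  /* Thread B: _ports_bucket_class_iterate takes a hard reference on
     the same node -> hard becomes 1 while weak is 0, which is exactly
     the pattern the "use-after-free" assertion traps on.  */
  model_ref (&node);

  puts ("not reached");
  return 0;
}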

Just using refcounts_unsafe_ref is most probably not a solution either:
since the node is on its way out, the function called by the iterator
would most probably get things wrong when handed a node that is being
destroyed.

Another way would be as attached: just skip nodes which no longer have
any reference.  I however don't know libports well enough to say whether
that's correct behavior for libports or not: AIUI, callers of
refcounts_deref* are supposed to check for (0, 0) and are then
responsible for destroying the node?
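To spell out the contract I am assuming there, here is a minimal
sketch; model_deref_weak, destroy_node and struct node are again
made-up stand-ins for this mail, not the real refcount.h / libdiskfs
code.  The point is only: whoever sees both counts reach zero after a
deref is the one that destroys the object.

#include <stdio.h>
#include <stdlib.h>

struct references { unsigned hard; unsigned weak; };

struct node
{
  struct references refcounts;
  /* ... real node state would live here ...  */
};

/* Stand-in for refcounts_deref_weak: drop one weak reference and
   report the resulting counts back to the caller.  */
static void
model_deref_weak (struct references *ref, struct references *result)
{
  ref->weak--;
  *result = *ref;
}

/* Hypothetical destructor, standing in for whatever actually frees
   the node.  */
static void
destroy_node (struct node *np)
{
  printf ("destroying node %p\n", (void *) np);
  free (np);
}

int
main (void)
{
  struct node *np = calloc (1, sizeof *np);
  if (np == NULL)
    return 1;
  np->refcounts.hard = 0;
  np->refcounts.weak = 1;

  /* The pattern I am assuming: deref, then check whether we held the
     last reference, and only in that case destroy the node.  */
  struct references result;
  model_deref_weak (&np->refcounts, &result);
  if (result.hard == 0 && result.weak == 0)
    destroy_node (np);

  return 0;
}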

Samuel
diff --git a/libports/bucket-iterate.c b/libports/bucket-iterate.c
index b021b99..76dc3f7 100644
--- a/libports/bucket-iterate.c
+++ b/libports/bucket-iterate.c
@@ -58,7 +58,14 @@ _ports_bucket_class_iterate (struct hurd_ihash *ht,
 
       if (class == 0 || pi->class == class)
        {
-         refcounts_ref (&pi->refcounts, NULL);
+         struct references result;
+         refcounts_unsafe_ref (&pi->refcounts, &result);
+         if (result.hard == 1 && result.weak == 0)
+         {
+           /* This one is on its way out, skip it.  */
+           refcounts_deref (&pi->refcounts, NULL);
+           continue;
+         }
          p[n] = pi;
          n++;
        }
