Hello, Some kind of refcount assertion failure we have been having is "refcount detected use-after-free!" from the call to refcounts_ref inside _ports_bucket_class_iterate, called for instance by the periodic sync. I'm thinking: what about this scenario:
- thread A calls diskfs_nput() - that calls refcounts_deref_weak(), which gets both hard and weak counts to 0 - thread B calls diskfs_sync_everything, which calls ports_bucket_iterate, which calls _ports_bucket_class_iterate - _ports_bucket_class_iterate iterates over nodes, and calls refcounts_ref for the node deref-ed above. It then traps on the assertion. So the situation is that diskfs_nput is releasing a node while sync_everything is trying to sync it. Just using refcounts_unsafe_ref is most probably not a solution since the node is on its way out, the function called by the iterator will most probably get it all wrong on a node being destroyed. Another way would be as attached: just skip nodes which didn't have any reference. I however don't know libports enough to say whether that's correct behavior for libports or not: AIUI callers of refcounts_deref* are supposed to check for 0,0 and are then responsible for destroying the node? Samuel
diff --git a/libports/bucket-iterate.c b/libports/bucket-iterate.c index b021b99..76dc3f7 100644 --- a/libports/bucket-iterate.c +++ b/libports/bucket-iterate.c @@ -58,7 +58,14 @@ _ports_bucket_class_iterate (struct hurd_ihash *ht, if (class == 0 || pi->class == class) { - refcounts_ref (&pi->refcounts, NULL); + struct references result; + refcounts_unsafe_ref (&pi->refcounts, &result); + if (result.hard == 1 && result.weak == 0) + { + /* This one is on its way out, skip it. */ + refcounts_deref (&pi->refcounts, NULL); + continue; + } p[n] = pi; n++; }