I think that I've run into the known dtrace/cyclic deadlock. I would just like to run my understanding and ideas by you.
The problem is that the cyclic_fire() callback is executed in the interrupt filter context (and thus with interrupts disabled) and it tries to obtain a spin mutex in the cyclic code. At the same time another CPU may be executing a thread that holds that spin mutex and uses smp_rendezvous_cpus() to perform a synchronous function invocation on the first CPU. So, CPU #1 cannot make forward progress because it is spinning on the spin mutex, and CPU #2 cannot make forward progress because it cannot interrupt CPU #1.

I think that the problem was introduced during the porting of the code. On (Open)Solaris there are no spin locks in this code; all data structures are per-CPU and data coherency is ensured by (1) accessing the data only from the CPU to which it belongs; and (2) using some modern-day spl*() equivalent[?] to block interrupts.

I think that this is quite similar to what we do for per-CPU caches in UMA, and so the same approach should work here. That is, as in (Open)Solaris, the data should be accessed only from the owning CPU, and spinlock_enter()/spinlock_exit() should be used to prevent races between non-interrupt code and nested interrupt code.

What do you think?
Thanks!
-- 
Andriy Gapon
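
P.S. To make the proposal a bit more concrete, here is a minimal user-space model in C of the scheme I have in mind. It is only a sketch: every model_* name and the struct layout are made up for illustration and are not the real cyclic code, and "interrupts disabled" is modelled as a plain nesting counter rather than the real spinlock_enter()/spinlock_exit(). The point is that the per-CPU data is touched only by its owning CPU, that a timer interrupt arriving inside the protected section is held off until the section ends, and that no lock shared between CPUs is involved.

```c
#include <assert.h>

#define MODEL_NCPU 2

/* Hypothetical per-CPU state; this is not the layout of the real cyclic code. */
struct model_cpu {
	int intr_disable_depth;	/* nesting count, standing in for spinlock_enter() */
	int pending_fires;	/* timer interrupts held off while depth > 0 */
	int heap_len;		/* stand-in for the per-CPU cyclic heap */
	int fired;		/* how many times the fire handler has run */
};

struct model_cpu model_cpus[MODEL_NCPU];

/* Models cyclic_fire(): touches only the local CPU's data and takes no lock. */
void
model_cyclic_fire(struct model_cpu *c)
{
	c->fired++;
}

/* Models spinlock_enter(): block interrupts on this CPU (nestable). */
void
model_spinlock_enter(struct model_cpu *c)
{
	c->intr_disable_depth++;
}

/* Models spinlock_exit(): on the outermost exit, deliver held-off interrupts. */
void
model_spinlock_exit(struct model_cpu *c)
{
	assert(c->intr_disable_depth > 0);
	if (--c->intr_disable_depth == 0) {
		while (c->pending_fires > 0) {
			c->pending_fires--;
			model_cyclic_fire(c);
		}
	}
}

/* A timer interrupt arrives on this CPU: run now, or hold it off if blocked. */
void
model_timer_interrupt(struct model_cpu *c)
{
	if (c->intr_disable_depth > 0)
		c->pending_fires++;
	else
		model_cyclic_fire(c);
}

/* Non-interrupt code that modifies the heap; must run on the owning CPU. */
void
model_cyclic_add(int curcpu, int owner)
{
	struct model_cpu *c;

	assert(owner >= 0 && owner < MODEL_NCPU);
	assert(curcpu == owner);	/* rule (1): owner-only access */
	c = &model_cpus[owner];
	model_spinlock_enter(c);	/* rule (2): keep the local interrupt out */
	c->heap_len++;
	model_spinlock_exit(c);
}
```

In this model the interrupt handler never waits on anything held by another CPU, so there is nothing for it to spin on while a rendezvous from that CPU is pending. Whether the real code can be arranged this way everywhere is of course the part I would like your opinion on.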

