> This is why I was insisting on passing *memory* through IPC. It's not at all clear that makes any kind of sense, unless you mean something I haven't imagined. Can you be specific about exactly what the interface (say, a well-commented MiG .defs fragment) you have in mind would look like?
If it's an RPC that passes out-of-line memory, that (IIRC) always has virtual-copy semantics, never page-sharing semantics. So it would be fundamentally the wrong model for matching up with other futex calls (from the same task or others) to synchronize on a shared int, which is what futex semantics are all about.

What I always anticipated for a Machish futex interface was vm_futex_* calls, which is to say, technically RPCs to the task port (which need not be the task port of the caller), passing an address as an integer literal just as calls like vm_write do, and passing each compare&exchange value as a datum, i.e. an integer literal, just as vm_write takes a datum of byte-array type, with semantics unchanged by whether that's inline or out-of-line memory. The task port and address serve as a proxy by which the kernel finds the memory object and offset, and the actual synchronization semantics are about that offset in that memory object and the contents of the word at that location. (Like all such calls, they would likely be optimized especially for the case of calls on task-self, probably even to the extent of having a bespoke syscall for the most-optimized case, as with vm_allocate. But that's later optimization.)

Given the specified usage patterns for the futex operations, it might be reasonable enough to implement those semantics solely by translating to a physical page, including blocking to fault one in, and then associating the wait queues with offsets into the physical page rather than with the memory object abstraction. (Both a waiter and a waker will have just faulted in the page before making the futex call anyway.) But note that the semantics require that if a waiter was blocked when the virtual page got paged out, then when you page it back in inside vm_futex_wake, that old waiter must get woken.

I don't know the kernel's VM internals much at all, but I suspect that all the tasks mapping a shared page do not get eagerly updated when the memory object's page is paged in to service a page fault in some other task, but rather service minor faults on demand (i.e. later) to rediscover the new association between the virtual page and the new physical page incidentally brought in by someone else's page fault a little earlier. Since you need to track waiters at the memory object level while their page is nonresident anyway, it probably makes sense just to hang the {offset => wait queue} table off the memory object and always use that. At least, that seems like the approach for a first version that ensures correctness in all the corners of the semantics. It can get fancier as needed in later optimizations.

When it comes to optimizing it, a fairly deep understanding of the Linux futex implementation (which I don't have offhand, though I have read it in the past) is probably instructive.

Thanks,
Roland
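
P.S. To make that concrete, here is roughly the sort of .defs fragment I have in mind. It's purely a sketch, not a worked-out proposal: the routine names, the subsystem number, and the exact parameter lists are placeholders, and things like timeouts and requeue operations are omitted. The point is just the vm_write-like shape of the interface.

    /* vm_futex.defs -- sketch only; names and numbers are placeholders. */

    subsystem vm_futex 4250;            /* arbitrary, unclaimed subsystem id */

    #include <mach/std_types.defs>
    #include <mach/mach_types.defs>

    /* Block the calling thread as long as the int at ADDRESS in
       TARGET_TASK still contains EXPECTED_VALUE (cf. Linux FUTEX_WAIT).
       ADDRESS is passed as a plain integer, just as in vm_write; the
       kernel uses TARGET_TASK and ADDRESS only to find the backing
       memory object and offset, which is what the wait is really
       keyed on. */
    routine vm_futex_wait(
                    target_task     : vm_task_t;
                    address         : vm_address_t;
                    expected_value  : int);

    /* Wake up to WAKE_COUNT waiters blocked on the memory object and
       offset that back ADDRESS in TARGET_TASK (cf. Linux FUTEX_WAKE),
       reporting how many were actually woken. */
    routine vm_futex_wake(
                    target_task     : vm_task_t;
                    address         : vm_address_t;
                    wake_count      : int;
            out     woken_count     : int);

Note that the address and the compare value are ordinary scalar arguments, so there is no out-of-line memory anywhere in the interface; the kernel resolves (task, address) to (memory object, offset) internally and all the real synchronization semantics live there.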