> This is why I was insisting on passing *memory* through IPC. It's not at all clear that makes any kind of sense, unless you mean something I haven't imagined. Can you be specific about exactly what the interface (say, a well-commented MiG .defs fragment) you have in mind would look like?
If it's an RPC that passes out-of-line memory, that (IIRC) always has virtual-copy semantics, never page-sharing semantics. So it would be fundamentally the wrong model for matching up with other futex calls (from the same task or others) to synchronize on a shared int, which is what futex semantics are all about.

What I always anticipated for a Machish futex interface was vm_futex_* calls, which is to say, technically RPCs to the task port (which need not be the task port of the caller), passing an address as an integer literal just as calls like vm_write do, and passing each compare&exchange value as a datum, i.e. an integer literal, just as vm_write takes a datum of byte-array type, with semantics unchanged by whether that's inline or out-of-line memory. The task port and address serve as a proxy by which the kernel finds the memory object and offset, and the actual synchronization semantics are about that offset in that memory object and the contents of the word at that location. (Like all such calls, they would likely be optimized especially for the case of calls on task-self, probably even to the extent of having a bespoke syscall for the most-optimized case, as with vm_allocate. But that's later optimization.)

Given the specified usage patterns for the futex operations, it might be reasonable enough to implement those semantics solely by translating to a physical page, including blocking to fault one in, and then associating the wait queues with offsets into the physical page rather than with the memory object abstraction. (Both a waiter and a waker will have just faulted in the page before making the futex call anyway.) But note that the semantics require that if a waiter was blocked when the virtual page got paged out, then when you page it back in inside vm_futex_wake, that old waiter must get woken.

I don't know the kernel's VM internals much at all, but I suspect that all the tasks mapping a shared page do not get eagerly updated when the memory object's page is paged in to service a page fault in some other task, but rather service minor faults on demand (i.e. later) to rediscover the new association between the virtual page and the new physical page incidentally brought in by someone else's page fault a little earlier. Since you need to track waiters at the memory object level while their page is nonresident anyway, it probably makes sense just to hang the {offset => wait queue} table off the memory object and always use that. At least, that seems like the approach for a first version that ensures correctness in all the corners of the semantics. It can get fancier as needed in later optimizations.

When it comes to optimizing it, a fairly deep understanding of the Linux futex implementation (which I don't have offhand, though I have read it in the past) is probably instructive.

Thanks,
Roland
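
P.S. To make that concrete, here is roughly the sort of .defs fragment I have in mind. It's purely a sketch, not a worked-out proposal: the routine names, the subsystem number, and the exact parameter lists are placeholders, and things like timeouts and requeue operations are omitted. The point is just the vm_write-like shape of the interface.

    /* vm_futex.defs -- sketch only; names and numbers are placeholders. */

    subsystem vm_futex 4250;            /* arbitrary, unclaimed subsystem id */

    #include <mach/std_types.defs>
    #include <mach/mach_types.defs>

    /* Block the calling thread as long as the int at ADDRESS in
       TARGET_TASK still contains EXPECTED_VALUE (cf. Linux FUTEX_WAIT).
       ADDRESS is passed as a plain integer, just as in vm_write; the
       kernel uses TARGET_TASK and ADDRESS only to find the backing
       memory object and offset, which is what the wait is really
       keyed on. */
    routine vm_futex_wait(
                    target_task     : vm_task_t;
                    address         : vm_address_t;
                    expected_value  : int);

    /* Wake up to WAKE_COUNT waiters blocked on the memory object and
       offset that back ADDRESS in TARGET_TASK (cf. Linux FUTEX_WAKE),
       reporting how many were actually woken. */
    routine vm_futex_wake(
                    target_task     : vm_task_t;
                    address         : vm_address_t;
                    wake_count      : int;
            out     woken_count     : int);

Note that the address and the compare value are ordinary scalar arguments, so there is no out-of-line memory anywhere in the interface; the kernel resolves (task, address) to (memory object, offset) internally and all the real synchronization semantics live there.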