Hi -

I've been trying to figure what to do with the gsync code.  It causes
undefined behavior and occasional kernel panics when rpctrace is used on
gsync_wait and gsync_wake calls.  I suspect that this ultimately affects
just about every program on a Hurd system.  Even if a program doesn't call
locking primitives directly, they're used by the C library, right?

On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalo <k...@iki.fi> wrote:

> "Brent W. Baccala" <cos...@freesoft.org> writes:
>
> > Even if I'm right about the nature of this bug, I don't understand
> gnumach
> > well enough to know how a task should access another task's memory.
>
> vm_copy apparently supports such access; code from there could be
> reused.  But if rpctrace uses gsync_wait on the address space of
> another task and the page has been paged out, then the call could
> end up blocking for the pager, and I don't think you want that.
>

I studied vm_copy and came up with some code, but gsync_wake needs to
actually modify the memory location, so setting up a copy doesn't work.
Plus, vm_copy makes a mapping that is visible to user space and leaves the
user space code responsible for deallocating it.

As for blocking for the pager, I think the code does that already.  As
innocuous as this line of code looks:

     *(unsigned int *)addr = val;

it can trigger a page fault and block the RPC, right?

Here's what I think gsync_wait and gsync_wake need to do if their task
argument isn't the current task:

1. lookup the address in the other task's pagemap, flag the map entry
'shared', and insert a copy of it into the kernel pagemap,

2. access the memory location, triggering a page fault which actually
creates the physical mapping, since Mach doesn't create physical maps until
required, then

3a. either remove the pagemap entry from the kernel map when we're done, or
3b. leave it for future use.

If we choose option (3a), then every wrapped call to gsync_{wait,wake} will
trigger a page fault.  If we choose option (3b), then step 1 needs to be
modified to search the kernel pagemap to see if it already has a copy of
the map entry in question.  I have to think some more to figure how that
mapping ultimately gets removed, and of course there's a memory exhaustion
issue if we add lots of new entries to the kernel map on a 32-bit system.

Another possibility would be to reject the RPC back to user space, and put
the old locking code back into glibc as a fallback.

All this brings us back to...

On Sun, Oct 30, 2016 at 10:16 PM, Kalle Olavi Niemitalo <k...@iki.fi> wrote:

>
> Could gsync_wait be removed from gnumach.defs and replaced with
> only a trap that does not take a task_t parameter and cannot be
> intercepted by rpctrace?
>

I don't think we have anything else quite like gsync - kernel code that
directly accesses user space memory in a different task from the one that
trapped.  Our choices seem to be:

1. add a lot of new complexity to the gsync kernel routines
2. add fallback code to glibc
3. add a new system call as Kalle proposed

Comments?

    agape
    brent

Reply via email to