> PS
>
> Early results show that:
>
> new proc and new libports.so.0.2 fails
>
> Other permutations work

Thanks for doing that testing.  This is the result I would have predicted
based on the failure modes we are seeing. I'd forgotten to mention the
details of what little investigation I have done, so here they are:

The failure mode is a cascade of recursive lossage until stack overflow.
The lossage going on appears to be that somewhere between the start of main
and the first call to allocate_proc (proc/main.c line 104), proc manages to
deallocate its only send right to its own task port. That is, the
equivalent of mach_port_deallocate(mach_task_self(), mach_task_self()) gets
called (there are other system calls that can have the same effect too).

Here is some important background to understand how this can be so
catastrophic. The task control port is used to do all kernel operations,
including the mach_port_* operations, which control the port name space
itself; the task port is just a send right in that name space like any
other send right (the corresponding receive right is held internally by the
kernel).  Fundamentally, a task gets its task control port with the special
system call `mach_task_self'.  This is a system call (not an RPC) that takes
no arguments; it returns a send right for the task control port in the
task's own port name space, adding one user reference to that send right.
After the very first mach_task_self system call there is one reference; a
second call increments that to two, which is no different from what a
mach_port_mod_refs call would do.  However, `mach_task_self'
in the C library is not in fact a function that makes the system call of
that name, but a macro that simply returns the value cached in the global
variable `__mach_task_self_' at startup time. The mach_task_self system
call is as cheap as system calls get, but that is still too expensive to
want to repeat it often, and a task uses its own task port very often. So,
once at program startup time, the C library's initialization code does a
real mach_task_self system call to set the `__mach_task_self_' global
variable; thereafter, everything uses the macro that just fetches the value
stored there. The upshot of this is that no matter how many times
`mach_task_self ()' appears, there is usually just one user reference for
the task's send right to its own task control port. Thus, one stray
deallocation can get you to zero user references, where you have no send
rights and the cached value the mach_task_self macro is returning is no
longer valid; at that point nothing the task tries will work anything like
normally, so all sorts of confusing failure modes can result.
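
To make the reference counting concrete, here is a toy model in plain C.
It is only a sketch and not the Mach API: every toy_* name is hypothetical,
the "name space" is just an array of user reference counts, and the only
behaviour it assumes is what is described above (the startup code makes one
real call and caches the name, the macro merely reads the cache, and every
operation needs a valid task port name).

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only: none of these names exist in Mach or the C library.  */
#define TOY_MAX_NAMES 8
#define TOY_TASK_NAME 1		/* name the task port happens to get */

/* toy_urefs[n] > 0 means name n currently denotes a send right.  */
unsigned toy_urefs[TOY_MAX_NAMES];

/* Cached at startup, like `__mach_task_self_'; the macro only reads it.  */
unsigned toy_cached_task_self;
#define toy_task_self() (toy_cached_task_self)

int toy_name_valid(unsigned name)
{
  return name < TOY_MAX_NAMES && toy_urefs[name] > 0;
}

/* Stand-in for the real system call: each call adds one user reference.  */
unsigned toy_task_self_syscall(void)
{
  toy_urefs[TOY_TASK_NAME]++;
  return TOY_TASK_NAME;
}

/* Stand-in for a deallocation: it needs a valid task port to operate on,
   and drops one user reference from NAME.  Returns 0 or -1.  */
int toy_deallocate(unsigned task, unsigned name)
{
  if (!toy_name_valid(task) || !toy_name_valid(name))
    return -1;
  toy_urefs[name]--;
  return 0;
}

/* What the startup code is assumed to do: one real call, then cache.  */
void toy_libc_init(void)
{
  unsigned i;
  for (i = 0; i < TOY_MAX_NAMES; i++)
    toy_urefs[i] = 0;
  toy_cached_task_self = toy_task_self_syscall();
  assert(toy_name_valid(toy_cached_task_self));
}

/* Run the scenario; returns the result of the first operation attempted
   after the stray deallocation.  */
int toy_demo(void)
{
  toy_libc_init();
  (void) toy_task_self();	/* using the macro adds no references */
  (void) toy_task_self();
  printf("urefs after init and two macro uses: %u\n",
	 toy_urefs[TOY_TASK_NAME]);

  /* The stray deallocation: the task drops its only reference.  */
  toy_deallocate(toy_task_self(), toy_task_self());
  printf("urefs after stray deallocation: %u\n", toy_urefs[TOY_TASK_NAME]);

  /* The cached name is now stale, so every later operation fails.  */
  return toy_deallocate(toy_task_self(), toy_task_self());
}
```

Calling toy_demo() from a main function shows the count staying at one
however often the macro is used, then dropping to zero after a single stray
deallocation, after which the cached name is useless.  On a real system the
analogous check would be to ask mach_port_get_refs for the user reference
count of the task port name at a few points before allocate_proc.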

However, none of this explains how using the new compiler is causing us all
this grief.  There is an air of suspiciousness about the use of
mach_task_self() (i.e. that one and only user reference) in the allocate_proc
call; but, (a) that p_task will never in fact be released, because doing so
is only done after terminating the task (and then we ain't doin' nuthin'),
and (b) it hasn't gotten far enough into allocate_proc to even use that
argument yet when it bombs out.  But I may be missing something, and I
haven't looked into this for very long.