Long ago I worked in the kernel of a BSD which supported async (disk) I/O.
The internals did use a kernel thread for queueing and reporting.
There was one per process using AIO because it made life simpler:
each page which was mentioned in AIO had to be pinned,
and "blaming" that on the AIO thread was easy.
Similarly, there was "someone to talk to" when the I/O completed.
Useful? It made architecting correct replacements for thread-using
catastrophes -much- easier. Implement a work queue in the user program
and -never- block anywhere except in one timer poll() in the middle.
As I remember, it didn't take much code to implement, and
kernel threads were cheap in that architecture.
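
For anyone who wants to see the shape of the user-program side, here is
a minimal sketch of "work queue plus exactly one blocking call". It is
not the code from that BSD. It is Python for brevity, and it uses the
standard selectors module in place of a raw poll(). The names Loop,
post, call_later, watch and unwatch are made up for the illustration.

```python
import heapq
import os
import selectors
import time
from collections import deque


class Loop:
    """Hypothetical single-threaded loop: a work queue around one blocking call."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.work = deque()   # callbacks that are ready to run right now
        self.timers = []      # heap of (deadline, seq, fn, args)
        self.seq = 0          # tie-breaker so the heap never compares functions

    def post(self, fn, *args):
        """Queue fn(*args) to run on the next pass; never blocks."""
        self.work.append((fn, args))

    def call_later(self, delay, fn, *args):
        """Queue fn(*args) to run after roughly `delay` seconds."""
        self.seq += 1
        heapq.heappush(self.timers, (time.monotonic() + delay, self.seq, fn, args))

    def watch(self, fd, fn):
        """Call fn(fd) whenever fd becomes readable."""
        self.sel.register(fd, selectors.EVENT_READ, fn)

    def unwatch(self, fd):
        self.sel.unregister(fd)

    def run(self):
        while True:
            # Drain the work queue. Callbacks must not block.
            while self.work:
                fn, args = self.work.popleft()
                fn(*args)
            if not self.timers and not self.sel.get_map():
                return  # nothing left that could ever produce work
            timeout = None
            if self.timers:
                timeout = max(0.0, self.timers[0][0] - time.monotonic())
            # The one place in the whole program that is allowed to block.
            for key, _events in self.sel.select(timeout):
                self.post(key.data, key.fileobj)
            # Turn expired timers into ordinary work items.
            now = time.monotonic()
            while self.timers and self.timers[0][0] <= now:
                _deadline, _seq, fn, args = heapq.heappop(self.timers)
                self.post(fn, *args)


def demo():
    """A timer writes to a pipe; a read callback picks the data up."""
    loop = Loop()
    log = []
    r, w = os.pipe()

    def on_readable(fd):
        log.append(("read", os.read(fd, 16)))
        loop.unwatch(fd)

    def on_timer():
        log.append(("timer", None))
        os.write(w, b"done")

    loop.watch(r, on_readable)
    loop.call_later(0.01, on_timer)
    loop.post(log.append, ("start", None))
    loop.run()
    os.close(r)
    os.close(w)
    return log
```

Calling demo() runs the "start" item, then the timer, then the read
callback, all on one thread. Everything that would have been a blocked
thread becomes an entry in the queue, the timer heap, or the watched
descriptors instead.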
geoff steckel
very-ex Alliant Computer