On Wed, Oct 31, 2012 at 10:41 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 30/10/2012 20:13, Stefan Hajnoczi wrote:
>> On Fri, Oct 26, 2012 at 04:05:49PM +0200, Paolo Bonzini wrote:
>>> +static void event_notifier_ready(EventNotifier *notifier)
>>> +{
>>> +    ThreadPoolElement *elem, *next;
>>> +
>>> +    event_notifier_test_and_clear(notifier);
>>> +restart:
>>> +    QLIST_FOREACH_SAFE(elem, &head, all, next) {
>>> +        if (elem->state != THREAD_CANCELED && elem->state != THREAD_DONE) {
>>> +            continue;
>>> +        }
>>> +        if (elem->state == THREAD_DONE) {
>>> +            trace_thread_pool_complete(elem, elem->common.opaque, elem->ret);
>>> +        }
>>> +        if (elem->state == THREAD_DONE && elem->common.cb) {
>>> +            QLIST_REMOVE(elem, all);
>>> +            elem->common.cb(elem->common.opaque, elem->ret);
>>
>> This function didn't take the lock.  First it accessed elem->state and
>> now it reads elem->ret.  We need to take the lock to ensure both
>> elem->state and elem->ret have been set - otherwise we could read
>> elem->ret before the return value was stored.
>
> Right.  posix-aio-compat didn't need this because it only had ret.
> Just as important: the locking policy was not documented at all.
>
> I'm applying some changes.  Logically (and for ease of review) they are
> four patches on top of this one, but they'll be squashed in the next
> submission.  (Hmm, the fourth should be separate).
>
>>> +typedef struct ThreadPoolCo {
>>> +    Coroutine *co;
>>> +    int ret;
>>> +} ThreadPoolCo;
>>> +
>>> +static void thread_pool_co_cb(void *opaque, int ret)
>>> +{
>>> +    ThreadPoolCo *co = opaque;
>>> +
>>> +    co->ret = ret;
>>> +    qemu_coroutine_enter(co->co, NULL);
>>> +}
>>> +
>>> +int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg)
>>> +{
>>> +    ThreadPoolCo tpc = { .co = qemu_coroutine_self(), .ret = -EINPROGRESS };
>>> +    assert(qemu_in_coroutine());
>>> +    thread_pool_submit_aio(func, arg, thread_pool_co_cb, &tpc);
>>> +    qemu_coroutine_yield();
>>> +    return tpc.ret;
>>
>> It's important to understand that the submit_aio, yield, return ret
>> pattern works because we assume this function was called as part of the
>> main loop.
>>
>> If thread_pool_submit_co() was called outside the event loop and global
>> mutex, then there is a race between the submit_aio and yield steps where
>> thread_pool_co_cb() is called before this coroutine yields!
>
> Even before that, thread_pool_submit_aio would race on the
> non-thread-safe qemu_aio_get.  Also, head is protected by the BQL.
>
> event_notifier_ready needs to run under the BQL too, because it
> accesses head and also calls qemu_aio_release.
>
> qemu_aio_get and qemu_aio_release should be moved to AioContext, so
> that they can use the (upcoming) AioContext lock instead of the BQL.
> The thread pool needs to be per-AioContext instead of using globals,
> too.  However, this can be done later.
I just sent a patch to move from a manual freelist to g_slice_alloc().
This does away with manual pooling and is thread-safe.  It's possible
that a lock + freelist is more efficient, but I think g_slice_alloc()
is a good start.

Your patches look good, looking forward to the next spin.

Stefan