On Fri, Oct 26, 2012 at 04:05:49PM +0200, Paolo Bonzini wrote: > +static void event_notifier_ready(EventNotifier *notifier) > +{ > + ThreadPoolElement *elem, *next; > + > + event_notifier_test_and_clear(notifier); > +restart: > + QLIST_FOREACH_SAFE(elem, &head, all, next) { > + if (elem->state != THREAD_CANCELED && elem->state != THREAD_DONE) { > + continue; > + } > + if (elem->state == THREAD_DONE) { > + trace_thread_pool_complete(elem, elem->common.opaque, elem->ret); > + } > + if (elem->state == THREAD_DONE && elem->common.cb) { > + QLIST_REMOVE(elem, all); > + elem->common.cb(elem->common.opaque, elem->ret);
This function didn't take the lock. First it accessed elem->state and how it reads elem->ret. We need to take the lock to ensure both elem->state and elem->ret have been set - otherwise we could read elem->ret before the return value was stored. > +typedef struct ThreadPoolCo { > + Coroutine *co; > + int ret; > +} ThreadPoolCo; > + > +static void thread_pool_co_cb(void *opaque, int ret) > +{ > + ThreadPoolCo *co = opaque; > + > + co->ret = ret; > + qemu_coroutine_enter(co->co, NULL); > +} > + > +int coroutine_fn thread_pool_submit_co(ThreadPoolFunc *func, void *arg) > +{ > + ThreadPoolCo tpc = { .co = qemu_coroutine_self(), .ret = -EINPROGRESS }; > + assert(qemu_in_coroutine()); > + thread_pool_submit_aio(func, arg, thread_pool_co_cb, &tpc); > + qemu_coroutine_yield(); > + return tpc.ret; It's important to understand that the submit_aio, yield, return ret pattern works because we assume this function was called as part of the main loop. If thread_pool_submit_co() was called outside the event loop and global mutex, then there is a race between the submit_aio and yield steps where thread_pool_co_cb() is called before this coroutine yields! I see no reason why this is a problem today but I had to think through this case when reading the code. Stefan