On Tue, 9 Sep 2014 13:10:48 +0200 Marek Chalupa <[email protected]> wrote:
> On 4 September 2014 15:17, Pekka Paalanen <[email protected]> wrote: > > > On Fri, 29 Aug 2014 11:21:30 +0200 > > Marek Chalupa <[email protected]> wrote: > > > > > When wl_display_read_events() returns with errno == EAGAIN, we > > > naturally try to call it again. But this next call results in > > > deadlock. > > > > > > Signed-off-by: Marek Chalupa <[email protected]> > > > --- > > > tests/display-test.c | 88 > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > 1 file changed, 88 insertions(+) > > > > > > diff --git a/tests/display-test.c b/tests/display-test.c > > > index 1289866..e1cf830 100644 > > > --- a/tests/display-test.c > > > +++ b/tests/display-test.c > > > @@ -32,6 +32,7 @@ > > > #include <sys/types.h> > > > #include <sys/stat.h> > > > #include <pthread.h> > > > +#include <poll.h> > > > > > > #include "wayland-private.h" > > > #include "wayland-server.h" > > > @@ -480,6 +481,93 @@ TEST(threading_cancel_read_tst) > > > display_destroy(d); > > > } > > > > > > +/* set this to 1 when it is right that > > > + * the thread is woken up */ > > > +static int threads_can_be_woken_up = 0; > > > + > > > +static void * > > > +thread_eagain_func(void *data) { > > > + struct client *c = data; > > > + > > > + register_reading(c->wl_display); > > > + > > > + c->display_stopped = 1; > > > + assert(wl_display_read_events(c->wl_display) == 0 > > > + && errno != EAGAIN); > > > + /* if thread was sleeping in read_events, this assert > > > + * must pass now */ > > > + assert(threads_can_be_woken_up && "Thread woke up too > > > early"); + > > > + pthread_exit(NULL); > > > +} > > > + > > > +static void > > > +threading_read_eagain(void) > > > +{ > > > + struct client *c = client_connect(); > > > + pthread_t th1, th2, th3; > > > + struct pollfd fds; > > > + int i; > > > + > > > + register_reading(c->wl_display); > > > + > > > + th1 = create_thread(c, thread_eagain_func); > > > + th2 = create_thread(c, thread_eagain_func); > > > + th3 = create_thread(c, thread_eagain_func); > > > + > > > + /* All the threads are sleeping, waiting until read or > > > cancel > > > + * is called. Since we have no data on socket waiting, > > > + * the wl_connection_read should end up with error and set > > > errno > > > + * to EAGAIN => wl_display_read_events() will return 0 and > > > set > > errno > > > + * to EAGAIN. It should not wake up the sleeping threads, > > > so that > > > + * this thread can try to read again */ > > > + alarm(3); > > > + assert(wl_display_read_events(c->wl_display) == 0); > > > + assert(errno == EAGAIN); > > > > I think there is a problem here. wl_display_read_events() returning > > 0 means success, and in that case errno may not be set if it was a > > real success. There is no EAGAIN to be handled, it's already done > > inside read_events(). > > > > In this test it won't be a real success, but you cannot code > > clients to check for errno after getting 0 from > > wl_display_read_events(). > > > > > > + > > > + /* the same should happen next times - but there was a bug > > > when > > > + * the main thread got blocked! */ > > > + for (i = 0; i < 5; ++i) { > > > + assert(wl_display_read_events(c->wl_display) == 0); > > > > Therefore you are here calling wl_display_read_events() again after > > it succeeded! And that IMO is a developer error. > > > > yes > > > > Besides, I cannot see anywhere where it says that it is ok to call > > wl_display_read_events() again even if it failed, without calling > > wl_display_prepare_read() first. > > > > > Since original piece of code returned 0 on EAGAIN, then I suppose the > author considered this as a success, so yes, wl_display_read_events() > is not probably meant > to be called again. But consider this: > if the thread that got EAGAIN (the only thread that is not sleeping) > calls wl_display_prepare_read() > + wl_display_read_events() again, it works. Problem is that it does > not have to call it > right after the failed wl_display_read_events and the other threads > are still sleeping. EAGAIN is not a failure here. It is only a spurious wakeup, and all threads should go back sleeping on whatever they need to sleep on, usually poll(). > Moreover, if there are not events on the socket for some time, this > one thread is looping in cycle > while the other threads are sleeping. It's not a busy-loop, because the thread would be sleeping in poll() or equivalent. But yes, I see that the other threads sleeping inside their read_events() are not woken up, so they do not do a cycle. I'm not sure if that is a problem or not. I suppose waking everyone up would not hurt, but not waking up would risk deadlocking if the thread with EAGAIN never read again. > What I meant to do was to give the programmer the choice whether to > loop or let > all threads try it again (cancel read). But as you said before, 0 is > wrong return value for that. > See end of the e-mail for conclusions. Yeah, it seems to be wrong to require cancel_read() after calling read_events(), no matter whether it succeeded or not. > Summing it up, in my point of view we have these options: > > * Let it as it is: > this can lead to starving of other threads for some time. > Moreover, if the thread that > got EAGAIN decides to exit (or simply not to read again) instead > of another trial to read, the other threads will block. Yup, I agree with the analysis here. > * return -1 and set EAGAIN > programmer can either try to read again, or cancel read and let > all threads do another cycle. Good thing is > that programmer knows that there was an error and can cancel read > in the case that the thread is exiting > instead of another cycle. EAGAIN is not an error, which would make this solution an ABI break. > * wake up threads on EAGAIN > this solution will force all threads to do another cycle (or > whatever they want) and should not have any side effects. It can lead > to big overhead if there are no events on the socket for some time > (waking up threads again and again in cycle) I think we should do this. I don't really see the overhead at all, since every thread should have poll() in its loop, so they will not be busy-looping. After all, the sleep inside read_events() is only for synchronizing reading threads, not for waiting new data to appear. EAGAIN in read_events() is a success, even though there was nothing received. On success (and on failure!) all threads sleeping inside read_events() should be woken up. IOW, whatever thread hits the reader_count==0 case in read_events() should *always* end up calling display_wakeup_threads() before returning. Does this make sense? In fact, I think wl_display_read_event() failing on last_error should be equivalent to calling wl_display_cancel_read(). It would be the same if the app manually first checked for error and then canceled. Thanks, pq _______________________________________________ wayland-devel mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/wayland-devel
