On Thu, Apr 19, 2018 at 4:08 PM, Jason Madden <[email protected]> wrote: > Hi all, > > With the support for fork merged and released last year, I've been > able to start releasing gevent (Python coroutine [greenlet] networking > library) with libuv support. Thanks! > > Earlier this year we discovered an issue [1] wherin a stopped check > watcher kept getting called over and over in an infinite loop. This > happened if there was more than one started check watcher when we > entered uv__run_check, we stopped a check watcher in a check watcher > callback, and a check watcher callback also incurred a greenlet > switch. > > After some debugging it became clear what was going on. The gory > details are in the bug report [2], but in summary this is an > interaction between the facts that the QUEUE that is used to iterate > through the check watchers[3] is stack allocated (local variables in > the function uv__run_check function), and that switching greenlets > saves and restores (portions of) the C stack[4] and that iterating > through the queue temporarily stores the address of this > stack-allocated queue into heap-allocated objects where they can be > manipulated (e.g., by stopping a watcher). Combining switching > greenlets with stopping watchers thus can result in the QUEUE getting > restored to an invalid state (infinite loop in this case). > > Windows doesn't have this problem because there loop-watcher.c doesn't > use a stack allocated QUEUE. > > We solved this problem by heap allocating the QUEUE[5] on each entry to > uv__run_check so that it no longer got saved and restored by greenlet > switches. This obviously adds some overhead, though I didn't try to > quantify it. > > Potentially a better fix would be to have the QUEUE as part of the > loop structure; I didn't do that here to keep the patch as small as > possible (and I didn't know if that would hurt the chances of it > getting upstreamed due to ABI concerns). Or maybe there are other > approaches that are better? Feedback is welcome! > > Is a fix for this something that upstream would consider? I've seen > other references to coroutine users and I don't know if they've seen > anything like this, or maybe their coroutine libraries don't involve > stack swapping. A potential negative to an upstream fix is that this > is hard to test in pure C code. > > Thanks, > Jason > > Footnotes: > [1] https://github.com/gevent/gevent/issues/1126 > > [2] https://github.com/gevent/gevent/issues/1126#issuecomment-369740928 > > [3] As well as prepare and idle watchers. > > [4] https://github.com/python-greenlet/greenlet/blob/master/greenlet.c#L11 > > [5] > https://github.com/gevent/gevent/commit/1ef16f38491000f4fdafdc01dc809d30fc785867
I suppose we could change that in libuv v2 (not v1 because of ABI) but I suspect it would break again quickly because there is no good way to regression-test it. For instance, there's currently a pull request under review that introduces a stack-allocated QUEUE in uv_write()-related code. -- You received this message because you are subscribed to the Google Groups "libuv" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/libuv. For more options, visit https://groups.google.com/d/optout.
