Instead of task switching deep inside some callback, could you run the
libuv event processing as a separate task, via `uv_run(uv_default_loop(),
UV_RUN_ONCE)`, and then task switch from outside the library? That would let
libuv finish updating all of its internal structures and remove itself from
the stack before anything else happens.


On Sun, Apr 22, 2018 at 8:28 AM Ben Noordhuis <[email protected]> wrote:

> On Fri, Apr 20, 2018 at 7:51 PM, Jason Madden
> <[email protected]> wrote:
> > I apologize up front for the length of this. There's a tl;dr at the end.
> >
> >> On Apr 20, 2018, at 04:07, Ben Noordhuis <[email protected]> wrote:
> >>
> >> On Thu, Apr 19, 2018 at 4:08 PM, Jason Madden
> >> <[email protected]> wrote:
> >>>
> >>>
> >>> After some debugging it became clear what was going on. The gory
> >>> details are in the bug report [2], but in summary this is an
> >>> interaction between the facts that the QUEUE that is used to iterate
> >>> through the check watchers[3] is stack allocated...
> >
> >>>  Combining switching
> >>> greenlets with stopping watchers thus can result in the QUEUE getting
> >>> restored to an invalid state (infinite loop in this case).
> >>>
> >>> We solved this problem by heap allocating the QUEUE[5] on each entry to
> >>> uv__run_check so that it no longer got saved and restored by greenlet
> >>> switches. This obviously adds some overhead, though I didn't try to
> >>> quantify it.
> >>>
> >>> Potentially a better fix would be to have the QUEUE as part of the
> >>> loop structure; I didn't do that here to keep the patch as small as
> >>> possible (and I didn't know if that would hurt the chances of it
> >>> getting upstreamed due to ABI concerns). Or maybe there are other
> >>> approaches that are better? Feedback is welcome!
> >>
> >> I suppose we could change that in libuv v2 (not v1 because of ABI)
> >
> > Understood.
> >
> > Since we can't change the loop structure, and assuming malloc/free are
> "too expensive"[1], what came to mind, something I've implemented in the
> past for a similar case, was a static freelist of objects (QUEUE in this
> case). It could either be thread local or with appropriate intrinsics be
> implemented atomically. Objects would have to be checked out and returned
> using an API, much like malloc/free. I know that's not as easy as stack
> allocation from a correctness standpoint (but see below, that may be
> necessary for avoiding regressions). It looks like the intrinsics may
> already be there in atomic-ops.h (but I'll be honest, it's been years since
> I implemented anything like that).
> >
> >> I suspect it would break again quickly because there is no good way to
> >> regression-test it.
> >
> > Agreed. I was giving it a little bit of thought, and (aside from simply
> having the style guide say "don't do that",) the only way I had come up
> with to really avoid *needing to regression test it* was to simply make it
> impossible: Make the queue type private and wrap an API around getting and
> returning a pointer to one (again, just like malloc/free, except the API
> could be macros). I'm sure nobody would be a big fan of that, but it would
> have the advantage of being flexible (the loop could keep a QUEUE of
> QUEUEs, popping and pushing as needed, growing a new entry should one be
> needed recursively). That flexibility may not be necessary if there are
> only a few defined places that are "reentrant" like in the original bug, in
> which case still hiding the type and a few defined macros could be a decent
> approach.
> >
> >> For instance, there's currently a pull request
> >> under review that introduces a stack-allocated QUEUE in
> >> uv_write()-related code.
> >
> > If you are referring to https://github.com/libuv/libuv/pull/1787, I
> don't think that would cause any problems. The new stack allocated variable
> is not reachable outside that scope as far as I can see...but that does
> highlight the point that it's either "never ever have a stack allocated
> queue" or "carefully eyeball each instance, expect to eventually
> misidentify a case and just wait for a downstream to report the bug".
> >
> > Regarding v1 vs v2 and timelines, here's the way things look from
> gevent's perspective:
> >
> > As a gevent maintainer, I don't really mind too much carrying around a
> small patch to libuv (we did that for quite a while with libev). However, it
> limits the audience for gevent+libuv. When I'm carrying around a patch for
> libuv, I have to have a custom, embedded, build for it, I can't use the
> system provided library[2]. As a gevent maintainer, I also don't hate that
> because I know what we're getting when I get bug reports, and I also know
> that I can set flags to get link-time-optimizations and the best
> performance out of the Python bindings. However, there are lots of people
> who install the packaged gevent provided by their (Linux) distribution, and
> there are lots of distributions that refuse to allow embedded libraries,
> for perfectly valid reasons, so if gevent can't use the system libuv, it
> can't use libuv on that distro at all. If gevent has significant users that
> can't use libuv, I can't ultimately deprecate and drop libev without
> leaving those users behind, should that moment ever get here.
> >
> > That's the same reason I can't use libuv v2 until it's released and
> starts getting picked up by distros.
> >
> > TL;DR:
> >
> > This is all a bit hypothetical right now. gevent 1.3 is only in beta and
> hasn't been packaged by distros yet AFAIK. Being able to drop my patches
> for gevent 1.3/libuv 1.21 would be cool, though given the complexity of an
> API- and CAS-based approach, that seems unlikely. But if I have to wait for
> next year and gevent 1.4/libuv v2 to build against a system libuv and drop
> my patches, I'm cool with that too.
> >
> > I'm happy to prepare a simple (ABI-breaking) PR against master/v2 if
> that sounds attractive (although I see there are already outstanding
> QUEUE-changing PRs for master). In light of the measured expense of the
> malloc/free calls, I've already adapted my own patch to do that.
> >
> > If it doesn't sound attractive right now because of testing and
> maintenance concerns, I understand that too. Like I said, I don't really
> mind carrying around a small patch to libuv, especially not in the early
> days of gevent+libuv. I can circle back sometime in the future if it seems
> warranted either because the patch complexity grows or because of requests
> from distros/users.
> >
> >
> > Thanks again for your thoughts on this,
> > Jason
> >
> > [1]
> >
> > I decided to quantify the effect of the malloc call on some systems I
> had lying around. I took the uv__run_ code, set up a bunch of handles with
> a no-op callback, compiled with all optimizations enabled, and ran timings
> for various numbers of active handles. Approximate costs of the malloc/free
> of the QUEUE on various systems:
> >
> > Linux x86_64:              30ns
> > FreeBSD 10 x86_64:         35ns
> > Solaris 11 x86_64:         60ns
> > macOS 10.13.4 w/jemalloc:  60ns
> > macOS 10.13.4:             80ns
> > Windows 10 x86_64:         80ns
> > Linux arm Rasp Pi 2:      500ns
> >
> > The approximate number of active handles needed to amortize the cost
> away (due to the extra function callbacks) ranged from 10 on the low end to
> 100 in the middle to 1000+ on the high end.
> >
> > [2]
> >
> > Actually, right now I can't build with a system libuv either. I was
> never able to get the Python extension to link correctly on Windows, so I'm
> doing what https://github.com/saghul/pyuv does and using setuptools to
> build libuv on all platforms. The plan is to circle back to that when
> distros ask.
>
> tl;dr appreciated. :-)
>
> I can understand not wanting to carry patches on top of libuv.  We
> could add a per-event loop flag that instructs libuv to heap-allocate
> queues; that could go into libuv v1.
>
> Usage sketch:
>
>   uv_loop_t* loop = /* ... */;
>   uv_loop_configure(loop, UV_LOOP_MALLOC_QUEUES, 1);
>
> And in libuv itself in places where it's needed:
>
>   QUEUE* q;
>   QUEUE q_storage;
>
>   q = &q_storage;
>   if (loop->flags & UV_LOOP_MALLOC_QUEUES)
>     if (NULL == (q = uv__malloc(sizeof(*q))))
>       return UV_ENOMEM;
>
>   /* ... */
>
>   if (q != &q_storage)
>     uv__free(q);
>   q = NULL;
>
> It doesn't really address the testing issue and there may be places
> where malloc failures cannot be propagated upwards, but it's a start.
>
> --
> You received this message because you are subscribed to the Google Groups
> "libuv" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/libuv.
> For more options, visit https://groups.google.com/d/optout.
>
