On Wed, 25 Jan 2017 13:29:08 +0200 Pekka Paalanen <[email protected]> wrote:
> On Wed, 25 Jan 2017 11:08:26 +0800 > Jonas Ådahl <[email protected]> wrote: > > > On Mon, Jan 23, 2017 at 03:40:11PM +0200, Pekka Paalanen wrote: > > > On Tue, 20 Dec 2016 18:40:13 +0530 > > > viveka <[email protected]> wrote: > > > > > > > From: "Vivek.A" <[email protected]> > > > > > > > > Without this, client socket file descriptors are being kept open > > > > until the process > > > > hosting the compositor gets terminated / exit. This causes > > > > compositor termination > > > > goes undetected through its fd. This could be reproduced by having > > > > compositor as > > > > one of the threads in a process. > > > > > > > > https://bugs.freedesktop.org/show_bug.cgi?id=99142 > > > > > > > > Signed-off-by: Vivek.A <[email protected]> > > > > --- > > > > src/wayland-server.c | 5 +++++ > > > > 1 file changed, 5 insertions(+) > > > > > > > > diff --git a/src/wayland-server.c b/src/wayland-server.c > > > > index 9d7d9c1..8195064 100644 > > > > --- a/src/wayland-server.c > > > > +++ b/src/wayland-server.c > > > > @@ -979,11 +979,16 @@ wl_socket_alloc(void) > > > > WL_EXPORT void > > > > wl_display_destroy(struct wl_display *display) > > > > { > > > > + struct wl_client *c, *cnext; > > > > struct wl_socket *s, *next; > > > > struct wl_global *global, *gnext; > > > > > > > > wl_signal_emit(&display->destroy_signal, display); > > > > > > > > + wl_list_for_each_safe(c, cnext, &display->client_list, link) { > > > > + wl_client_destroy(c); > > > > + } > > > > + > > > > wl_list_for_each_safe(s, next, &display->socket_list, link) { > > > > wl_socket_destroy(s); > > > > } > > > > > > Hi, > > > > > > this needs an update on the documentation to say that existing clients > > > are destroyed. > > > > > > Previously it has been illegal to have existing clients when destroying > > > a wl_display, because destroying a wl_client afterwards would have > > > potentially accessed freed memory: wl_display::client_list list head. > > > However, we failed to document this. > > > > > > As far as I can tell, Weston also got it wrong, and never destroyed the > > > clients before the wl_display. > > > > > > Now starts the self-rant, which could be just TL;DR and conclude as: a > > > good patch, fine by me if you add docs, but Weston is broken anyway. > > > > > > --- > > > > > > (A counter-proposition to this patch could be to just unlink the client > > > list head, and leave the clients to be destroyed by the compositor. I > > > have hard time imagining when that would be useful (Graceful shutdown > > > waiting for clients to exit themselves? This is not a web server.) and > > > it would be awkward to use, too. Also, destroying the wl_display > > > removes the only supported way of polling and servicing the clients. > > > Therefore I reject the counter-proposition.) > > > > > > So I started wondering what happens in Weston, as I should really see > > > the access to freed memory in valgrind, but I don't. Then I realized > > > that nothing frees the remaining clients in Weston on shutdown. Ok, I > > > should see a reduction in leaked memory then. But the results are > > > depressing. > > > > > > Start weston on x11: > > > > > > $ valgrind -v --leak-check=full --show-reachable=yes --track-origins=yes > > > --num-callers=30 weston --use-pixman > > > > > > Then click (twice... why twice...) the icon to start weston_terminal, > > > and then close weston while the terminal window is open. This is needed > > > to have a client remain, because the shell helper clients get torn down > > > anyway. > > > > > > > > > Before patch: > > > > > > ==23476== LEAK SUMMARY: > > > ==23476== definitely lost: 232 bytes in 3 blocks > > > ==23476== indirectly lost: 22,890 bytes in 47 blocks > > > ==23476== possibly lost: 163 bytes in 4 blocks > > > ==23476== still reachable: 72,450 bytes in 181 blocks > > > ==23476== suppressed: 0 bytes in 0 blocks > > > ==23476== > > > ==23476== ERROR SUMMARY: 7 errors from 7 contexts (suppressed: 0 from 0) > > > > > > After patch: > > > > > > ==26025== LEAK SUMMARY: > > > ==26025== definitely lost: 968 bytes in 4 blocks > > > ==26025== indirectly lost: 1,584 bytes in 6 blocks > > > ==26025== possibly lost: 163 bytes in 4 blocks > > > ==26025== still reachable: 72,450 bytes in 181 blocks > > > ==26025== suppressed: 0 bytes in 0 blocks > > > ==26025== > > > ==26025== ERROR SUMMARY: 23 errors from 21 contexts (suppressed: 0 from 0) > > > > > > The indirectly lost amount goes down as expected, but the number of > > > errors grows badly. > > > > > > I suppose the thing is that weston has already pretty much shut down by > > > the time it calls wl_display_destroy(). It's not prepared to resources > > > still existing after shutdown, so when wl_display_destroy() goes > > > destroying the remaining clients, their resources get destroyed, which > > > then accesses freed memory because the tracking infrastructure has > > > already been freed and the resources are full of stale pointers. > > > > > > But that is Weston's problem. > > > > > > One might think to move wl_display_destroy() earlier, but it feels > > > fundamentally wrong, and would likely screw up so many places that > > > assume the display exists. So I won't go there. > > > > > > Weston will need fixing in any case, but the question is how. > > > > > > Do we apply this patch which automatically destroys remaining clients? > > > The alternative is to declare the opposite. > > > > > > If we do, then Weston has to either destroy the clients manually before > > > shutting down or make all the framework proofed against clients getting > > > destroyed after they were shut down(*). > > > > > > If we don't, Weston still leaks, and all compositors should manually > > > destroy all remaining clients. > > > > > > As (*) would be pretty hard and probably not worth the effort, plus > > > kind of counter-intuitive, we need to fix weston to manually destroy > > > remaining clients anyway. And that is going to be more interesting than > > > it sounds, because there are shell helper clients that get > > > automatically respanwed if they disconnect. Maybe. > > > > > > The conclusion: I am in favour of this patch. > > > > > > It is an ABI change though, and while it was crashy to write programs > > > that would break due to this change, the probability of the problem > > > going unnoticed is high. Proof: Weston. > > > > > > Second opinions would be appreciated. > > > > I'm fairly neutral. FWIW, mutter doesn't handle tear down gracefully yet > > anyway, so from that perspective, it doesn't matter. But from the > > valgrind summary, it looks like it *could* introduce new segfaults not > > seen before on non-weston, where it was just a leak before. Which is > > worse, a leak or a segfault on tear-down? That being said, cleaning up > > does make sense from an API perspective. > > Since the alpha release just happened, I'm leaning on postponing this > patch after the final release, to the next development cycle. There is > evidently a considerable risk of regressions in users, IMHO. Sorry for > the delay. Hi, the development cycle has started, and I'm reconsidering this. It seems the fundamental question is how should the wl_display API be used. First, it is indeed ok to destroy wl_clients at almost any time while the compositor is running, because socket events may cause a wl_client to be destroyed already. wl_display has a destroy signal - should we destroy the remaining wl_clients before or after emitting the signal? If the signal is used to tear down helper clients, then destroying wl_clients before emitting the signal may cause the helper clients to be immediately respawned. But, if like in this patch the destroy_signal is emitted first, then there is no way to tear any supporting framework down after clients are gone but before the wl_display is destroyed. That means that the framework needs to hook up to the wl_display destroy_signal and destroy all clients itself first. Therefore I think the wl_clients should be destroyed *before* emitting the wl_display destroy_signal (and clearly documented). This gives the compositor an opportunity to do any clean-up necessary before it calls wl_display_destroy(), then wl_display will destroy all the remaining wl_clients, and then wl_display destroy_signal gets called which will clean up anything that was left. Finally wl_display_destroy() frees the wl_display itself. That is the sequence that leaves the most freedom for the compositor to choose what it does in which step. Unfortunately, at least Weston will need fixing to work right with it. Btw. the order I propose is the opposite from how wl_client_destroy() destroyes the wl_resources. wl_client_destroy() emits the destroy signal first, and then destroys all the wl_resources. I think the difference is justified because wl_client_destroy() is usually called from within libwayland-server while wl_display_destroy() is always called by the compositor. The wl_client destroy loop in wl_display_destroy() cannot simply use wl_list_for_each_safe(), because callbacks from wl_client destroy signals might create new clients. I'd recommend moving the whole client list onto a temporary head first, initializing the original head, iterating through the temporary head, and finally logging a loud error message if the list at the original head is not empty. This avoids a potential crash or busy-loop-freeze if new clients are added when they shouldn't, while still pointing out the mistake in the compositor. It still feels quite risky, and the regressions at downstream might be nasty, even though it is only about shutdown. Automatically destroying wl_clients at wl_display_destroy() is an ABI change, no matter how I look at it. I think it breaks older Westons and we cannot have that. How about instead of destroying wl_clients in wl_display_destroy(), we add a wl_display_destroy_all_clients()? Actually, that could already be implemented with wl_display_get_client_list() from outside of libwayland-server, but it cannot do the temporary head trick to catch adding new clients at a bad time. I am still somewhat confused about what would be the best course of action, so other opinions would be valuable. I think wl_display_destroy_all_clients() would be the safest way forward, with a warning message from wl_display_destroy() if any wl_clients remain. Thanks, pq
pgp3HL_Fk4_dc.pgp
Description: OpenPGP digital signature
_______________________________________________ wayland-devel mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/wayland-devel
