On Thu, 21 Aug 2025 at 19:18, Peter Xu <[email protected]> wrote: > I remember I provided a draft somewhere during the discussion, anyway.. I > redid it, and attached a formal patch below that will contain the removal > of the mr->container check (plus auto-detach when it happens). The hope is > this should work.. and comparing to the v8 of Akihiko's, it won't make MR's > own refcount any more complicated. > > If necessary, I can send a formal patch.
This patch seems to work, in that it fixes the memory-region related leaks I previously was seeing. Review comments below (which are only about the commit message and docs). I also can't remember the details of the previous discussions about these patches, so I'm reviewing the one below essentially from scratch. Apologies in advance if we end up going back around a conversation loop that we've already had... > Thanks, > > ===8<=== > From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001 > From: Peter Xu <[email protected]> > Date: Thu, 21 Aug 2025 12:59:02 -0400 > Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references > > Currently, QEMU refcounts the MR by always taking it from the owner. > > It should be non-issue if all the owners of MRs will properly detach all > the MRs from their parents by memory_region_del_subregion() when the owner > will be freed. However, it turns out many of the device emulations do not > do that properly. It might be a challenge to fix all of such occurances. I think that it's not really right to cast this as "some devices don't do this right". The pattern of "a device has a container MR C and some other MRs A, B... which it puts into C" is a legitimate one. If you do this then (with the current code in QEMU) there is *no* place where a device emulation can do a manual "remove A, B.. from the container C so the refcounts go down": the place where devices undo what they did in instance_init is instance_deinit, but we will never call instance_deinit because the refcount of the owning device never goes to 0. Further, even if we had some place where devices could somehow undo the "put A, B... in the container so the refcounts go down correctly", it is better API design to have the memory.c code automatically handle this situation. "This just works" is more reliable than "this works if you do cleanup step X", because it's impossible to have the bug where you forget to write the code to do the cleanup step. > Without fixing it, QEMU faces circular reference issue when an owner can > contain more than one MRs, then when they are linked within each other in > form of subregions, those links keep the owner alive forever, causing > memory leaks even if all the external refcounts are released for the owner. > > To fix that, teach memory API to stop refcount on MRs that share the same > owner because if they share the lifecycle of the owner, then they share the > same lifecycle between themselves. > > Meanwhile, allow auto-detachments of MRs during finalize() of MRs even > against its container, as long as they belong to the same owner. > > The latter is needed because now it's possible to have MR finalize() happen > in any order when they exactly share the same lifespan. In this case, we > should allow finalize() to happen in any order of either the parent or > child MR, however it should be guaranteed that the mr->container shares the > same owner of this MR to be finalized. > > Proper document this behavior in code. > > This patch is heavily based on the work done by Akihiko Odaki: > > https://lore.kernel.org/r/cafeaca8dv40fgsci76r4yep1p-sp_qjnrdd2ozpxjx5wrs0...@mail.gmail.com > > Signed-off-by: Peter Xu <[email protected]> > --- > docs/devel/memory.rst | 9 +++++++-- > system/memory.c | 45 ++++++++++++++++++++++++++++++++++--------- > 2 files changed, 43 insertions(+), 11 deletions(-) > > diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst > index 57fb2aec76..1367c7caf7 100644 > --- a/docs/devel/memory.rst > +++ b/docs/devel/memory.rst > @@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle. > They take effect > as soon as the region is made visible. This can be immediately, later, > or never. > > -Destruction of a memory region happens automatically when the owner > -object dies. > +Destruction of a memory region happens automatically when the owner object > +dies. Note that the MRs under the same owner can be destroyed in any order > +when the owner object dies. It's because the MRs will share the same > +lifespan of the owner, no matter if its a container or child MR. It's > +suggested to always cleanly detach the MRs under the owner object when the > +owner object is going to be destroyed, however it is not required, as the > +memory core will automatically detach the links when MRs are destroyed. I think we should not say "we suggest you always cleanly detach the MRs": instead we should say "you can rely on this happening, so you don't need to write manual code to do it". The actual code changes look OK to me. thanks -- PMM
