On 2025/08/27 6:57, Peter Xu wrote:
On Tue, Aug 26, 2025 at 06:45:43PM +0100, Peter Maydell wrote:
On Thu, 21 Aug 2025 at 19:18, Peter Xu <[email protected]> wrote:
I remember I provided a draft somewhere during the discussion, anyway.. I
redid it, and attached a formal patch below that will contain the removal
of the mr->container check (plus auto-detach when it happens).  The hope is
this should work.. and comparing to the v8 of Akihiko's, it won't make MR's
own refcount any more complicated.

If necessary, I can send a formal patch.

This patch seems to work, in that it fixes the memory-region
related leaks I previously was seeing. Review comments below
(which are only about the commit message and docs).

I also can't remember the details of the previous discussions about
these patches, so I'm reviewing the one below essentially from
scratch. Apologies in advance if we end up going back around
a conversation loop that we've already had...

Thanks,

===8<===
 From f985c54af3e276b175bcb02725a5fb996c3f5fe5 Mon Sep 17 00:00:00 2001
From: Peter Xu <[email protected]>
Date: Thu, 21 Aug 2025 12:59:02 -0400
Subject: [PATCH] memory: Fix leaks due to owner-shared MRs circular references

Currently, QEMU refcounts the MR by always taking it from the owner.

It should be non-issue if all the owners of MRs will properly detach all
the MRs from their parents by memory_region_del_subregion() when the owner
will be freed.  However, it turns out many of the device emulations do not
do that properly.  It might be a challenge to fix all of such occurances.

I think that it's not really right to cast this as "some devices
don't do this right". The pattern of "a device has a container MR C
and some other MRs A, B... which it puts into C" is a legitimate one.
If you do this then (with the current code in QEMU) there is *no*
place where a device emulation can do a manual "remove A, B.. from
the container C so the refcounts go down": the place where devices
undo what they did in instance_init is instance_deinit, but we
will never call instance_deinit because the refcount of the owning
device never goes to 0.

It should still be able to reach zero if we skip the refcount of internal
MR subregions like what this patch does.

Said that, I think you're right..  the problem is we have object_deinit()
after object_property_del_all() (in object_finalize()), which means
memory_region_finalize() will be invoked before object_deinit()..  Then it
looks wrong now to delete subregions in a deinit() even if reachable,
because the MRs should have been freed..


Further, even if we had some place where devices could somehow
undo the "put A, B... in the container so the refcounts go down
correctly", it is better API design to have the memory.c code
automatically handle this situation. "This just works" is more
reliable than "this works if you do cleanup step X", because it's
impossible to have the bug where you forget to write the code to
do the cleanup step.

Fair enough.


Without fixing it, QEMU faces circular reference issue when an owner can
contain more than one MRs, then when they are linked within each other in
form of subregions, those links keep the owner alive forever, causing
memory leaks even if all the external refcounts are released for the owner.

To fix that, teach memory API to stop refcount on MRs that share the same
owner because if they share the lifecycle of the owner, then they share the
same lifecycle between themselves.

Meanwhile, allow auto-detachments of MRs during finalize() of MRs even
against its container, as long as they belong to the same owner.

The latter is needed because now it's possible to have MR finalize() happen
in any order when they exactly share the same lifespan.  In this case, we
should allow finalize() to happen in any order of either the parent or
child MR, however it should be guaranteed that the mr->container shares the
same owner of this MR to be finalized.

Proper document this behavior in code.

This patch is heavily based on the work done by Akihiko Odaki:

https://lore.kernel.org/r/cafeaca8dv40fgsci76r4yep1p-sp_qjnrdd2ozpxjx5wrs0...@mail.gmail.com

Signed-off-by: Peter Xu <[email protected]>
---
  docs/devel/memory.rst |  9 +++++++--
  system/memory.c       | 45 ++++++++++++++++++++++++++++++++++---------
  2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst
index 57fb2aec76..1367c7caf7 100644
--- a/docs/devel/memory.rst
+++ b/docs/devel/memory.rst
@@ -158,8 +158,13 @@ ioeventfd) can be changed during the region lifecycle.  
They take effect
  as soon as the region is made visible.  This can be immediately, later,
  or never.

-Destruction of a memory region happens automatically when the owner
-object dies.
+Destruction of a memory region happens automatically when the owner object
+dies.  Note that the MRs under the same owner can be destroyed in any order
+when the owner object dies.  It's because the MRs will share the same
+lifespan of the owner, no matter if its a container or child MR.  It's
+suggested to always cleanly detach the MRs under the owner object when the
+owner object is going to be destroyed, however it is not required, as the
+memory core will automatically detach the links when MRs are destroyed.

I think we should not say "we suggest you always cleanly detach the MRs":
instead we should say "you can rely on this happening, so you don't
need to write manual code to do it".

The actual code changes look OK to me.

I'll send the patch out officially for review, with above point taken.

Akihiko, let me know anytime when you want to either add your SoB or take
over the ownership of the patch.  I'll be OK with it.

I'm not in favor of the change. There was a long discussion in the past, but I think the following email is the most comprehensive description of my point and includes comparison between the two approaches:
https://lore.kernel.org/qemu-devel/[email protected]/

So please check the email and reply it.

Regards,
Akihiko Odaki

Reply via email to