On 9/29/23 13:17, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Sep 27, 2023 at 7:46 PM Laszlo Ersek <[email protected]> wrote:
>>
>> On 9/19/23 15:19, Laszlo Ersek wrote:
>>> The fw_cfg DMA write callback in ramfb prepares a new display surface in
>>> QEMU; this new surface is put to use ("swapped in") upon the next display
>>> update. At that time, the old surface (if any) is released.
>>>
>>> If the guest triggers the fw_cfg DMA write callback at least twice between
>>> two adjacent display updates, then the second callback (and further such
>>> callbacks) will leak the previously prepared (but not yet swapped in)
>>> display surface.
>>>
>>> The issue can be shown by:
>>>
>>> (1) starting QEMU with "-trace displaysurface_free", and
>>>
>>> (2) running the following program in the guest UEFI shell:
>>>
>>>> #include <Library/ShellCEntryLib.h>           // ShellAppMain()
>>>> #include <Library/UefiBootServicesTableLib.h> // gBS
>>>> #include <Protocol/GraphicsOutput.h>          // 
>>>> EFI_GRAPHICS_OUTPUT_PROTOCOL
>>>>
>>>> INTN
>>>> EFIAPI
>>>> ShellAppMain (
>>>>   IN UINTN   Argc,
>>>>   IN CHAR16  **Argv
>>>>   )
>>>> {
>>>>   EFI_STATUS                    Status;
>>>>   VOID                          *Interface;
>>>>   EFI_GRAPHICS_OUTPUT_PROTOCOL  *Gop;
>>>>   UINT32                        Mode;
>>>>
>>>>   Status = gBS->LocateProtocol (
>>>>                   &gEfiGraphicsOutputProtocolGuid,
>>>>                   NULL,
>>>>                   &Interface
>>>>                   );
>>>>   if (EFI_ERROR (Status)) {
>>>>     return 1;
>>>>   }
>>>>
>>>>   Gop = Interface;
>>>>
>>>>   Mode = 1;
>>>>   for ( ; ;) {
>>>>     Status = Gop->SetMode (Gop, Mode);
>>>>     if (EFI_ERROR (Status)) {
>>>>       break;
>>>>     }
>>>>
>>>>     Mode = 1 - Mode;
>>>>   }
>>>>
>>>>   return 1;
>>>> }
>>>
>>> The symptom is then that:
>>>
>>> - only one trace message appears periodically,
>>>
>>> - the time between adjacent messages keeps increasing -- implying that
>>>   some list structure (containing the leaked resources) keeps growing,
>>>
>>> - the "surface" pointer is ever different.
>>>
>>>> [email protected]:displaysurface_free surface=0x7f2fcc09a7c0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc9dac10
>>>> [email protected]:displaysurface_free surface=0x7f2fcc441dd0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc0363d0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc413a80
>>>> [email protected]:displaysurface_free surface=0x7f2fcc09cd00
>>>> [email protected]:displaysurface_free surface=0x7f2fcc1395f0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc1cae50
>>>> [email protected]:displaysurface_free surface=0x7f2fcc42fc50
>>>> [email protected]:displaysurface_free surface=0x7f2fcc45dcc0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc70b9d0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc82acc0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc369c00
>>>> [email protected]:displaysurface_free surface=0x7f2fcc32b910
>>>> [email protected]:displaysurface_free surface=0x7f2fcc0d5a20
>>>> [email protected]:displaysurface_free surface=0x7f2fcc086c40
>>>> [email protected]:displaysurface_free surface=0x7f2fccc72020
>>>> [email protected]:displaysurface_free surface=0x7f2fcc185160
>>>> [email protected]:displaysurface_free surface=0x7f2fcc23a7e0
>>>> [email protected]:displaysurface_free surface=0x7f2fcc3ec870
>>>> [email protected]:displaysurface_free surface=0x7f2fcc634960
>>>> [email protected]:displaysurface_free surface=0x7f2fcc26b140
>>>> [email protected]:displaysurface_free surface=0x7f2fcc321700
>>>> [email protected]:displaysurface_free surface=0x7f2fccaad100
>>>
>>> We figured this wasn't a CVE-worthy problem, as only small amounts of
>>> memory were leaked (the framebuffer itself is mapped from guest RAM, QEMU
>>> only allocates administrative structures), plus libvirt restricts QEMU
>>> memory footprint anyway, thus the guest can only DoS itself.
>>>
>>> Plug the leak, by releasing the last prepared (not yet swapped in) display
>>> surface, if any, in the fw_cfg DMA write callback.
>>>
>>> Regarding the "reproducer", with the fix in place, the log is flooded with
>>> trace messages (one per fw_cfg write), *and* the trace message alternates
>>> between just two "surface" pointer values (i.e., nothing is leaked, the
>>> allocator flip-flops between two objects in effect).
>>>
>>> This issue appears to date back to the introducion of ramfb (995b30179bdc,
>>> "hw/display: add ramfb, a simple boot framebuffer living in guest ram",
>>> 2018-06-18).
>>>
>>> Cc: Gerd Hoffmann <[email protected]> (maintainer:ramfb)
>>> Cc: [email protected]
>>> Fixes: 995b30179bdc
>>> Signed-off-by: Laszlo Ersek <[email protected]>
>>> ---
>>>  hw/display/ramfb.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/display/ramfb.c b/hw/display/ramfb.c
>>> index 79b9754a5820..c2b002d53480 100644
>>> --- a/hw/display/ramfb.c
>>> +++ b/hw/display/ramfb.c
>>> @@ -97,6 +97,7 @@ static void ramfb_fw_cfg_write(void *dev, off_t offset, 
>>> size_t len)
>>>
>>>      s->width = width;
>>>      s->height = height;
>>> +    qemu_free_displaysurface(s->ds);
>>>      s->ds = surface;
>>>  }
> 
> Reviewed-by: Marc-André Lureau <[email protected]>
> 
> Incidentally I found the same issue:
> https://patchew.org/QEMU/[email protected]/

Which patch is better?

I certainly didn't think of g_clear_pointer(); is that more idiomatic?

> 
> 
> fwiw, my migration support patch is still unreviewed:
> https://patchew.org/QEMU/[email protected]/
> 

I don't have a copy of that patch (not subscribed, sorry...), but how
simply you did it surprises me. I did expect to simulate an fw_cfg write
in post_load, but I thought we'd approach the device (for the sake of
including it in the migration stream) from the higher level device's
vmstate descriptors (dc->vmsd) that set up / depend on ramfb in the
first place. I didn't know that raw vmstate_register() was still accepted.

If it is, then, for that patch (with Gerd's comment addressed):

Acked-by: Laszlo Ersek <[email protected]>

BTW: can you please assign
<https://bugzilla.redhat.com/show_bug.cgi?id=1859424> to yourself and
link your patch into it? The reason we ended up duplicating work here is
that noone took RHBZ 1859424 before.

... Well, the ticket is RHEL-7478 in JIRA now, in fact. :/

Thanks!
Laszlo


Reply via email to