On 11/27/25 6:39 PM, Kevin Wolf wrote:
> On 27.11.2025 at 15:31, Andrey Drobyshev wrote:
>> On 11/27/25 12:02 PM, Daniel P. Berrangé wrote:
>>> On Thu, Nov 27, 2025 at 10:56:12AM +0100, Kevin Wolf wrote:
>>>> On 25.11.2025 at 15:21, [email protected] wrote:
>>>>> From: Andrey Drobyshev <[email protected]>
>>>>>
>>>>> Commit 772f86839f ("scripts/qemu-gdb: Support coroutine dumps in
>>>>> coredumps") introduced coroutine traces in coredumps using raw stack
>>>>> unwinding.  While this works, this approach does not allow viewing the
>>>>> function arguments in the corresponding stack frames.
>>>>>
>>>>> As an alternative, we can obtain saved registers from the coroutine's
>>>>> jmpbuf, copy the original coredump file into a temporary file, patch the
>>>>> saved registers into the tmp coredump's struct elf_prstatus and execute
>>>>> another gdb subprocess to get a backtrace from the patched temporary
>>>>> coredump.
>>>>>
>>>>> While providing more detailed info, this alternative approach is
>>>>> quite heavyweight, as it takes significantly more time and disk space.
>>>>> So, instead of making it the new default, let's keep raw unwinding as
>>>>> the default behaviour, but add a '--detailed' option to the 'qemu bt'
>>>>> and 'qemu coroutine' commands which enables the new behaviour.
>>>>> [...]
>>>>
>>>>> +def clone_coredump(source, target, set_regs):
>>>>> +    shutil.copyfile(source, target)
>>>>> +    write_regs_to_coredump(target, set_regs)
>>>>> +
>>>>> +def dump_backtrace_patched(regs):
>>>>> +    files = gdb.execute('info files', False, True).split('\n')
>>>>> +    executable = re.match('^Symbols from "(.*)".$', files[0]).group(1)
>>>>> +    dump = re.search("`(.*)'", files[2]).group(1)
>>>>> +
>>>>> +    with tempfile.NamedTemporaryFile(dir='/tmp', delete=False) as f:
>>>>> +        tmpcore = f.name
>>>>> +
>>>>> +    clone_coredump(dump, tmpcore, regs)
>>>>
>>>> I think this is what makes it so heavy, right? Coredumps can be quite
>>>> large and /tmp is probably a different filesystem, so you end up really
>>>> copying the full size of the coredump around.
>>>
>>> On my system /tmp is tmpfs, so this actually brings the whole
>>> coredump into RAM, which is not a sensible approach.
>>>
>>>> Wouldn't it be better in the general case if we could just do a reflink
>>>> copy of the coredump and then do only very few writes for updating the
>>>> register values? Then the overhead should actually be quite negligible
>>>> both in terms of time and disk space.
>>>
>>
>> That's correct, copying the file to /tmp takes most of the time with
>> this approach.
>>
>> As for a reflink copy, this could be a great solution.  However, it
>> depends heavily on the filesystem in use.  E.g. on my system coredumpctl
>> places the uncompressed coredump in /var/tmp, which is mounted as ext4.
>> And in this case:
>>
>> # cp --reflink /var/tmp/coredump-MQCZQc /root
>> cp: failed to clone '/root/coredump-MQCZQc' from
>> '/var/tmp/coredump-MQCZQc': Invalid cross-device link
>>
>> # cp --reflink /var/tmp/coredump-MQCZQc /var/tmp/coredump.ref
>> cp: failed to clone '/var/tmp/coredump.ref' from
>> '/var/tmp/coredump-MQCZQc': Operation not supported
>>
>> Apparently, ext4 doesn't support reflink copies, while xfs and btrfs do.
>> But I guess our implementation had better be FS-agnostic.
> 
> Yes, we might still need a slow copy fallback for those filesystems,
> i.e. essentially a 'cp --reflink=auto'. For myself, coredumps will
> generally live on XFS, so I would benefit from creating that copy in the
> same filesystem where the coredump lives, and for you it shouldn't hurt
> at least.
> 
> Another thought... it's a bit crazy, but... we're QEMU, we have our own
> tools for this. We could create a qcow2 overlay for the coredump and
> export it using FUSE! :-D (Probably not very practical because you need
> the right paths for the binaries, but I had to mention it.)
> 
> Kevin
> 

We can surely add reflink copying as a fast path option which we try
first.  That's cheap to implement.  The real issue is designing a
sensible fallback approach.
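
A minimal sketch of such a fast path with a plain-copy fallback, roughly
what 'cp --reflink=auto' does.  It assumes Linux and the FICLONE ioctl; the
function name copy_reflink_or_fallback is made up for illustration and is
not part of the patch, and the hardcoded request number is the x86 value
(a few architectures encode ioctl numbers differently):

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Assumption: x86-style ioctl encoding; the value differs on some arches.
FICLONE = 0x40049409


def copy_reflink_or_fallback(source, target):
    """Clone source into target, falling back to a full byte copy.

    Returns True if the reflink clone succeeded and False if a regular
    copy had to be made instead.
    """
    with open(source, 'rb') as src, open(target, 'wb') as dst:
        try:
            # Ask the filesystem to share extents between the two files.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            # Typical failures: EOPNOTSUPP (e.g. ext4), EXDEV (source and
            # target on different filesystems).  Any failure here just
            # means we take the slow path.
            pass
    shutil.copyfile(source, target)
    return False
```

For the clone to have any chance of succeeding, the target has to be
created on the same filesystem as the coredump, e.g. by passing
dir=os.path.dirname(dump) to tempfile.NamedTemporaryFile() instead of
dir='/tmp'.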

As for creating an overlay... That's an interesting option, but it would
require everybody who wants to use this stuff to configure their QEMU build
with --enable-fuse.  We, for instance, don't have it enabled in our
builds, and I'm sure many others don't either.

Of course, we could think of an NBD export for such an overlay instead of
FUSE.  But that would then require root access to write to /dev/nbd0.
Also not very acceptable.

A quick overlayfs mount with lowerdir=/var/tmp could also solve this.  But
again, root is required.  Not good.

So the most robust option, I guess, is the one suggested by Daniel:
copying some kind of minimal viable part/range of the coredump instead of
the whole file, just enough to produce a valid backtrace.  The only thing
left is figuring out which part to copy.  That might require some tricky
ELF structure parsing.
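
As a starting point for that parsing, here is a rough sketch that only
locates the PT_NOTE segments of a core file, which is where struct
elf_prstatus lives.  It does not answer which other ranges gdb needs for a
usable backtrace.  The helper name find_note_segments is hypothetical, and
the sketch assumes a little-endian ELF64 core with fewer than 0xffff
program headers:

```python
import struct

# ELF64 file header and program header layouts (little-endian).
EHDR = struct.Struct('<16sHHIQQQIHHHHHH')
PHDR = struct.Struct('<IIQQQQQQ')

PT_NOTE = 4
PN_XNUM = 0xffff  # real phnum is stored elsewhere; not handled here


def find_note_segments(path):
    """Return a list of (file_offset, size) for each PT_NOTE segment."""
    with open(path, 'rb') as f:
        ehdr = EHDR.unpack(f.read(EHDR.size))
        ident = ehdr[0]
        # ident[4] == 2: ELFCLASS64, ident[5] == 1: ELFDATA2LSB
        if ident[:4] != b'\x7fELF' or ident[4] != 2 or ident[5] != 1:
            raise ValueError('not a little-endian ELF64 file')

        e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
        if e_phnum == PN_XNUM:
            raise NotImplementedError('extended program header count')
        if e_phentsize < PHDR.size:
            raise ValueError('unexpected program header size')

        notes = []
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            p_type, _flags, p_offset, _vaddr, _paddr, p_filesz, _memsz, \
                _align = PHDR.unpack(f.read(PHDR.size))
            if p_type == PT_NOTE:
                notes.append((p_offset, p_filesz))
        return notes
```

The returned ranges are small compared to the PT_LOAD segments, so
whatever ends up being copied, the part that needs patching should stay
cheap to duplicate.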

Andrey
