On Wed, 16 Mar 2022 at 07:53, David Hildenbrand <[email protected]> wrote: > > On 16.03.22 05:04, Andrew Deason wrote: > > We have a thin wrapper around madvise, called qemu_madvise, which > > provides consistent behavior for the !CONFIG_MADVISE case, and works > > around some platform-specific quirks (some platforms only provide > > posix_madvise, and some don't offer all 'advise' types). This specific > > caller of madvise has never used it, tracing back to its original > > introduction in commit e0b266f01dd2 ("migration_completion: Take > > current state"). > > > > Call qemu_madvise here, to follow the same logic as all of our other > > madvise callers. This slightly changes the behavior for > > !CONFIG_MADVISE (EINVAL instead of ENOSYS, and a slightly different > > error message), but this is now more consistent with other callers > > that use qemu_madvise. > > > > Signed-off-by: Andrew Deason <[email protected]> > > --- > > Looking at the history of commits that touch this madvise() call, it > > doesn't _look_ like there's any reason to be directly calling madvise vs > > qemu_advise (I don't see anything mentioned), but I'm not sure. > > > > softmmu/physmem.c | 12 ++---------- > > 1 file changed, 2 insertions(+), 10 deletions(-) > > > > diff --git a/softmmu/physmem.c b/softmmu/physmem.c > > index 43ae70fbe2..900c692b5e 100644 > > --- a/softmmu/physmem.c > > +++ b/softmmu/physmem.c > > @@ -3584,40 +3584,32 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t > > start, size_t length) > > rb->idstr, start, length, ret); > > goto err; > > #endif > > } > > if (need_madvise) { > > /* For normal RAM this causes it to be unmapped, > > * for shared memory it causes the local mapping to disappear > > * and to fall back on the file contents (which we just > > * fallocate'd away). > > */ > > -#if defined(CONFIG_MADVISE) > > if (qemu_ram_is_shared(rb) && rb->fd < 0) { > > - ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE); > > + ret = qemu_madvise(host_startaddr, length, > > QEMU_MADV_REMOVE); > > } else { > > - ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED); > > + ret = qemu_madvise(host_startaddr, length, > > QEMU_MADV_DONTNEED); > > posix_madvise(QEMU_MADV_DONTNEED) has completely different semantics > then madvise() -- it's not a discard that we need here. > > So ram_block_discard_range() would now succeed in environments (BSD?) > where it's supposed to fail. > > So AFAIKs this isn't sane.
But CONFIG_MADVISE just means "host has madvise()"; it doesn't imply "this is a Linux madvise() with MADV_DONTNEED". Solaris madvise() doesn't seem to have MADV_DONTNEED at all; a quick look at the FreeBSD manpage suggests its madvise MADV_DONTNEED is identical to its posix_madvise MADV_DONTNEED. If we need "specifically Linux MADV_DONTNEED semantics" maybe we should define a QEMU_MADV_LINUX_DONTNEED which either (a) does the right thing or (b) fails, and use qemu_madvise() regardless. Certainly the current code is pretty fragile to being changed by people who don't understand the undocumented subtlety behind the use of a direct madvise() call here. -- PMM
