On 2/11/25 23:35, Peter Xu wrote:
On Tue, Feb 11, 2025 at 09:27:04PM +, William Roche wrote:
From: William Roche
Here is a very simplified version of my fix only dealing with the
recovery of huge pages on VM reset.
---
This set of patches fixes an existing bug with hardware memory
From: William Roche
Here is a very simplified version of my fix only dealing with the
recovery of huge pages on VM reset.
---
This set of patches fixes an existing bug with hardware memory errors
impacting hugetlbfs memory backed VMs and its recovery on VM reset.
When using hugetlbfs large
From: William Roche
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot be partially mapped.
Signed-off-by: William Roche
Co-developed-by: David
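[Editor's sketch, not part of the patch] The point about remapping a complete hugetlb page can be illustrated as follows. This is a minimal sketch under stated assumptions: remap_whole_page() is a hypothetical helper name, and it uses anonymous memory for illustration, whereas a real hugetlb backend would need the matching file descriptor or hugetlb flags. It rounds the faulting address down to the backend page size and replaces the whole page with mmap(MAP_FIXED).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): replace the whole backend page
 * containing addr with fresh anonymous memory.
 * page_size is assumed to be the backend's real page size: a power of
 * two that is at least the system page size.
 */
static void *remap_whole_page(void *addr, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(page_size >= (size_t)sysconf(_SC_PAGESIZE));

    /* Round down so the mapping covers the complete page, never a part. */
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)page_size - 1);

    /* MAP_FIXED atomically replaces whatever was mapped at start. */
    return mmap((void *)start, page_size, PROT_READ | PROT_WRITE,
                MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

The caller is expected to compare the result against MAP_FAILED. With a 2 MiB backend page size, an error reported anywhere inside the page leads to the same 2 MiB-aligned remap.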
From: William Roche
Generate an x86 similar error injection message on ras enabled ARM
platforms.
ARM qemu only deals with action required memory errors signaled with
SIGBUS/BUS_MCEERR_AR, and will report a message on every memory error
relayed to the VM. A message like:
Guest Memory Error at
From: William Roche
Repair poisoned memory location(s), calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
a usable memory.
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory,
On 2/10/25 17:48, Peter Xu wrote:
On Fri, Feb 07, 2025 at 07:02:22PM +0100, William Roche wrote:
[...]
So the main reason is a KVM "weakness" with kvm_send_hwpoison_signal(), and
the second reason is to have richer error messages.
This seems true, and I also remember something whe
On 2/5/25 18:07, Peter Xu wrote:
On Wed, Feb 05, 2025 at 05:27:13PM +0100, William Roche wrote:
[...]
The HMP command "info ramblock" is implemented with the ram_block_format()
function which returns a message buffer built with a string for each
ramblock (protected by the RCU_READ_
On 2/4/25 18:01, Peter Xu wrote:
On Sat, Feb 01, 2025 at 09:57:23AM +, William Roche wrote:
From: William Roche
In case of a large page impacted by a memory error, provide an
information about the impacted large page before the memory
error injection message.
This message would also
On 2/4/25 21:16, Peter Xu wrote:
On Tue, Feb 04, 2025 at 07:55:52PM +0100, David Hildenbrand wrote:
Ah, and now I remember where these 3 patches originate from: virtio-mem
handling.
For virtio-mem I want to register also a remap handler, for example, to
perform the custom preallocation handling
On 2/4/25 18:09, Peter Xu wrote:
On Sat, Feb 01, 2025 at 09:57:22AM +, William Roche wrote:
From: William Roche
Repair poisoned memory location(s), calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
a usable memory.
If the kernel
From: David Hildenbrand
Notify registered listeners about the remap at the end of
qemu_ram_remap() so e.g., a memory backend can re-apply its
settings correctly.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
hw/core/numa.c | 11 +++
include/exec/ramlist.h
From: William Roche
Let's register a RAM block notifier and react on remap notifications.
Simply re-apply the settings. Exit if something goes wrong.
Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.
Co-developed-by:
From: William Roche
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot be partially mapped.
Signed-off-by: William Roche
Co-developed-by: David
From: William Roche
In case of a large page impacted by a memory error, provide an
information about the impacted large page before the memory
error injection message.
This message would also appear on ras enabled ARM platforms, with
the introduction of an x86 similar error injection message
From: William Roche
Hello David,
Here is the version with the small nits corrected.
And the 'Acked-by' entries you gave me for patch 1 and 2.
---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory reco
From: David Hildenbrand
We want to reuse the functionality when remapping RAM.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 155 -
1 file changed, 82 insertions(+), 73 deletions(-)
diff --git a/backends
From: William Roche
Repair poisoned memory location(s), calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
a usable memory.
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory,
On 1/30/25 18:02, David Hildenbrand wrote:
On 27.01.25 22:31, William Roche wrote:
From: William Roche
In case of a large page impacted by a memory error, provide an
information about the impacted large page before the memory
error injection message.
This message would also appear on ras
From: William Roche
Let's register a RAM block notifier and react on remap notifications.
Simply re-apply the settings. Exit if something goes wrong.
Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.
Co-developed-by:
From: David Hildenbrand
Notify registered listeners about the remap at the end of
qemu_ram_remap() so e.g., a memory backend can re-apply its
settings correctly.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
hw/core/numa.c | 11 +++
include/exec/ramlist.h
From: William Roche
In case of a large page impacted by a memory error, provide an
information about the impacted large page before the memory
error injection message.
This message would also appear on ras enabled ARM platforms, with
the introduction of an x86 similar error injection message
From: David Hildenbrand
We want to reuse the functionality when remapping RAM.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 155 -
1 file changed, 82 insertions(+), 73 deletions(-)
diff --git a/backends
From: William Roche
Repair poisoned memory location(s), calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
a usable memory.
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory,
From: William Roche
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot be partially mapped.
Co-developed-by: David Hildenbrand
Signed-off-by
From: William Roche
Hello David,
I'm back on this topic.
---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory recovery
on VM reset.
When using hugetlbfs large pages, any large page location being impacted
On 1/14/25 15:11, David Hildenbrand wrote:
On 10.01.25 22:14, William Roche wrote:
From: David Hildenbrand
You can make yourself the author and just make me a Co-developed-by here.
LGTM!
Ok done.
Thanks.
On 1/14/25 15:12, David Hildenbrand wrote:
On 10.01.25 22:13, William Roche wrote:
From: William Roche
Hello David,
I'm keeping the description of the patch set you already reviewed:
Hi,
one request, can you send it out next time (v6) *not* as reply to the
previous thread, but just
On 1/14/25 15:09, David Hildenbrand wrote:
On 10.01.25 22:14, William Roche wrote:
From: William Roche
In case of a large page impacted by a memory error, enhance
the existing Qemu error message which indicates that the error
is injected in the VM, adding "on lost large page SIZE
On 1/14/25 15:07, David Hildenbrand wrote:
On 10.01.25 22:14, William Roche wrote:
From: William Roche
Repair poisoned memory location(s), calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
a usable memory.
If the kernel doesn't suppor
On 1/14/25 15:00, David Hildenbrand wrote:
If we can get the current set of fixes integrated, I'll submit another
fix proposal to take the fd_offset into account in a second time. (Not
enlarging the current set)
But here is what I'm thinking about. That we can discuss later if you
want:
@@ -3
On 1/14/25 15:02, David Hildenbrand wrote:
On 10.01.25 22:14, William Roche wrote:
From: William Roche
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size. When dealing with
hugepages, we create a single entry for the entire page.
To correctly
From: William Roche
Working on the poisoned memory recovery mechanisms with David
Hildenbrand, it appeared that the file hole punching done with
the memory discard functions are missing the file offset value
fd_offset to correctly modify the right file location.
Note that guest_memfd would not
From: William Roche
Punching a hole in a file with fallocate needs to take into account the
fd_offset value for a correct file location.
But guest_memfd internal use doesn't currently consider fd_offset.
Fixes: 4b870dc4d0c0 ("hostmem-file: add offset option")
Signed-off-by
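[Editor's sketch, not part of the patch] The fd_offset issue described above can be sketched as follows. This is a minimal illustration assuming Linux and glibc; file_offset_for() and punch_hole() are hypothetical helper names, not the functions changed in system/physmem.c. The hole must be punched at the block-relative offset plus the offset at which the block starts inside the backing file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper: translate an offset inside the RAM block into an
 * offset inside the backing file, where fd_offset is the position at
 * which the block's mapping starts in that file.
 */
static off_t file_offset_for(uint64_t block_offset, uint64_t fd_offset)
{
    return (off_t)(block_offset + fd_offset);
}

/*
 * Hypothetical helper: deallocate length bytes of the backing file that
 * correspond to block_offset, keeping the file size unchanged.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int punch_hole(int fd, uint64_t block_offset, uint64_t fd_offset,
                      uint64_t length)
{
    assert(length > 0);
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     file_offset_for(block_offset, fd_offset),
                     (off_t)length);
}
```

Passing only block_offset to fallocate() would punch the hole at the wrong file location whenever fd_offset is non-zero, which is the problem the snippet above describes.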
On 1/22/25 09:01, David Hildenbrand wrote:
On 21.01.25 23:54, William Roche wrote:
From: William Roche
[...]
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -3655,6 +3655,7 @@ int ram_block_discard_range(RAMBlock *rb,
uint64_t start, size_t length)
need_madvise = (rb->page_s
From: William Roche
Punching a hole in a file with fallocate needs to take into account the
fd_offset value for a correct file location.
But guest_memfd internal use doesn't currently consider fd_offset.
Fixes: 4b870dc4d0c0 ("hostmem-file: add offset option")
Signed-off-by
From: William Roche
Working on the poisoned memory recovery mechanisms with David
Hildenbrand, it appeared that the file hole punching done with
the memory discard functions are missing the file offset value
fd_offset to correctly modify the right file location.
Note that guest_memfd would not
Thank you Peter and David for your feedback.
On 1/21/25 19:25, David Hildenbrand wrote:
On 21.01.25 19:17, Peter Xu wrote:
On Tue, Jan 21, 2025 at 05:59:56PM +, William Roche wrote:
From: William Roche
Punching a hole in a file with fallocate needs to take into account the
fd_offset
From: William Roche
Punching a hole in a file with fallocate needs to take into account the
fd_offset value for a correct file location.
Fixes: 4b870dc4d0c0 ("hostmem-file: add offset option")
Signed-off-by: William Roche
---
system/physmem.c | 14 --
1 file changed, 8
From: William Roche
Working on the poisoned memory recovery mechanisms with David
Hildenbrand, it appeared that the file hole punching done with
the memory discard functions are missing the file offset value
fd_offset to correctly modify the right file location.
I'm not sure that guest_
David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 34 ++
include/system/hostmem.h | 1 +
system/physmem.c | 4
3 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/backends/hostmem.c b/backends/hostmem.c
index 46d80
From: David Hildenbrand
Notify registered listeners about the remap at the end of
qemu_ram_remap() so e.g., a memory backend can re-apply its
settings correctly.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
hw/core/numa.c | 11 +++
include/exec/ramlist.h
From: William Roche
In case of a large page impacted by a memory error, enhance
the existing Qemu error message which indicates that the error
is injected in the VM, adding "on lost large page SIZE@ADDR".
Include also a similar message to the ARM platform.
In the case of a large pag
From: David Hildenbrand
We want to reuse the functionality when remapping RAM.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 155 -
1 file changed, 82 insertions(+), 73 deletions(-)
diff --git a/backends
From: William Roche
Repair poisoned memory location(s), calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
a usable memory.
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory,
From: William Roche
Hello David,
I'm keeping the description of the patch set you already reviewed:
---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory recovery
on VM reset.
When using hugetlbfs large
From: William Roche
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size. When dealing with
hugepages, we create a single entry for the entire page.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot
On 1/8/25 22:34, David Hildenbrand wrote:
On 14.12.24 14:45, William Roche wrote:
From: William Roche
Subject should likely start with "system/physmem:".
Maybe
"system/physmem: handle hugetlb correctly in qemu_ram_remap()"
I updated the commit title
The list of
On 1/8/25 22:58, David Hildenbrand wrote:
On 14.12.24 14:45, William Roche wrote:
From: David Hildenbrand
We want to reuse the functionality when remapping or resizing RAM.
We should drop the "or resizing of RAM." part, as that does no longer
apply.
Commit message corrected.
On 1/8/25 22:53, David Hildenbrand wrote:
On 14.12.24 14:45, William Roche wrote:
From: William Roche
Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.
Signed-off-by: William Roche
---
system/physmem.c | 2 --
1 file changed
On 1/8/25 22:51, David Hildenbrand wrote:
On 14.12.24 14:45, William Roche wrote:
From: David Hildenbrand
Let's register a RAM block notifier and react on remap notifications.
Simply re-apply the settings. Exit if something goes wrong.
Note: qemu_ram_remap() will not remap when RAM_PRE
On 1/8/25 22:44, David Hildenbrand wrote:
On 14.12.24 14:45, William Roche wrote:
+/* Try to simply remap the given location */
+static void qemu_ram_remap_mmap(RAMBlock *block, void* vaddr, size_t
size,
+ ram_addr_t offset)
Can you make the parameters match
On 1/8/25 22:22, David Hildenbrand wrote:
On 14.12.24 14:45, William Roche wrote:
From: William Roche
Hello David,
Hi!
Let me start reviewing today a bit (it's already late, and I'll continue
tomorrow).
Here is a new version of our code and an updated description of the
From: William Roche
Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.
Signed-off-by: William Roche
---
system/physmem.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index 9fc74a5699
From: William Roche
Repair poisoned memory location(s), calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
a usable memory.
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory,
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 34 ++
include/sysemu/hostmem.h | 1 +
2 files changed, 35 insertions(+)
diff --git a/backends/hostmem.c b/backends/hostmem.c
index bf85d716e5..863f6da11d 100644
--- a/bac
From: David Hildenbrand
Notify registered listeners about the remap at the end of
qemu_ram_remap() so e.g., a memory backend can re-apply its
settings correctly.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
hw/core/numa.c | 11 +++
include/exec/ramlist.h
From: David Hildenbrand
We want to reuse the functionality when remapping or resizing RAM.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 155 -
1 file changed, 82 insertions(+), 73 deletions(-)
diff --git a
From: William Roche
In case of a large page impacted by a memory error, enhance
the existing Qemu error message which indicates that the error
is injected in the VM, adding "on lost large page SIZE@ADDR".
Include also a similar message to the ARM platform.
In the case of a large pag
From: William Roche
Hello David,
Here is a new version of our code and an updated description of the
patch set:
---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs and the generic memory recovery
on VM reset.
When using hugetlbfs
From: William Roche
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size. When dealing with
hugepages, we create a single entry for the entire page.
Co-developed-by: David Hildenbrand
Signed-off-by: William Roche
---
accel/kvm/kvm-all.c
On 12/3/24 16:00, David Hildenbrand wrote:
On 03.12.24 15:39, William Roche wrote:
[...]
Our new Qemu code is testing first the fallocate+MADV_DONTNEED procedure
for standard sized pages (in ram_block_discard_range()) and only folds
back to the mmap() use if it fails. So maybe my proposal to
On 12/3/24 15:08, David Hildenbrand wrote:
[...]
Let me take a look at your tool below if I can find an explanation of
what is happening, because it's weird :)
[...]
At the end of this email, I included the source code of a simplistic
test case that shows that the page is replaced in the c
On 12/2/24 17:00, David Hildenbrand wrote:
On 02.12.24 16:41, William Roche wrote:
Hello David,
Hi,
sorry for not reviewing yet, I was rather sick the last 1.5 weeks.
I hope you get well soon!
I've finally tested many page mapping possibilities and tried to
identify the error inje
Hello David,
I've finally tested many page mapping possibilities and tried to
identify the error injection reaction on these pages to see if mmap()
can be used to recover the impacted area.
I'm using the latest upstream kernel I have for that:
6.12.0-rc7.master.20241117.ol9.x86_64
But I also g
From: David Hildenbrand
Notify registered listeners about the remap at the end of
qemu_ram_remap() so e.g., a memory backend can re-apply its
settings correctly.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
hw/core/numa.c | 11 +++
include/exec/ramlist.h
From: William Roche
In case of a large page impacted by a memory error, complete
the existing Qemu error message to indicate that the error is
injected in the VM. Also include a similar message to the ARM
platform.
Only in the case of a large page impacted, we now report:
...Memory Error at QEMU
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 34 ++
include/sysemu/hostmem.h | 1 +
2 files changed, 35 insertions(+)
diff --git a/backends/hostmem.c b/backends/hostmem.c
index bf85d716e5..863f6da11d 100644
--- a/bac
From: William Roche
Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.
If preallocation is set on a memory block, qemu_prealloc_mem()
call is needed also after a ram_block_discard_range() use for
this block.
Signed-off-by: William
From: David Hildenbrand
We want to reuse the functionality when remapping or resizing RAM.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 155 -
1 file changed, 82 insertions(+), 73 deletions(-)
diff --git a
From: William Roche
Repair memory locations, calling ram_block_discard_range(),
punching a hole in the backend file when necessary and regenerate
a usable memory.
Fall back to unmap/remap the memory location(s) if the kernel doesn't
support the madvise calls used by ram_block_discard_
From: William Roche
Hi David,
Here is a new version of our code, but I still need to double check
the mmap behavior in case of a memory error impact on:
- a clean page of an empty file or populated file
- already mapped using MAP_SHARED or MAP_PRIVATE
to see if mmap() can recover the area or
From: William Roche
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size. When dealing with
hugepages, we create a single entry for the entire page.
Co-developed-by: David Hildenbrand
Signed-off-by: William Roche
---
accel/kvm/kvm-all.c
On 12.11.24 19:17, William Roche wrote:
On 11/12/24 12:13, David Hildenbrand wrote:
On 07.11.24 11:21, William Roche wrote:
From: William Roche
When an entire large page is impacted by an error (hugetlbfs case),
report better the size and location of this large memory hole, so
give a wa
On 11/12/24 12:13, David Hildenbrand wrote:
On 07.11.24 11:21, William Roche wrote:
From: William Roche
When an entire large page is impacted by an error (hugetlbfs case),
report better the size and location of this large memory hole, so
give a warning message when this page is first hit
On 11/12/24 12:07, David Hildenbrand wrote:
On 07.11.24 11:21, William Roche wrote:
From: William Roche
We take into account the recorded page sizes to repair the
memory locations, calling ram_block_discard_range() to punch a hole
in the backend file when necessary and regenerate a usable
On 11/12/24 11:30, David Hildenbrand wrote:
On 07.11.24 11:21, William Roche wrote:
From: William Roche
When a memory page is added to the hwpoison_page_list, include
the page size information. This size is the backend real page
size. To better deal with hugepages, we create a single entry
On 11/12/24 14:45, David Hildenbrand wrote:
On 07.11.24 11:21, William Roche wrote:
From: David Hildenbrand
Let's register a RAM block notifier and react on remap notifications.
Simply re-apply the settings. Warn only when something goes wrong.
Note: qemu_ram_remap() will not remap
From: William Roche
When a memory page is added to the hwpoison_page_list, include
the page size information. This size is the backend real page
size. To better deal with hugepages, we create a single entry
for the entire page.
Signed-off-by: William Roche
---
accel/kvm/kvm-all.c | 8
From: David Hildenbrand
We want to reuse the functionality when remapping or resizing RAM.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 155 -
1 file changed, 82 insertions(+), 73 deletions(-)
diff --git a
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
backends/hostmem.c | 29 +
include/sysemu/hostmem.h | 1 +
2 files changed, 30 insertions(+)
diff --git a/backends/hostmem.c b/backends/hostmem.c
index bf85d716e5..fbd8708664 100644
--- a/bac
From: William Roche
We take into account the recorded page sizes to repair the
memory locations, calling ram_block_discard_range() to punch a hole
in the backend file when necessary and regenerate a usable memory.
Fall back to unmap/remap the memory location(s) if the kernel doesn't
suppor
From: William Roche
Merging and dump settings are handled by the remap notification
in addition to memory policy and preallocation.
If preallocation is set on a memory block, qemu_prealloc_mem()
call is needed also after a ram_block_discard_range() use for
this block.
Signed-off-by: William
From: William Roche
When an entire large page is impacted by an error (hugetlbfs case),
report better the size and location of this large memory hole, so
give a warning message when this page is first hit:
Memory error: Loosing a large page (size: X) at QEMU addr Y and GUEST addr Z
Signed-off
From: David Hildenbrand
Notify registered listeners about the remap at the end of
qemu_ram_remap() so e.g., a memory backend can re-apply its
settings correctly.
Signed-off-by: David Hildenbrand
Signed-off-by: William Roche
---
hw/core/numa.c | 11 +++
include/exec/ramlist.h
From: William Roche
Hi David,
Here is an updated description of the patch set:
---
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs. When using hugetlbfs large
pages, any large page location being impacted by an HW memory error
On 10/28/24 17:42, David Hildenbrand wrote:
On 26.10.24 01:27, William Roche wrote:
On 10/23/24 09:28, David Hildenbrand wrote:
On 22.10.24 23:35, William Roche wrote:
From: William Roche
Add the page size information to the hwpoison_page_list elements.
As the kernel doesn't always r
On 10/28/24 18:01, David Hildenbrand wrote:
On 26.10.24 01:27, William Roche wrote:
On 10/23/24 09:30, David Hildenbrand wrote:
On 22.10.24 23:35, William Roche wrote:
From: William Roche
When the VM reboots, a memory reset is performed calling
qemu_ram_remap() on all hwpoisoned pages
On 10/23/24 09:28, David Hildenbrand wrote:
On 22.10.24 23:35, William Roche wrote:
From: William Roche
Add the page size information to the hwpoison_page_list elements.
As the kernel doesn't always report the actual poisoned page size,
we adjust this size from the backend real page siz
On 10/23/24 09:30, David Hildenbrand wrote:
On 22.10.24 23:35, William Roche wrote:
From: William Roche
When the VM reboots, a memory reset is performed calling
qemu_ram_remap() on all hwpoisoned pages.
While we take into account the recorded page sizes to repair the
memory locations, a
On 10/23/24 09:28, David Hildenbrand wrote:
On 22.10.24 23:35, William Roche wrote:
From: William Roche
Add the page size information to the hwpoison_page_list elements.
As the kernel doesn't always report the actual poisoned page size,
we adjust this size from the backend real page
From: William Roche
On HW memory error, we need to report better what the impact of this
error is. So when an entire large page is impacted by an error (like the
hugetlbfs case), we give a warning message when this page is first hit:
Memory error: Loosing a large page (size: X) at QEMU addr Y
From: William Roche
When the VM reboots, a memory reset is performed calling
qemu_ram_remap() on all hwpoisoned pages.
While we take into account the recorded page sizes to repair the
memory locations, a large page also needs to punch a hole in the
backend file to regenerate a usable memory
From: William Roche
This set of patches fixes several problems with hardware memory errors
impacting hugetlbfs memory backed VMs. When using hugetlbfs large
pages, any large page location being impacted by an HW memory error
results in poisoning the entire page, suddenly making a large chunk of
From: William Roche
The SIGBUS signal siginfo reporting a HW memory error
provides a si_addr_lsb field with an indication of the
impacted memory page size.
This information should be used to track the hwpoisoned
page sizes.
Signed-off-by: William Roche
---
accel/kvm/kvm-all.c| 6
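[Editor's sketch, not part of the patch] The use of si_addr_lsb can be illustrated as follows. On Linux, the si_addr_lsb field of a SIGBUS memory-error siginfo gives the least significant valid bit of the reported address, so the affected extent is 1 << si_addr_lsb bytes. The helper names below are hypothetical, and other entries in this listing note that the kernel does not always report the actual poisoned page size, so this value may still need adjusting from the backend page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of the region described by a
 * si_addr_lsb value (for example 12 for a 4 KiB page).
 */
static size_t poisoned_size_from_lsb(unsigned lsb)
{
    assert(lsb < sizeof(size_t) * 8);
    return (size_t)1 << lsb;
}

/*
 * Hypothetical helper: start of the naturally aligned region that
 * contains the reported address.
 */
static uintptr_t poisoned_range_start(uintptr_t addr, unsigned lsb)
{
    return addr & ~((uintptr_t)poisoned_size_from_lsb(lsb) - 1);
}
```

In a signal handler installed with SA_SIGINFO, the inputs would come from the si_addr and si_addr_lsb fields of the siginfo_t argument.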
From: William Roche
Add the page size information to the hwpoison_page_list elements.
As the kernel doesn't always report the actual poisoned page size,
we adjust this size from the backend real page size.
We take into account the recorded page size to adjust the size
and location of the m
On 10/9/24 17:45, Peter Xu wrote:
On Thu, Sep 19, 2024 at 06:52:37PM +0200, William Roche wrote:
Hello David,
I hope my last week email answered your interrogations about:
- retrieving the valid data from the lost hugepage
- the need of smaller pages to replace a failed large page
Hello David,
I hope my last week email answered your interrogations about:
- retrieving the valid data from the lost hugepage
- the need of smaller pages to replace a failed large page
- the interaction of memory error and VM migration
- the non-symmetrical access to a poisoned me
On 9/12/24 00:07, David Hildenbrand wrote:
Hi again,
This is a Qemu RFC to introduce the possibility to deal with hardware
memory errors impacting hugetlbfs memory backed VMs. When using
hugetlbfs large pages, any large page location being impacted by an
HW memory error results in poisoning th
On 9/10/24 13:36, David Hildenbrand wrote:
On 10.09.24 12:02, William Roche wrote:
From: William Roche
Hi,
Apologies for the noise; resending as I missed CC'ing the maintainers of the
changed files
Hello,
This is a Qemu RFC to introduce the possibility to deal with hardware
m
From: William Roche
Apologies for the noise; resending as I missed CC'ing the maintainers of the
changed files
Hello,
This is a Qemu RFC to introduce the possibility to deal with hardware
memory errors impacting hugetlbfs memory backed VMs. When using
hugetlbfs large pages, any large