On 1/31/24 8:04 AM, David Hildenbrand wrote:
On 31.01.24 14:48, Mark Kanda wrote:
QEMU initializes preallocated backend memory as the objects are parsed from the command line. This is not optimal in some cases (e.g. memory spanning multiple NUMA nodes) because the memory objects are initialized in series.

Allow the initialization to occur in parallel (asynchronously). In order to ensure optimal thread placement, asynchronous initialization requires prealloc
context threads to be in use.

Signed-off-by: Mark Kanda <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
  backends/hostmem.c     |   8 ++-
  hw/virtio/virtio-mem.c |   4 +-
  include/qemu/osdep.h   |  18 +++++-
  system/vl.c            |   8 +++
  util/oslib-posix.c     | 131 +++++++++++++++++++++++++++++++----------
  util/oslib-win32.c     |   8 ++-
  6 files changed, 140 insertions(+), 37 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 30f69b2cb5..8f602dc86f 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -20,6 +20,7 @@
  #include "qom/object_interfaces.h"
  #include "qemu/mmap-alloc.h"
  #include "qemu/madvise.h"
+#include "hw/qdev-core.h"
    #ifdef CONFIG_NUMA
  #include <numaif.h>
@@ -235,9 +236,10 @@ static void host_memory_backend_set_prealloc(Object *obj, bool value,
          int fd = memory_region_get_fd(&backend->mr);
          void *ptr = memory_region_get_ram_ptr(&backend->mr);
          uint64_t sz = memory_region_size(&backend->mr);
+        bool async = !phase_check(PHASE_MACHINE_INITIALIZED);
            if (!qemu_prealloc_mem(fd, ptr, sz, backend->prealloc_threads,
-                               backend->prealloc_context, errp)) {
+                               backend->prealloc_context, async, errp)) {
              return;
          }

I think we will never trigger that case: we would have to set the propertly after the device was already initialized, which shouldn't happen.

So I guess we can simplify and drop that.


Will fix.

          backend->prealloc = true;


[...]

+++ b/include/qemu/osdep.h
@@ -680,6 +680,8 @@ typedef struct ThreadContext ThreadContext;
   * @area: start address of the are to preallocate
   * @sz: the size of the area to preallocate
   * @max_threads: maximum number of threads to use
+ * @tc: prealloc context threads pointer, NULL if not in use
+ * @async: request asynchronous preallocation, requires @tc
   * @errp: returns an error if this function fails
   *
   * Preallocate memory (populate/prefault page tables writable) for the virtual
@@ -687,10 +689,24 @@ typedef struct ThreadContext ThreadContext;
   * each page in the area was faulted in writable at least once, for example,
   * after allocating file blocks for mapped files.
   *
+ * When setting @async, allocation might be performed asynchronously.
+ * qemu_finish_async_mem_prealloc() must be called to finish any asynchronous
+ * preallocation.
+ *
   * Return: true on success, else false setting @errp with error.
   */
  bool qemu_prealloc_mem(int fd, char *area, size_t sz, int max_threads,
-                       ThreadContext *tc, Error **errp);
+                       ThreadContext *tc, bool async, Error **errp);
+
+/**
+ * qemu_finish_async_mem_prealloc:
+ * @errp: returns an error if this function fails
+ *
+ * Finish all outstanding asynchronous memory preallocation.
+ *
+ * Return: true on success, else false setting @errp with error.
+ */
+bool qemu_finish_async_mem_prealloc(Error **errp);

Suboptimal suggestion from my side, guess it woud be better to call this

"qemu_finish_async_prealloc_mem" to match "qemu_prealloc_mem"


Will fix.

    /**
   * qemu_get_pid_name:
diff --git a/system/vl.c b/system/vl.c
index 788d88ea03..290bb3232b 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2009,6 +2009,14 @@ static void qemu_create_late_backends(void)
        object_option_foreach_add(object_create_late);
  +    /*
+     * Wait for any outstanding memory prealloc from created memory
+     * backends to complete.
+     */
+    if (!qemu_finish_async_mem_prealloc(&error_fatal)) {
+        exit(1);
+    }
+

I'm wondering if we should have a new phase instead, like

PHASE_LATE_OBJECTS_CREATED.

and do here

phase_advance(PHASE_LATE_OBJECTS_CREATED);

and use that instead. Currently, there is a "gap" between both things. I don't think anything is actually broken right now (because any internal memory abckend wouldn't have a thread context), but it might be much cleaner and obvious that way.


OK. I'll call it 'PHASE_LATE_BACKENDS_CREATED' (to make it consistent with code comments/function name).

Apart from that LGTM!


Thanks/regards,
-Mark


Reply via email to