Add guest-guide documentation for Xen’s memory-claim mechanism and the two
hypercalls for it:

- The legacy XENMEM_claim_pages (only for global host-wide claims)
- The new XEN_DOMCTL_claim_memory, which adds NUMA-aware claims

Also document the implementation of claims in the hypervisor.

Signed-off-by: Bernhard Kaindl <[email protected]>
---
 .readthedocs.yaml                                  |  13 +-
 docs/conf.py                                       |   6 +-
 .../dom/DOMCTL_claim_memory-classes.mmd            |  51 +++++++
 .../dom/DOMCTL_claim_memory-seqdia.mmd             |  23 ++++
 .../dom/DOMCTL_claim_memory-workflow.mmd           |  23 ++++
 docs/guest-guide/dom/DOMCTL_claim_memory.rst       | 125 ++++++++++++++++++
 docs/guest-guide/dom/index.rst                     |  14 ++
 docs/guest-guide/index.rst                         |  23 ++++
 docs/guest-guide/mem/XENMEM_claim_pages.rst        |  68 ++++++++++
 docs/guest-guide/mem/index.rst                     |  12 ++
 docs/hypervisor-guide/index.rst                    |   5 +
 docs/hypervisor-guide/mm/claims.rst                | 114 ++++++++++++++++
 docs/hypervisor-guide/mm/index.rst                 |  10 ++
 13 files changed, 485 insertions(+), 2 deletions(-)
 create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd
 create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd
 create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd
 create mode 100644 docs/guest-guide/dom/DOMCTL_claim_memory.rst
 create mode 100644 docs/guest-guide/dom/index.rst
 create mode 100644 docs/guest-guide/mem/XENMEM_claim_pages.rst
 create mode 100644 docs/guest-guide/mem/index.rst
 create mode 100644 docs/hypervisor-guide/mm/claims.rst
 create mode 100644 docs/hypervisor-guide/mm/index.rst

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
index d3aff7662ebf..3be7334c7527 100644
--- a/.readthedocs.yaml
+++ b/.readthedocs.yaml
@@ -8,11 +8,22 @@ build:
   tools:
     python: "latest"
+    nodejs: "20"
   jobs:
     post_install:
+      # Required for rendering the mermaid diagrams in the offline
+      # documentation (PDF & ePub) formats.
+      - npm install -g @mermaid-js/mermaid-cli
       # Instead of needing a separate requirements.txt
-      - python -m pip install --upgrade --no-cache-dir sphinx-rtd-theme
+      - >
+        python -m pip install --upgrade --no-cache-dir sphinx-rtd-theme
+        sphinxcontrib-mermaid
 
 sphinx:
   configuration: docs/conf.py
+
+# Build PDF & ePub
+formats:
+  - epub
+  - pdf
\ No newline at end of file
diff --git a/docs/conf.py b/docs/conf.py
index 2fb8bafe6589..9316202d3318 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -61,7 +61,11 @@ needs_sphinx = '1.4'
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = []
+extensions = ['sphinxcontrib.mermaid']
+
+mermaid_init_js = """
+mermaid.initialize({ theme: 'Neo', startOnLoad: true });
+"""
 
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd b/docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd
new file mode 100644
index 000000000000..1406a4919442
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory-classes.mmd
@@ -0,0 +1,51 @@
+%% SPDX-License-Identifier: CC-BY-4.0
+classDiagram
+
+class xen_domctl {
+    +uint32_t cmd
+    +uint32_t interface_version
+    +uint32_t domain
+    +xen_domctl_claim_memory
+}
+
+class xen_domctl_claim_memory {
+    +memory_claim_t* claims
+    +uint32_t nr_claims
+    +uint32_t pad
+}
+
+class memory_claim_t {
+    +uint64_aligned_t pages
+    +uint32_t node
+    +uint32_t pad
+}
+
+class xc_domain_claim_memory["xc_domain_claim_memory()"] {
+    +xc_interface* xch
+    +uint32_t domid
+    +uint32_t nr_claims
+    +memory_claim_t* claims
+}
+
+class page_alloc_globals["xen/common/page_alloc.c"] {
+    +unsigned long outstanding_claims
+    +unsigned long node_outstanding_claims[]
+}
+
+class claim["DOMCTL_claim_memory"] {
+    +int claim_memory(d, uinfo)
+    +int domain_set_outstanding_pages(d, pages, node)
+}
+
+class domain["struct domain"] {
+    +unsigned int outstanding_pages
+    +nodeid_t claim_node
+}
+
+xen_domctl_claim_memory o--> memory_claim_t
+xen_domctl o--> xen_domctl_claim_memory
+xc_domain_claim_memory ..> xen_domctl : populates
+xc_domain_claim_memory ..> claim : calls via <tt>do_domctl()</tt>
+claim ..> xen_domctl_claim_memory : reads
+claim ..> domain : sets
+claim ..> page_alloc_globals : updates outstanding claims
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd b/docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd
new file mode 100644
index 000000000000..05d688c59f13
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory-seqdia.mmd
@@ -0,0 +1,23 @@
+%% SPDX-License-Identifier: CC-BY-4.0
+sequenceDiagram
+
+actor DomainBuilder
+participant OcamlStub as OCaml stub for<br>xc_domain<br>claim_memory
+participant Libxc as xc_domain<br>claim_memory
+participant Domctl as XEN_DOMCTL<br>claim_memory
+#participant DomainLogic as claim_memory
+participant Alloc as domain<br>set<br>outstanding_pages
+
+DomainBuilder->>OcamlStub: claims
+OcamlStub->>OcamlStub: marshal claims (OCaml to C)
+OcamlStub->>Libxc: claims
+
+Libxc->>Domctl: do_domctl
+
+Domctl->>Domctl: copy_from_guest(claim)
+Domctl->>Domctl: validate claim
+Domctl->>Alloc: set<br>outstanding_pages
+Alloc-->>Domctl: result
+Domctl-->>Libxc: rc
+Libxc-->>OcamlStub: rc
+OcamlStub-->>DomainBuilder: claim_result
\ No newline at end of file
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd b/docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd
new file mode 100644
index 000000000000..372f2bb7a616
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory-workflow.mmd
@@ -0,0 +1,23 @@
+%% SPDX-License-Identifier: CC-BY-4.0
+sequenceDiagram
+
+participant Toolstack
+participant Xen
+participant NUMA Node memory
+
+Toolstack->>Xen: XEN_DOMCTL_createdomain
+Toolstack->>Xen: XEN_DOMCTL_max_mem(max_pages)
+
+Toolstack->>Xen: XEN_DOMCTL_claim_memory(pages, node)
+Xen->>NUMA Node memory: Claim pages on node
+Xen-->>Toolstack: Claim granted
+
+Toolstack->>Xen: XEN_DOMCTL_set_nodeaffinity(node)
+
+loop Populate domain memory
+    Toolstack->>Xen: XENMEM_populate_physmap(memflags:node)
+    Xen->>NUMA Node memory: alloc from claimed node
+end
+
+Toolstack->>Xen: XEN_DOMCTL_claim_memory(0, NO_NODE)
+Xen-->>Toolstack: Remaining claims released
diff --git a/docs/guest-guide/dom/DOMCTL_claim_memory.rst b/docs/guest-guide/dom/DOMCTL_claim_memory.rst
new file mode 100644
index 000000000000..8be37585f02a
--- /dev/null
+++ b/docs/guest-guide/dom/DOMCTL_claim_memory.rst
@@ -0,0 +1,125 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _XEN_DOMCTL_claim_memory:
+
+XEN_DOMCTL_claim_memory
+=======================
+
+This **domctl** command allows a privileged domain to stake a memory claim for
+a domain, identical to :ref:`XENMEM_claim_pages` but with support for
+NUMA-aware memory claims.
+
+A claim entry with a node value of ``XEN_DOMCTL_CLAIM_MEMORY_NO_NODE`` stakes
+a claim for host memory, exactly like :ref:`XENMEM_claim_pages` does.
+
+NUMA-aware memory claims
+------------------------
+
+Memory locality is an important factor for performance in NUMA systems.
+Allocating memory close to the CPU that will use it can reduce latency
+and improve overall performance.
+
+By claiming memory on specific NUMA nodes, toolstacks can ensure that they
+will be able to allocate memory for the domain on those nodes. This is
+particularly beneficial for workloads that are sensitive to memory latency,
+such as in-memory databases.
+
+**Note:** The ABI supports multiple claims for future expansion. At the moment,
+Xen accepts a single claim entry (either a NUMA-aware or host-wide claim).
+
+Implementation notes
+--------------------
+
+As described in :ref:`XENMEM_claim_pages`, Xen keeps track of the number
+of claimed pages in the domain's ``d->outstanding_pages`` counter.
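The bookkeeping described in this section can be made concrete with a small,
self-contained C model. This is an illustrative sketch only, not hypervisor
code: the ``mock_``-prefixed names are hypothetical stand-ins for
``d->outstanding_pages``, ``d->claim_node`` and the global
``outstanding_claims`` counter maintained in ``xen/common/page_alloc.c``.

```c
#include <assert.h>

#define MOCK_NO_NODE 0xFFu /* stand-in for "no NUMA node claimed" */

/* Simplified stand-ins for the per-domain claim fields (illustration only). */
struct mock_domain {
    unsigned long outstanding_pages; /* pages claimed but not yet allocated */
    unsigned int claim_node;         /* NUMA node the claim is staked on */
};

/* Host-wide sum of all outstanding claims (models outstanding_claims). */
static unsigned long mock_outstanding_claims;

/* Stake or replace a claim; pages == 0 releases any outstanding claim. */
static void mock_set_outstanding_pages(struct mock_domain *d,
                                       unsigned long pages, unsigned int node)
{
    mock_outstanding_claims -= d->outstanding_pages; /* drop the old claim */
    d->outstanding_pages = pages;
    d->claim_node = node;
    mock_outstanding_claims += pages;                /* account the new one */
}

/* Allocating pages for a domain with a claim consumes part of that claim. */
static void mock_consume_claim(struct mock_domain *d, unsigned long pages)
{
    if (pages > d->outstanding_pages)
        pages = d->outstanding_pages;
    d->outstanding_pages -= pages;
    mock_outstanding_claims -= pages;
}
```

A domain build using this model would stake a claim once, consume it page by
page while populating the domain, and finally release the remainder by staking
a zero-page claim.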
+
+Xen declares a NUMA-aware claim by setting ``d->claim_node`` to a NUMA node,
+which declares that ``d->outstanding_pages`` is claimed on ``d->claim_node``.
+
+See :ref:`hypervisor-guide` > :ref:`memory_management` > :ref:`memory_claims`
+for details on how claims are handled by the buddy allocator, and on how a
+toolstack can populate the memory of a domain from the claimed node, even if
+it needs to wait for scrubbing to complete.
+
+Used functions & data structures
+--------------------------------
+
+This diagram illustrates the key functions and data structures involved in the
+implementation of the ``domctl`` hypercall command ``XEN_DOMCTL_claim_memory``:
+
+.. mermaid:: DOMCTL_claim_memory-classes.mmd
+   :caption: Diagram: Function and data relationships of XEN_DOMCTL_claim_memory
+
+Call sequence diagram
+---------------------
+
+The following sequence diagram illustrates the call flow for claiming memory
+for a domain using this hypercall command from an OCaml toolstack:
+
+.. mermaid:: DOMCTL_claim_memory-seqdia.mmd
+   :caption: Sequence diagram: Call flow for claiming memory for a domain
+
+Claim workflow
+--------------
+
+The following diagram illustrates a workflow for claiming and populating memory:
+
+.. mermaid:: DOMCTL_claim_memory-workflow.mmd
+   :caption: Workflow diagram: Claiming and populating memory for a domain
+
+API example (libxc)
+-------------------
+The following example demonstrates how a toolstack can claim memory before
+building the domain and then release the claim once the memory population
+is complete.
+
+Note: ``memory_claim_t`` contains padding to allow for future expansion.
+Thus, the structure must be zero-initialised to ensure forward compatibility.
+This can be achieved by using the ``XEN_NODE_CLAIM_INIT`` macro (which sets
+the pages and node fields while zero-initialising the padding), by
+zero-initialising the entire structure, or by using a compound literal with
+designated initialisers, which zero-initialises all fields left unnamed.
+
+.. code-block:: C
+
+    #include <xenctrl.h>
+
+    int claim_guest_memory(xc_interface *xch, uint32_t domid,
+                           uint64_t pages)
+    {
+        memory_claim_t claim[] = {
+            /*
+             * Example 1:
+             * Uses the XEN_NODE_CLAIM_INIT macro to zero-initialise the
+             * padding and set the pages and node fields for a NUMA-aware
+             * claim on node 0.
+             */
+            XEN_NODE_CLAIM_INIT(pages, 0) /* Claim memory on NUMA node 0 */
+        };
+
+        /* Claim memory from NUMA node 0 for the domain build. */
+        return xc_domain_claim_memory(xch, domid, 1, claim);
+    }
+
+    int release_claim(xc_interface *xch, uint32_t domid)
+    {
+        memory_claim_t claim[] = {
+            /*
+             * Example 2:
+             * Uses a compound literal with designated initialisers to set
+             * the fields to release the claim while zero-initialising the
+             * rest of the structure for forward compatibility.
+             */
+            (memory_claim_t){
+                /*
+                 * pages == 0 releases any outstanding claim.
+                 * The node field is not used in this case, but must be set
+                 * to XEN_DOMCTL_CLAIM_MEMORY_NO_NODE for forward
+                 * compatibility.
+                 */
+                .pages = 0,
+                .node = XEN_DOMCTL_CLAIM_MEMORY_NO_NODE,
+            }
+        };
+
+        /* Release any remaining claim once population is done. */
+        return xc_domain_claim_memory(xch, domid, 1, claim);
+    }
diff --git a/docs/guest-guide/dom/index.rst b/docs/guest-guide/dom/index.rst
new file mode 100644
index 000000000000..445ccf599047
--- /dev/null
+++ b/docs/guest-guide/dom/index.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Domctl Hypercall
+================
+
+Through domctl hypercalls, toolstacks in privileged domains can perform
+operations related to domain management. This includes operations such as
+creating, destroying, and modifying domains, as well as querying domain
+information.
+
+.. toctree::
+   :maxdepth: 2
+
+   DOMCTL_claim_memory
diff --git a/docs/guest-guide/index.rst b/docs/guest-guide/index.rst
index 5455c67479cf..d9611cd7504d 100644
--- a/docs/guest-guide/index.rst
+++ b/docs/guest-guide/index.rst
@@ -3,6 +3,29 @@
 Guest documentation
 ===================
 
+Xen exposes a set of hypercalls that allow domains and toolstacks in
+privileged contexts (such as Dom0) to request services from the hypervisor.
+
+Through these hypercalls, privileged domains can perform operations such as
+querying system information, managing memory and domains, and enabling
+inter-domain communication via shared memory and event channels.
+
+These hypercalls are documented in the following sections, grouped by their
+functionality. Each section provides an overview of the hypercalls, their
+parameters, and examples of how to use them.
+
+Hypercall API documentation
+---------------------------
+
+.. toctree::
+   :maxdepth: 2
+
+   dom/index
+   mem/index
+
+Hypercall ABI documentation
+---------------------------
+
 .. toctree::
    :maxdepth: 2
diff --git a/docs/guest-guide/mem/XENMEM_claim_pages.rst b/docs/guest-guide/mem/XENMEM_claim_pages.rst
new file mode 100644
index 000000000000..7d465d2a87fe
--- /dev/null
+++ b/docs/guest-guide/mem/XENMEM_claim_pages.rst
@@ -0,0 +1,68 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _XENMEM_claim_pages:
+
+XENMEM_claim_pages
+==================
+
+This **xenmem** command allows a privileged guest to stake a memory claim for a
+domain, identical to :ref:`XEN_DOMCTL_claim_memory`, but without support for
+NUMA-aware memory claims.
+
+Memory claims in Xen
+--------------------
+
+The Xen hypervisor maintains an outstanding-pages counter for each domain,
+which tracks the number of pages claimed, but not yet allocated, for that
+domain.
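The semantics of this counter can be illustrated with a short, hedged C
sketch. The helper below is hypothetical (it is not a libxc or Xen function);
it only models the rule described in this section that a claim request is an
absolute total for the domain, bounded by the domain's maximum memory limit.

```c
#include <assert.h>

/*
 * Hypothetical helper (illustration only): computes the outstanding-pages
 * counter that would result from staking a claim. The request is an
 * absolute total for the domain, not an increment, and the sum of the
 * already-allocated pages plus the resulting claim must stay within the
 * domain's maximum memory limit. Returns -1 if the request exceeds it.
 */
static long claim_outstanding_pages(unsigned long requested_total,
                                    unsigned long already_allocated,
                                    unsigned long max_pages)
{
    if (requested_total > max_pages)
        return -1; /* claim would exceed the maximum memory limit */
    if (requested_total <= already_allocated)
        return 0;  /* nothing left to claim (a request of 0 releases one) */
    return (long)(requested_total - already_allocated);
}
```

For example, requesting a total of 1024 pages for a domain that already holds
256 pages leaves an outstanding claim of 768 pages.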
+
+If the outstanding pages counter is zero, this hypercall allows a privileged
+guest to stake a claim for a specified number of pages of system memory for the
+domain.
+
+If the claim is successful, Xen updates the domain's outstanding pages counter
+to reflect the new claim. From then on, Xen allocates pages from the pool of
+claimed memory only for domains holding a claim on this memory.
+
+A domain builder (toolstack in a privileged domain) building the domain can then
+allocate the guest memory for the domain, which converts the outstanding claim
+into actual memory of the new domain, backed by physical pages.
+
+Note that the resulting claim is relative to the pages already allocated to the
+domain: the **pages** argument of this hypercall is absolute and must be the
+total number of pages expected to be allocated for the domain, not an increment
+over the pages already allocated.
+
+Memory allocations by Xen for the domain also consume the claim, so toolstacks
+should stake a claim that is larger than the guest memory requirement to
+account for Xen's own memory usage. The exact amount of extra memory required
+depends on the configuration and features used by the domain, the host
+architecture, and the features enabled by the Xen hypervisor on the host.
+
+Life-cycle of a claim
+---------------------
+
+The domain's maximum memory limit must be set prior to staking a claim, as
+the sum of the already allocated pages and the claim must be within that limit.
+
+To release the claim after the domain build is complete, call this hypercall
+command with the pages argument set to zero. This releases any remaining claim.
+`libxenguest` does this after the guest memory has been allocated for the
+domain, and Xen does it as well when it destroys the domain.
+
+API example (libxc)
+-------------------
+The following example demonstrates how a toolstack can claim memory before
+building the domain and then release the claim once the memory population
+is complete.
+
+.. code-block:: C
+
+    #include <xenctrl.h>
+    ...
+    /* Claim memory for the domain build. */
+    int ret = xc_domain_claim_pages(xch, domid, nr_pages);
+
+    /* Build the domain and allocate memory for it. */
+    ...
+
+    /* Release any remaining claim after populating the domain memory. */
+    ret = xc_domain_claim_pages(xch, domid, 0);
diff --git a/docs/guest-guide/mem/index.rst b/docs/guest-guide/mem/index.rst
new file mode 100644
index 000000000000..dabd1fd0153e
--- /dev/null
+++ b/docs/guest-guide/mem/index.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+
+Memctl Hypercall
+================
+
+The memctl hypercall interface allows guests to perform various control
+operations related to memory management.
+
+.. toctree::
+   :maxdepth: 2
+
+   XENMEM_claim_pages
diff --git a/docs/hypervisor-guide/index.rst b/docs/hypervisor-guide/index.rst
index 520fe01554ab..fef35a1ac4fe 100644
--- a/docs/hypervisor-guide/index.rst
+++ b/docs/hypervisor-guide/index.rst
@@ -1,12 +1,17 @@
 .. SPDX-License-Identifier: CC-BY-4.0
+.. _hypervisor-guide:
 
 Hypervisor documentation
 ========================
 
+See :ref:`memory_claims` for more details on the implementation of the claims
+mechanism in the hypervisor and its interaction with the buddy allocator.
+
 .. toctree::
    :maxdepth: 2
 
    code-coverage
+   mm/index
    x86/index
    arm/index
\ No newline at end of file
diff --git a/docs/hypervisor-guide/mm/claims.rst b/docs/hypervisor-guide/mm/claims.rst
new file mode 100644
index 000000000000..97eb8a68fb1e
--- /dev/null
+++ b/docs/hypervisor-guide/mm/claims.rst
@@ -0,0 +1,114 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _memory_claims:
+
+Memory Claims
+=============
+
+Overview
+--------
+
+Xen's page allocator supports a **claims** mechanism that allows a domain
+builder to reserve memory before allocation begins, preventing concurrent
+allocations from exhausting available pages mid-build.
+A claim can be global (host-wide) or target a specific NUMA node, ensuring
+that a domain's memory is allocated locally on the same node as its vCPUs.
+
+The host-wide claims check subtracts global claims from total available pages.
+If the domain has claims, its ``d->outstanding_pages`` are added back as
+available (simplified pseudo-code):
+
+.. code:: C
+
+    ASSERT(spin_is_locked(&heap_lock));
+    unsigned long global_avail = total_avail_pages - outstanding_claims
+                               + d->outstanding_pages;
+    return alloc_request <= global_avail;
+
+Similarly, the per-node check enforces node-level claims by subtracting
+outstanding node claims from available node pages, and adding back the domain's
+claim if allocating from the claimed node:
+
+.. code:: C
+
+    ASSERT(spin_is_locked(&heap_lock));
+    unsigned long avail = node_avail_pages(node)
+                        - node_outstanding_claims(node)
+                        + (node == d->claim_node ? d->outstanding_pages : 0);
+    return alloc_request <= avail;
+
+Simplified pseudo-code for the claims checks in the buddy allocator:
+
+.. code:: C
+
+    struct page_info *get_free_buddy(order, memflags, d) {
+        for ( ; ; ) {
+            node = preferred_node_or_next_node();
+            if (!node_allocatable_request(d, memflags, 1 << order, node))
+                goto try_next_node;
+            /* Find a zone on this node with a suitable buddy */
+            for (zone = highest_zone; zone >= lowest_zone; zone--)
+                for (j = order; j <= MAX_ORDER; j++)
+                    if (pg = remove_head(&heap(node, zone, j)))
+                        return pg;
+        try_next_node:
+            if (req_node != NUMA_NO_NODE && memflags & MEMF_exact_node)
+                return NULL;
+            /* Fall back to the next node and repeat. */
+        }
+    }
+
+    struct page_info *alloc_heap_pages(d, order, memflags) {
+        if (!host_allocatable_request(d, memflags, 1 << order))
+            return NULL;
+        pg = get_free_buddy(order, memflags, d);
+        if (!pg) /* Retry allowing unscrubbed pages */
+            pg = get_free_buddy(order, memflags|MEMF_no_scrub, d);
+        if (!pg)
+            return NULL;
+        if (pg has dirty pages)
+            scrub_dirty_pages(pg);
+        return pg;
+    }
+
+.. note:: The first ``get_free_buddy()`` pass skips unscrubbed pages and may
+   fall back to other nodes. With ``memflags & MEMF_exact_node``, no fallback
+   occurs, so the first pass may return ``NULL``.
+   The second pass, with ``MEMF_no_scrub``, also considers unscrubbed pages.
+   ``alloc_heap_pages()`` then scrubs them before returning, guaranteeing that
+   the domain gets the desired node-local pages even when scrubbing is pending.
+
+   Therefore, toolstacks should set ``MEMF_exact_node`` in ``memflags`` when
+   allocating for a domain with a NUMA-aware claim, e.g. with
+   ``XENMEMF_exact_node(node)``.
+
+   For efficient scrubbing, toolstacks might want to run domain builds pinned
+   to a CPU of the target NUMA node: scrubbing pages on that node avoids
+   cross-node traffic and lowers latency, which speeds up the domain build.
+
+Data Structures
+---------------
+
+The following diagram shows the relationships between global, per-node,
+and per-domain claim counters, all protected by the global ``heap_lock``.
+
+.. mermaid::
+
+    graph TB
+        subgraph "Protected by the heap_lock"
+            direction TB
+            Global --Sum of--> Per-node
+            Per-node --Sum of--> Per-domain
+        end
+        subgraph Per-domain
+            direction LR
+            claim_node["d->claim_node"]
+            claim_node --claims on--> outstanding_pages["d->outstanding_pages"]
+        end
+        subgraph Per-node
+            direction LR
+            node_outstanding_claims--constrains-->node_avail_pages
+        end
+        subgraph Global
+            direction LR
+            outstanding_claims--constrains-->total_avail_pages
+        end
diff --git a/docs/hypervisor-guide/mm/index.rst b/docs/hypervisor-guide/mm/index.rst
new file mode 100644
index 000000000000..9b5d60e3181a
--- /dev/null
+++ b/docs/hypervisor-guide/mm/index.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: CC-BY-4.0
+.. _memory_management:
+
+Memory Management
+=================
+
+.. toctree::
+   :maxdepth: 2
+
+   claims
-- 
2.39.5