On 01.04.2025 19:24, Michal Wajdeczko wrote:
please use "drm/xe/vf:" in the subject as this patch is still more VF
oriented than general SRIOV
ok.
On 31.03.2025 15:21, Tomasz Lis wrote:
We have only one GGTT for all IOV functions, with each VF having assigned
a range of addresses for its use. After migration, a VF can receive a
different range of addresses than it had initially.

This implements shifting GGTT addresses within drm_mm nodes, so that
VMAs stay valid after migration. This will make the driver use new
addresses when accessing GGTT from the moment the shifting ends.

By taking the ggtt->lock for the period of VMA fixups, this change
also adds constraint on that mutex. Any locks used during the recovery
cannot ever wait for hardware response - because after migration,
the hardware will not do anything until fixups are finished.

v2: Moved some functions to xe_ggtt.c; moved shift computation to just
   after querying; improved documentation; switched some warns to asserts;
   skipping fixups when GGTT shift eq 0; iterating through tiles (Michal)
v3: Updated kerneldocs, removed unused function, properly allocate
   ballooning nodes if non-existent
v4: Re-used ballooning functions from VF init, used bool in place of
   standard error codes
v5: Renamed one function

Signed-off-by: Tomasz Lis <[email protected]>
---
  drivers/gpu/drm/xe/xe_ggtt.c              | 33 +++++++++
  drivers/gpu/drm/xe/xe_ggtt.h              |  1 +
  drivers/gpu/drm/xe/xe_gt_sriov_vf.c       | 83 +++++++++++++++++++++++
  drivers/gpu/drm/xe/xe_gt_sriov_vf.h       |  1 +
  drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h |  2 +
  drivers/gpu/drm/xe/xe_sriov_vf.c          | 17 +++++
  6 files changed, 137 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 2d7456e37ef4..b13c4a12393e 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -482,6 +482,39 @@ void xe_ggtt_node_remove_balloon(struct xe_ggtt_node *node)
        drm_mm_remove_node(&node->base);
  }
+/**
+ * xe_ggtt_shift_mm_nodes - Shift GGTT nodes to adjust for a change in usable address range.
drop "mm" from the function name as it is implementation detail

        xe_ggtt_shift_nodes()

The fact that nodes are used for Memory Management is an implementation detail?

I will remove `mm` from the name, because no other function uses that shortcut. But let's stick
to real arguments, i.e. that no other function dealing with nodes has the _mm_ in its name.

+ * @ggtt: the &xe_ggtt struct instance
+ * @shift: change to the location of area provisioned for current VF
this function is quite generic and while it is used by the VF only, the
parameter doesn't have to be described as such
I just changed the subject line because (from comment above) the patch is VF oriented.
you may add here, in the 'longer description' part of the kernel-doc,
after explaining how it works, that function is mostly used by the VF,
and why we believe it will succeed

Not "mostly" but "only". There is no other scenario where we could anticipate this function to be used.

While universalization is generally a good direction, pretending that this code will ever be used for anything other than a VF is not a direction with any chance of future benefits.

For the comment about errors - will add.

+ */
+void xe_ggtt_shift_mm_nodes(struct xe_ggtt *ggtt, s64 shift)
+{
+       struct drm_mm_node *node, *tmpn;
+       LIST_HEAD(temp_list_head);
+       int err;
+
+       lockdep_assert_held(&ggtt->lock);
+
+       /*
+        * Move nodes, from range previously assigned to this VF, to a temp list.
nit: no need for multi-line comment style

also, maybe it could be moved to the 'longer description' part of the
function kernel-doc, after sanitize it from the 'VF' specific wording

+        */
+       drm_mm_for_each_node_safe(node, tmpn, &ggtt->mm) {
+               drm_mm_remove_node(node);
+               list_add(&node->node_list, &temp_list_head);
+       }
+
+       /*
+        * Now the GGTT VM contains no nodes. We can re-add all VF nodes with
+        * shifted offsets.
+        */
also consider to move this comment to function kernel-doc

Not sure why we want a detail on implementation in the function description.

Also not sure why we want detailed description block in this function, when we already have that in the caller.

But I don't mind, will move.

+       list_for_each_entry_safe(node, tmpn, &temp_list_head, node_list) {
+               list_del(&node->node_list);
+               node->start += shift;
+               err = drm_mm_reserve_node(&ggtt->mm, node);
+               xe_tile_assert(ggtt->tile, !err);
while we believe it should be always possible to 'shift' all nodes, as
we just released our balloons, I'm not sure that this assert here alone
is sufficient

maybe before starting any movements, check that 'shift' is valid, and
add asserts for each node that shifted location is within GGTT space?

Before starting? How exactly do we check the shift before we have any nodes? Without nodes, we have nothing to compare to.

I assume that what you're going for is just to add more asserts on each node. Will do.

+       }
+}
+
  /**
   * xe_ggtt_node_insert_locked - Locked version to insert a &xe_ggtt_node into the GGTT
   * @node: the &xe_ggtt_node to be inserted
diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
index 27e7d67de004..a07194cd3724 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.h
+++ b/drivers/gpu/drm/xe/xe_ggtt.h
@@ -18,6 +18,7 @@ void xe_ggtt_node_fini(struct xe_ggtt_node *node);
  int xe_ggtt_node_insert_balloon(struct xe_ggtt_node *node,
                                u64 start, u64 size);
  void xe_ggtt_node_remove_balloon(struct xe_ggtt_node *node);
+void xe_ggtt_shift_mm_nodes(struct xe_ggtt *ggtt, s64 shift);
int xe_ggtt_node_insert(struct xe_ggtt_node *node, u32 size, u32 align);
  int xe_ggtt_node_insert_locked(struct xe_ggtt_node *node,
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
index 9edbe34f45f4..e9e7ddeb4254 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
@@ -415,6 +415,7 @@ static int vf_get_ggtt_info(struct xe_gt *gt)
        xe_gt_sriov_dbg_verbose(gt, "GGTT %#llx-%#llx = %lluK\n",
                                start, start + size - 1, size / SZ_1K);
+ config->ggtt_shift = start - (s64)config->ggtt_base;
on the first probe, shouldn't we store it as '0' ?

`ggtt_shift` is a part of `xe_gt`, allocated in `xe_gt_alloc()` using a `kzalloc` variant. It is always zero on probe, regardless of whether it is the first one. But even regardless of that, the value of this parameter is valid only within the post-migration recovery worker. It is not accessed outside, and no one cares about its value outside.

        config->ggtt_base = start;
        config->ggtt_size = size;
@@ -972,6 +973,88 @@ int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt)
        return err;
  }
hmm, are you sure that place between

        xe_gt_sriov_vf_query_runtime
and
        vf_runtime_reg_cmp

is the best place for this function?

maybe at least place it closer to

        xe_gt_sriov_vf_migrated_event_handler

that is at least partially related,
but not sure if also not miss-located
no problem, will place it there.
+/**
+ * xe_gt_sriov_vf_fixup_ggtt_nodes - Shift GGTT allocations to match assigned range.
+ * @gt: the &xe_gt struct instance
+ * Return: True if fixups are necessary
+ *
+ * Since Global GTT is not virtualized, each VF has an assigned range
+ * within the global space. This range might have changed during migration,
+ * which requires all memory addresses pointing to GGTT to be shifted.
This 'longer description' shall be _after_ 'Return' description

see https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
I see from the link that you meant `Return:` at the end. But I agreed to your comments below, so the return vanishes completely.
+ */
+bool xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt)
+{
+       struct xe_gt_sriov_vf_selfconfig *config = &gt->sriov.vf.self_config;
you should introduce xe_gt_sriov_vf_ggtt_shift() earlier than patch 4/4
and use it here (or better let the caller use it and pass ggtt_shift as
parameter)
will move that function earlier.
+       struct xe_tile *tile = gt_to_tile(gt);
+       struct xe_ggtt *ggtt = tile->mem.ggtt;
+       s64 ggtt_shift;
+
assert that gt is not a media one
ok
+       mutex_lock(&ggtt->lock);
+       ggtt_shift = config->ggtt_shift;
+       /*
+        * Move nodes, including balloons, from range previously assigned to this VF,
+        * into newly provisioned area.
+        *
+        * The balloon nodes are there to eliminate unavailable ranges from use: one
+        * reserves the GGTT area below the range for current VF, and another one
+        * reserves area above.
+        *
+        * Below is a GGTT layout of example VF, with a certain address range assigned to
+        * said VF, and inaccessible areas above and below:
+        *
+        *  0                                                                        4GiB
+        *  |<--------------------------- Total GGTT size ----------------------------->|
+        *      WOPCM                                                            GUC_TOP
+        *      |<-------------- Area mappable by xe_ggtt instance ---------------->|
+        *
+        *  +---+---------------------------------+----------+----------------------+---+
+        *  |\\\|/////////////////////////////////|  VF mem  |//////////////////////|\\\|
+        *  +---+---------------------------------+----------+----------------------+---+
+        *
+        * Hardware enforced access rules before migration:
+        *
+        *  |<------- inaccessible for VF ------->|<VF owned>|<-- inaccessible for VF ->|
+        *
+        * drm_mm nodes used for tracking allocations:
use of drm_mm is implementation detail of the xe_ggtt and it is not
relevant here, just say 'GGTT nodes'

We are trying to hide implementation details. Not because these details can vary/change - this will never happen.

We're just calling some things implementation details and hiding them because it is doable.

We've talked about this before, so will disagree but commit.

+        *
+        *     |<----------- balloon ------------>|<- nodes->|<----- balloon ------>|
+        *
+        * After the migration, GGTT area assigned to the VF might have shifted, either
+        * to lower or to higher address. But we expect the total size and extra areas to
+        * be identical, as migration can only happen between matching platforms.
+        * Below is an example of GGTT layout of the VF after migration. Content of the
+        * GGTT for VF has been moved to a new area, and we receive its address from GuC:
+        *
+        *  +---+----------------------+----------+---------------------------------+---+
+        *  |\\\|//////////////////////|  VF mem  |/////////////////////////////////|\\\|
+        *  +---+----------------------+----------+---------------------------------+---+
+        *
+        * Hardware enforced access rules after migration:
+        *
+        *  |<- inaccessible for VF -->|<VF owned>|<------- inaccessible for VF ------->|
+        *
+        * So the VF has a new slice of GGTT assigned, and during migration process, the
+        * memory content was copied to that new area. But the drm_mm nodes within xe kmd
+        * are still tracking allocations using the old addresses. The nodes within VF
+        * owned area have to be shifted, and balloon nodes need to be resized to
+        * properly mask out areas not owned by the VF.
+        *
+        * Fixed drm_mm nodes used for tracking allocations:
+        *
+        *     |<------ balloon ------>|<- nodes->|<----------- balloon ----------->|
+        *
+        * Due to use of GPU profiles, we do not expect the old and new GGTT areas to
+        * overlap; but our node shifting will fix addresses properly regardless.
+        */
this inline comment is now much bigger than actual implementation

why not promote it to the full DOC: as then it could be included in the
master SRIOV documentation

will do.

..but for this case only. I will oppose any attempts to have the architecture docs rewritten as code comments.

+       if (ggtt_shift) {
+               xe_gt_sriov_vf_deballoon_ggtt(gt);
+               xe_ggtt_shift_mm_nodes(ggtt, ggtt_shift);
+               xe_gt_sriov_vf_balloon_ggtt(gt);
+       }
+       mutex_unlock(&ggtt->lock);
+       return ggtt_shift != 0;
+}
+
  static int vf_runtime_reg_cmp(const void *a, const void *b)
  {
        const struct vf_runtime_reg *ra = a;
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
index c87b0e9c7ebc..13c04e313aa6 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
@@ -20,6 +20,7 @@ int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
  int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt);
  int xe_gt_sriov_vf_balloon_ggtt(struct xe_gt *gt);
  void xe_gt_sriov_vf_deballoon_ggtt(struct xe_gt *gt);
+bool xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt);
  int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt);
  void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
index a57f13b5afcd..5ccbdf8d08b6 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
@@ -40,6 +40,8 @@ struct xe_gt_sriov_vf_selfconfig {
        u64 ggtt_base;
        /** @ggtt_size: assigned size of the GGTT region. */
        u64 ggtt_size;
+       /** @ggtt_shift: difference in ggtt_base on last migration */
+       s64 ggtt_shift;
        /** @lmem_size: assigned size of the LMEM. */
        u64 lmem_size;
        /** @num_ctxs: assigned number of GuC submission context IDs. */
diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
index c1275e64aa9c..2eb6b8d8a217 100644
--- a/drivers/gpu/drm/xe/xe_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
@@ -7,6 +7,7 @@
#include "xe_assert.h"
  #include "xe_device.h"
+#include "xe_gt.h"
  #include "xe_gt_sriov_printk.h"
  #include "xe_gt_sriov_vf.h"
  #include "xe_pm.h"
@@ -170,6 +171,20 @@ static bool vf_post_migration_imminent(struct xe_device *xe)
        work_pending(&xe->sriov.vf.migration.worker);
  }
+static bool vf_post_migration_fixup_ggtt_nodes(struct xe_device *xe)
+{
+       struct xe_tile *tile;
+       unsigned int id;
+       bool need_fixups = false;
try to define vars in reverse-xmas-tree order

        bool need_fixups = false;
        struct xe_tile *tile;
        unsigned int id;
ok
+
+       for_each_tile(tile, xe, id) {
+               struct xe_gt *gt = tile->primary_gt;
                shift = xe_gt_sriov_vf_ggtt_shift(gt);
                if (shift) {
                        need_fixups = true;
                        xe_gt_sriov_vf_fixup_ggtt_nodes(gt, shift);
                }

So in this specific place we want to make the function look longer and reveal a detail.

Makes no sense to me, but since you don't seem convinced by previous review round - will do this.

-Tomasz

+
+               need_fixups |= xe_gt_sriov_vf_fixup_ggtt_nodes(gt);
+       }
+       return need_fixups;
+}
+
  /*
   * Notify all GuCs about resource fixups apply finished.
   */
@@ -191,6 +206,7 @@ static void vf_post_migration_notify_resfix_done(struct xe_device *xe)
static void vf_post_migration_recovery(struct xe_device *xe)
  {
+       bool need_fixups;
        int err;
drm_dbg(&xe->drm, "migration recovery in progress\n");
@@ -201,6 +217,7 @@ static void vf_post_migration_recovery(struct xe_device *xe)
        if (unlikely(err))
                goto fail;
+ need_fixups = vf_post_migration_fixup_ggtt_nodes(xe);
        /* FIXME: add the recovery steps */
        vf_post_migration_notify_resfix_done(xe);
        xe_pm_runtime_put(xe);
