xe_vm: Add error injection support to lock and prep

Matthew Brost Tue, 26 Aug 2025 11:58:16 -0700

On Tue, Aug 26, 2025 at 12:34:27PM -0600, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Brost, Matthew <[email protected]> 
> Sent: Tuesday, August 26, 2025 11:04 AM
> To: Cavitt, Jonathan <[email protected]>
> Cc: [email protected]; Gupta, saurabhg 
> <[email protected]>; Zuo, Alex <[email protected]>; Harrison, John C 
> <[email protected]>
> Subject: Re: [PATCH 2/2] drm/xe/xe_vm: Add error injection support to lock 
> and prep
> > 
> > On Tue, Aug 26, 2025 at 03:43:55PM +0000, Jonathan Cavitt wrote:
> > > Add error injection support to the function
> > > vm_bind_ioctl_ops_lock_and_prep.  This necessitates marking the function
> > > as noinline.
> > > 
> > > Signed-off-by: Jonathan Cavitt <[email protected]>
> > > Cc: Matthew Brost <[email protected]>
> > > Cc: John Harrison <[email protected]>
> > > ---
> > >  drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index 1a8841116e40..e527c90b8c33 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -3201,9 +3201,10 @@ static int 
> > > vm_bind_ioctl_ops_prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops
> > >   return 0;
> > >  }
> > >  
> > > -static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
> > > -                                    struct xe_vm *vm,
> > > -                                    struct xe_vma_ops *vops)
> > > +static noinline int
> > 
> > Ideally in [1] we'd have something like this:
> > 
> > #ifdef CONFIG_FUNCTION_ERROR_INJECTION
> > #define error_injectable noinline
> > #else
> > #define error_injectable
> > #endif
> > 
> > That might take sometime to get through, but in the meantine can we
> > define something on the Xe side for this?
> > 
> > [1] 
> > https://elixir.bootlin.com/linux/v6.16.3/source/include/asm-generic/error-injection.h
> 
> In the short term, I think something like what's done with the 
> xe_is_injection_active function
> in xe_guc_ct.c would work.  Let me try applying that.
>


I would just add 'error_injectable' define like I suggest to common xe
header file.

We might be able to drop xe_is_injection_active too but perhaps there is
reason for injecting the error after setting the CT to broken. If there
is a reason for this, then keeping xe_is_injection_active makes sense.

> > 
> > > +vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
> > > +                         struct xe_vm *vm,
> > > +                         struct xe_vma_ops *vops)
> > >  {
> > >   struct xe_vma_op *op;
> > >   int err;
> > > @@ -3220,6 +3221,7 @@ static int vm_bind_ioctl_ops_lock_and_prep(struct 
> > > drm_exec *exec,
> > >  
> > >   return 0;
> > >  }
> > > +ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_lock_and_prep, ERRNO);
> > >  
> > 
> > We absolutely need the same injection points which are removed in patch
> > #1 + an IGT with the same coverage. Please add similar points. VM binds
> 
> They're already present.
> 

No, see below.

> > are a deep software pipeline that can be unwound at any point of
> > failure. Different injection points trigger various unwind flows that
> > need to be tested. It took me a long time to get this right, is easy to
> > break, so we need good testing in place.
> 
> AFAICT, the test xe_vm@bind-array-conflict-error-inject test only exercises
> the vm_bind_ioctl_ops_lock_and_prep function.  I can add additional points
> to test in the test file if that's what is desired.

grep for 'TEST_VM_OPS_ERROR', those are the injection points. Internally
these points are iterated over on each injection. i.e.,
bind-array-conflict-error-inject runs a loop over each engine instance,
we have 4 injection points, we more than 4 engine instances, so all 4
points will be triggered by bind-array-conflict-error-inject. We need
similar coverage to this in the IGT (e.g., build a table of injection
points, pick the next injection pointer each iteration.

Matt 

> -Jonathan Cavitt
> 
> > 
> > Matt
> > 
> > >  static void op_trace(struct xe_vma_op *op)
> > >  {
> > > -- 
> > > 2.43.0
> > > 
> >

Re: [PATCH 2/2] drm/xe/xe_vm: Add error injection support to lock and prep

Reply via email to