On Tue, Aug 26, 2025 at 12:34:27PM -0600, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Brost, Matthew <[email protected]>
> Sent: Tuesday, August 26, 2025 11:04 AM
> To: Cavitt, Jonathan <[email protected]>
> Cc: [email protected]; Gupta, saurabhg
> <[email protected]>; Zuo, Alex <[email protected]>; Harrison, John C
> <[email protected]>
> Subject: Re: [PATCH 2/2] drm/xe/xe_vm: Add error injection support to lock
> and prep
> >
> > On Tue, Aug 26, 2025 at 03:43:55PM +0000, Jonathan Cavitt wrote:
> > > Add error injection support to the function
> > > vm_bind_ioctl_ops_lock_and_prep. This necessitates marking the function
> > > as noinline.
> > >
> > > Signed-off-by: Jonathan Cavitt <[email protected]>
> > > Cc: Matthew Brost <[email protected]>
> > > Cc: John Harrison <[email protected]>
> > > ---
> > > drivers/gpu/drm/xe/xe_vm.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index 1a8841116e40..e527c90b8c33 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -3201,9 +3201,10 @@ static int
> > > vm_bind_ioctl_ops_prefetch_ranges(struct xe_vm *vm, struct xe_vma_ops
> > > return 0;
> > > }
> > >
> > > -static int vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
> > > - struct xe_vm *vm,
> > > - struct xe_vma_ops *vops)
> > > +static noinline int
> >
> > Ideally in [1] we'd have something like this:
> >
> > #ifdef CONFIG_FUNCTION_ERROR_INJECTION
> > #define error_injectable noinline
> > #else
> > #define error_injectable
> > #endif
> >
> > That might take some time to get through, but in the meantime can we
> > define something on the Xe side for this?
> >
> > [1]
> > https://elixir.bootlin.com/linux/v6.16.3/source/include/asm-generic/error-injection.h
>
> In the short term, I think something like what's done with the
> xe_is_injection_active function
> in xe_guc_ct.c would work. Let me try applying that.
>
I would just add the 'error_injectable' define I suggested to a common Xe
header file.
We might be able to drop xe_is_injection_active too, but perhaps there is
a reason for injecting the error after setting the CT to broken. If so,
keeping xe_is_injection_active makes sense.
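
A minimal sketch of what that define could look like, written so it also
compiles in userspace. The noinline fallback and the example function
lock_and_prep_example() are assumptions for illustration only; in the
kernel, noinline already comes from the compiler attribute headers and
the real function would be followed by ALLOW_ERROR_INJECTION().

```c
#include <errno.h>

/* Userspace stand-in (assumption): the kernel already provides noinline. */
#ifndef noinline
#define noinline __attribute__((__noinline__))
#endif

/*
 * Only force out-of-line code when function error injection is built in,
 * so builds without it keep the compiler free to inline.
 */
#ifdef CONFIG_FUNCTION_ERROR_INJECTION
#define error_injectable noinline
#else
#define error_injectable
#endif

/*
 * Hypothetical example function, not from the patch. In the kernel this
 * definition would be followed by:
 *   ALLOW_ERROR_INJECTION(lock_and_prep_example, ERRNO);
 */
static error_injectable int lock_and_prep_example(int fail)
{
	return fail ? -ENOMEM : 0;
}
```

With this in place the patch would use 'static error_injectable int'
instead of 'static noinline int'.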
> >
> > > +vm_bind_ioctl_ops_lock_and_prep(struct drm_exec *exec,
> > > + struct xe_vm *vm,
> > > + struct xe_vma_ops *vops)
> > > {
> > > struct xe_vma_op *op;
> > > int err;
> > > @@ -3220,6 +3221,7 @@ static int vm_bind_ioctl_ops_lock_and_prep(struct
> > > drm_exec *exec,
> > >
> > > return 0;
> > > }
> > > +ALLOW_ERROR_INJECTION(vm_bind_ioctl_ops_lock_and_prep, ERRNO);
> > >
> >
> > We absolutely need the same injection points which are removed in patch
> > #1 + an IGT with the same coverage. Please add similar points. VM binds
>
> They're already present.
>
No, see below.
> > are a deep software pipeline that can be unwound at any point of
> > failure. Different injection points trigger various unwind flows that
> > need to be tested. It took me a long time to get this right and it is
> > easy to break, so we need good testing in place.
>
> AFAICT, the xe_vm@bind-array-conflict-error-inject test only exercises
> the vm_bind_ioctl_ops_lock_and_prep function. I can add additional points
> to test in the test file if that's what is desired.
grep for 'TEST_VM_OPS_ERROR'; those are the injection points. Internally,
these points are iterated over on each injection, i.e.,
bind-array-conflict-error-inject runs a loop over each engine instance.
We have 4 injection points and more than 4 engine instances, so all 4
points will be triggered by bind-array-conflict-error-inject. We need
similar coverage in the IGT (e.g., build a table of injection
points and pick the next injection point each iteration).
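
Roughly, the table idea could be sketched like this on the test side. Only
vm_bind_ioctl_ops_lock_and_prep comes from this patch; the other three
entries and the helper next_injection_point() are hypothetical
placeholders, not existing IGT or Xe names.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Table of functions to inject errors into. The first entry is from this
 * patch; the remaining names are placeholders (assumptions) standing in
 * for whatever replaces the other TEST_VM_OPS_ERROR points.
 */
static const char *const injection_points[] = {
	"vm_bind_ioctl_ops_lock_and_prep",
	"hypothetical_injection_point_b",
	"hypothetical_injection_point_c",
	"hypothetical_injection_point_d",
};

/*
 * Pick the injection point for a given loop iteration (e.g. one iteration
 * per engine instance). Wrapping with modulo means that any loop with at
 * least ARRAY_SIZE(injection_points) iterations hits every point.
 */
static const char *next_injection_point(unsigned int iteration)
{
	return injection_points[iteration % ARRAY_SIZE(injection_points)];
}
```

Each loop iteration would then arm the returned function for error
injection before issuing the bind.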
Matt
> -Jonathan Cavitt
>
> >
> > Matt
> >
> > > static void op_trace(struct xe_vma_op *op)
> > > {
> > > --
> > > 2.43.0
> > >
> >