Quoting Souza, Jose (2026-06-22 16:43:11)
> On Mon, 2026-06-22 at 13:30 +0300, Joonas Lahtinen wrote:
> > (+ Sima and dri-devel)
> >
> > This goes very much against the previously discussed and agreed
> > direction[1].
> >
> > The QUERY style IOCTL (like xe_vm_get_property_ioctl here) were
> > specifically
> > designed to have built-in feature-checking to avoid need for anything
> > like this check.
> >
> > Executing the query with size zero would be a pretty much no-overhead
> > call (there's one spinlock there) as we can see from the body of
> > xe_vm_get_property_helper:
> >
> > if (!args->size) {
> > args->size = size;
> > return 0;
> > }
> >
> > That trait got added to the QUERY style IOCTLs exactly to address the
> > requirement for low-overhead feature checking by Mesa folks back
> > then.
> >
> > So you execute all the validity checks, but avoid actual data
> > copying.
>
> There is a kernel context switch cost to consider.
> Also, because this only checks one feature, we would eventually need
> multiple uAPI calls to check all the features support.
This discussion has also been covered to the maximum in the past.
It'll hardly keep the driver simple if we are adding a more complex
multiplexer IOCTL for basically everything in order to save in CPU
context switch time.
If this becomes a measurable issue, it's better to look into something
io_uring style to allow N IOCTL to be executed in batch. There have been
prototypes and discussions around this dating years back. I don't think
anyone has yet presented the concrete measurement data that we're
actually CPU context switch bound in probing the system capabilities.
If we have such data, I'd rather we spend the time to fix it for
multiple IOCTLs at once in uring style rather than making every IOCTL
into unnecessarily complex multiplexer call.
> We could avoid this by adding a single flag to an existing query,
> allowing us to check support for all features in a single uAPI call.
This is specifically called out in the discussion in [1] as a no-no.
What the userspace thinks they are checking here based on name is if
the running kernel is new enough to have VM_GET_PROPERTY_FAULTS
property to query, and instead they get information if the running
SKU supports USM.
There's two problems with that, it will only indirectly indicate if
non-zero faults will be reported in the *current* state of the codebase.
There's no reason why USM/SVM capability of SKU would ultimately decide
if page-faults can happen. It's just how things happen to be today.
Second, there's the tendency for such uAPIs to become implicitly used
to infer different information about the SKU. So in the future, if
has_usm and faults get detangled in the codebase, we'd have to add
DRM_XE_QUERY_CONFIG_FLAG_HAS_VM_GET_PROPERTY_FAULTS_REALLY_THIS_TIME
to indicate the availability of faults. The old one would have to remain
in the current state to actually mean "DRM_XE_QUERY_CONFIG_FLAG_HAS_USM"
because changing it might break existing userspace.
I'd rather we would avoid running into same uAPI design problems that
we've struggled in the past.
> > Quoting José Roberto de Souza (2026-06-17 18:25:00)
<SNIP>
> > > +++ b/drivers/gpu/drm/xe/xe_query.c
> > > @@ -354,6 +354,10 @@ static int query_config(struct xe_device *xe,
> > > struct drm_xe_device_query *query)
> > >
> > > DRM_XE_QUERY_CONFIG_FLAG_HAS_DISABLE_STATE_CACHE_PERF_FIX;
> > > config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
> > > DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT;
> > > + if (xe->info.has_usm) {
> > > + config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
> > > +
> > > DRM_XE_QUERY_CONFIG_FLAG_HAS_VM_GET_PROPERTY_FAULTS;
> > > + }
> >
> > I think it kind of proves the validness of concerns in [1] that the
> > has_usm check you add here doesn't have much relevance to the actual
> > VM_GET_PROPERTY_FAULTS IOCTL being available or functional.
> >
> > If you think the VM_GET_PROPERTY_FAULTS should be gated by the
> > has_usm
> > condition, then you should have updated the actual IOCTL, no?
>
> VM_GET_PROPERTY_FAULTS call would work in any platform but would only
> only return fault information in platforms with has_usm support.
That's just how the codebase is today, there's no inherent dependence
here when approached from GPU/EU debugging angle of things.
> > The whole point of the original discussions resulting in [1] was that
> > it is such an easy error to make the capability flag check and the
> > actual IOCTL availability differ that it should be avoided.
>
> Someone just reading xe_drm.h would understand that they need to check
> for DRM_XE_QUERY_CONFIG_FLAG_HAS before using a feature.
The proposed name is literally about HAS_VM_GET_PROPERTY_FAULTS, not
HAS_USM or HAS_FAULTS. So reasonable developer is bound to assume it's
really about the question of the availability of that property.
And for availability the QUERY IOCTL idiom already has a solution in it.
> Without that,
> a person lacking historical context might blindly call
> VM_GET_PROPERTY_FAULTS, unaware that some KMD versions don't even
> support it.
If that is the problem, I think we'd be better off by documenting how
IOCTL based feature detection works, as that has been the de-facto
method for DRM subsystem for years as explained in [1], too.
Regards, Joonas
> > The fact that such an error is in this very patch still after reading
> > through the original reasoning from Sima kind of underlines why the
> > point still stands strongly.
> >
> > Regards, Joonas
> >
> > [1] https://www.spinics.net/lists/intel-gfx/msg325342.html