Re: [BUG v2] common/domctl: xsm update for get_domain_state access

Daniel P. Smith Thu, 19 Feb 2026 04:16:07 -0800

On 2/18/26 18:04, Jason Andryuk wrote:

On 2026-02-18 14:08, Daniel P. Smith wrote:

When using XSM Flask, passing DOMID_INVALID will result in a NULL pointer
reference from the passing of NULL as the target domain to

xsm_get_domain_state(). Simply not invoking xsm_get_domain_state()when the

target domain is NULL opens the opportunity to circumvent the XSM
get_domain_state access check. This is due to the fact that the call to
xsm_domctl() for get_domain_state op is a no-op check, deferring to
xsm_get_domain_state().


Modify the helper get_domain_state() to ensure the requesting domain has

get_domain_state access for the target domain, whether the targetdomain isexplicitly set or implicitly determined with a domain state search. Inthe caseof access not being allowed for a domain found during an implicitsearch, the

search will continue to the next domain whose state has changed.

Fixes: 3ad3df1bd0aa ("xen: add new domctl get_domain_state")
Reported-by: Chris Rogers <[email protected]>
Reported-by: Dmytro Firsov <[email protected]>
Signed-off-by: Daniel P. Smith <[email protected]>
---
Changes in v2:
- fix commit message
- init dom as -1
- rework loop logic to use test_and_clear_bit()
---
  xen/common/domain.c | 27 +++++++++++++++++++++------
  xen/common/domctl.c |  7 ++-----
  2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index de6fdf59236e..2ffec331a8d1 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c

@@ -210,7 +210,7 @@ static void set_domain_state_info(structxen_domctl_get_domain_state *info, int get_domain_state(struct xen_domctl_get_domain_state *info,struct domain *d,

                       domid_t *domid)
  {
-    unsigned int dom;
+    unsigned int dom = -1;
      int rc = -ENOENT;
      struct domain *hdl;

@@ -219,6 +219,10 @@ int get_domain_state(structxen_domctl_get_domain_state *info, struct domain *d,

      if ( d )
      {
+        rc = xsm_get_domain_state(XSM_XS_PRIV, d);
+        if ( rc )
+            return rc;
+
          set_domain_state_info(info, d);
          return 0;

@@ -238,28 +242,39 @@ int get_domain_state(structxen_domctl_get_domain_state *info, struct domain *d,


Between the two hunks is this:

     hdl = lock_dom_exc_handler();

     /*
      * Only domain registered for VIRQ_DOM_EXC event is allowed to query
      * domains having changed state.
      */
     if ( current->domain != hdl )
     {
         rc = -EACCES;
         goto out;
     }

So it is only the domain with VIRQ_DOM_EXC that can enter the while loop:

      while ( dom_state_changed )
      {
-        dom = find_first_bit(dom_state_changed, DOMID_MASK + 1);
+        dom = find_next_bit(dom_state_changed, DOMID_MASK + 1, dom + 1);
          if ( dom >= DOMID_FIRST_RESERVED )
              break;
+
+        d = rcu_lock_domain_by_id(dom);
+        if ( d && xsm_get_domain_state(XSM_XS_PRIV, d) )


... if the VIRQ_DOM_EXC owner is denied for domain d ...

+        {
+            rcu_unlock_domain(d);
+            d = NULL;
+            continue;


... the caller would continue ...

+        }
+
          if ( test_and_clear_bit(dom, dom_state_changed) )

... and this bit would never be cleared. Should the VIRQ_DOM_EXC owneralways get to clear the bit even if it cannot see the result? Is thistoo much of a corner case to care about?

If we want to go down this discussion path, here is yet another exampleaccess control decisions are getting codified directly againstproperties of a domain, in particular against VIRQ's which were waivedfrom XSM control. If having access to certain VIRQ's is going to be aprivilege determination, then perhaps we should visit whether theyshould be labeled objects like physical IRQs.

The patch generally seems okay aside from this.
          {
              *domid = dom;
-            d = rcu_lock_domain_by_id(dom);
-
              if ( d )
              {
                  set_domain_state_info(info, d);
-
                  rcu_unlock_domain(d);
              }
              else
                  memset(info, 0, sizeof(*info));
              rc = 0;
-
              break;
          }
+
+        if ( d )
+        {
+            rcu_unlock_domain(d);
+            d = NULL;
+        }
      }
   out:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 29a7726d32d0..2eedc639c72a 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -860,12 +860,9 @@ longdo_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
          break;
      case XEN_DOMCTL_get_domain_state:
-        ret = xsm_get_domain_state(XSM_XS_PRIV, d);
With the initial NULL deref issue, I wondered if this wouldn't be betteroff modified to drop the d param - xsm_get_domain_state(XSM_XS_PRIV) -and changing flask permissions like:
allow dom0_t xen_t:xen get_domain_state;
That would gate the external call, and individual domain permissionscould be checked with xsm_getdomaininfo(), or a new hook if you don'twant to re-use.
But as your approach avoids passing NULL, it seems okay to me. It alsodoesn't change the flask policy, which is nice.

That's a plain nack from me. Whether it is viewed as a pipe dream ornot, my goal continues to be to work towards enabling the ability tohave a truly disaggregated platform. In the original architecture, itwas envisioned to have multiple toolstack domains, each responsible fora distinct set of domains. In terms of implementation, that would meanmultiple xenstored instances, each with a purview over a subset of domains.


v/r,
dps

Re: [BUG v2] common/domctl: xsm update for get_domain_state access

Reply via email to