On 2/18/26 18:04, Jason Andryuk wrote:
On 2026-02-18 14:08, Daniel P. Smith wrote:
When using XSM Flask, passing DOMID_INVALID will result in a NULL pointer
reference from the passing of NULL as the target domain to
xsm_get_domain_state(). Simply not invoking xsm_get_domain_state() when the
target domain is NULL opens the opportunity to circumvent the XSM
get_domain_state access check. This is due to the fact that the call to
xsm_domctl() for get_domain_state op is a no-op check, deferring to
xsm_get_domain_state().

Modify the helper get_domain_state() to ensure the requesting domain has
get_domain_state access for the target domain, whether the target domain is explicitly set or implicitly determined with a domain state search. In the case of access not being allowed for a domain found during an implicit search, the
search will continue to the next domain whose state has changed.

Fixes: 3ad3df1bd0aa ("xen: add new domctl get_domain_state")
Reported-by: Chris Rogers <[email protected]>
Reported-by: Dmytro Firsov <[email protected]>
Signed-off-by: Daniel P. Smith <[email protected]>
---
Changes in v2:
- fix commit message
- init dom as -1
- rework loop logic to use test_and_clear_bit()
---
  xen/common/domain.c | 27 +++++++++++++++++++++------
  xen/common/domctl.c |  7 ++-----
  2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index de6fdf59236e..2ffec331a8d1 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -210,7 +210,7 @@ static void set_domain_state_info(struct xen_domctl_get_domain_state *info,   int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
                       domid_t *domid)
  {
-    unsigned int dom;
+    unsigned int dom = -1;
      int rc = -ENOENT;
      struct domain *hdl;
@@ -219,6 +219,10 @@ int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,
      if ( d )
      {
+        rc = xsm_get_domain_state(XSM_XS_PRIV, d);
+        if ( rc )
+            return rc;
+
          set_domain_state_info(info, d);
          return 0;
@@ -238,28 +242,39 @@ int get_domain_state(struct xen_domctl_get_domain_state *info, struct domain *d,

Between the two hunks is this:

     hdl = lock_dom_exc_handler();

     /*
      * Only domain registered for VIRQ_DOM_EXC event is allowed to query
      * domains having changed state.
      */
     if ( current->domain != hdl )
     {
         rc = -EACCES;
         goto out;
     }

So it is only the domain with VIRQ_DOM_EXC that can enter the while loop:

      while ( dom_state_changed )
      {
-        dom = find_first_bit(dom_state_changed, DOMID_MASK + 1);
+        dom = find_next_bit(dom_state_changed, DOMID_MASK + 1, dom + 1);
          if ( dom >= DOMID_FIRST_RESERVED )
              break;
+
+        d = rcu_lock_domain_by_id(dom);
+        if ( d && xsm_get_domain_state(XSM_XS_PRIV, d) )

... if the VIRQ_DOM_EXC owner is denied for domain d ...

+        {
+            rcu_unlock_domain(d);
+            d = NULL;
+            continue;

... the caller would continue ...

+        }
+
          if ( test_and_clear_bit(dom, dom_state_changed) )

... and this bit would never be cleared.  Should the VIRQ_DOM_EXC owner always get to clear the bit even if it cannot see the result?  Is this too much of a corner case to care about?

If we want to go down this discussion path, here is yet another example access control decisions are getting codified directly against properties of a domain, in particular against VIRQ's which were waived from XSM control. If having access to certain VIRQ's is going to be a privilege determination, then perhaps we should visit whether they should be labeled objects like physical IRQs.

The patch generally seems okay aside from this.

          {
              *domid = dom;
-            d = rcu_lock_domain_by_id(dom);
-
              if ( d )
              {
                  set_domain_state_info(info, d);
-
                  rcu_unlock_domain(d);
              }
              else
                  memset(info, 0, sizeof(*info));
              rc = 0;
-
              break;
          }
+
+        if ( d )
+        {
+            rcu_unlock_domain(d);
+            d = NULL;
+        }
      }
   out:
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 29a7726d32d0..2eedc639c72a 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -860,12 +860,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
          break;
      case XEN_DOMCTL_get_domain_state:
-        ret = xsm_get_domain_state(XSM_XS_PRIV, d);

With the initial NULL deref issue, I wondered if this wouldn't be better off modified to drop the d param - xsm_get_domain_state(XSM_XS_PRIV) - and changing flask permissions like:
allow dom0_t xen_t:xen get_domain_state;

That would gate the external call, and individual domain permissions could be checked with xsm_getdomaininfo(), or a new hook if you don't want to re-use.

But as your approach avoids passing NULL, it seems okay to me.  It also doesn't change the flask policy, which is nice.

That's a plain nack from me. Whether it is viewed as a pipe dream or not, my goal continues to be to work towards enabling the ability to have a truly disaggregated platform. In the original architecture, it was envisioned to have multiple toolstack domains, each responsible for a distinct set of domains. In terms of implementation, that would mean multiple xenstored instances, each with a purview over a subset of domains.

v/r,
dps


Reply via email to