On Thu, Dec 11, 2025 at 11:43:13PM +0900, Simon Richter wrote:
> Hi,
>
> (cc Steve Wahl because UV BIOS is tangentially involved)
> (cc Ville Syrjala because of the recent thread about VGA arbiter)
>
> I'm looking into
>
> https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1824
>
> again because I have a similar problem.
>
> In pci_set_vga_state, we traverse the PCI bridges upwards, and set the
> VGA forwarding bit on all of them up to the root. Now I happen to have a
> root bridge that refuses to set this bit, so if I read it back after
> setting it, it is still unset.
>
> TTBOMK, that is allowed in the PCI specification, and the correct way to
> indicate that a bridge cannot forward VGA accesses.
>
> The smallest possible change is to read back the bridge control
> register, and if the bit is still unset, terminate the loop and return
> an error.
>
> I'm trying to find out now if this is a good idea, and am a bit puzzled:
>
> It appears the only place that can generate errors is
> pci_set_vga_state_arch, which is a dispatch mechanism with a single
> user, uv_bios_set_legacy_vga_target.
>
> According to the comment, the errors generated here are -EINVAL, -ENOSYS
> and -EBUSY. The implementation returns the return value from an EFI
> call, which appears to be defined to be a negated POSIX errno number, so
> this has a high chance of being correct.
>
> If pci_set_vga_state_arch does not return an error, the normal
> pci_set_vga_state function runs, and there is no way for the arch
> specific function to indicate that the rest of the function should not
> run. Is that intentional, or should the BIOS call replace the normal
> implementation?
>
> It also seems that vgaarb is the only caller, and it does not look at
> the return value from this function. So if I fix that, and propagate the
> error upwards, I first need a way to indicate that __try_vga_get failed
> without there being a conflict, and then I need all vga_get callers to
> have error handling. The latter sounds doable.
>
> What I'm unsure about:
>
> 1. Does this seem like a viable approach?
>
> 2. What interface makes sense for returning an error here? This function
> is supposed to return the conflicting device -- should I just make this
> a PTR_ERR?
>
> 3. What error would be appropriate for a bridge refusing to activate the
> VGA bit? It's not EIO, it's not EACCES, it's not EINVAL, the closest is
> probably ENOSYS, but even that feels wrong.
>
> 4. How likely is it for this to break something else? Are there PCI
> bridges that forward VGA but implement the bridge control register
> incorrectly?
>
> Simon
I wonder how bad it would be to just tickle the VGA bit
when the bridge is added. Basically something like:
if (BRIDGECTL.VGA) {
bridge.has_vga = true;
} else {
BRIDGECTL.VGA = 1
if (BRIDGECTL.VGA)
bridge.has_vga = true;
BRIDGECTL.VGA = 0
}
Obviously there is a tiny race there where the bridge might
temporarily forward VGA accesses to the wrong device. But
maybe the race is small enough to not really matter? Or I
guess if you're really worried about the race you could do
this check under stop_machine() *shudder*.
BTW I recently posted this:
https://lore.kernel.org/intel-gfx/[email protected]/
where I'm trying to fix the mess around VGA accesses in i915/xe.
Some of the more hacky things there should probably be properly
fixed in vgaarb, but I don't really have the time/energy to
deal with that right now.
--
Ville Syrjälä
Intel