On Thu, Dec 11, 2025 at 11:43:13PM +0900, Simon Richter wrote:
> Hi,
> 
> (cc Steve Wahl because UV BIOS is tangentially involved)
> (cc Ville Syrjala because of the recent thread about VGA arbiter)
> 
> I'm looking into
> 
>      https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1824
> 
> again because I have a similar problem.
> 
> In pci_set_vga_state, we traverse the PCI bridges upwards, and set the 
> VGA forwarding bit on all of them up to the root. Now I happen to have a 
> root bridge that refuses to set this bit, so if I read it back after 
> setting it, it is still unset.
> 
> TTBOMK, that is allowed in the PCI specification, and the correct way to 
> indicate that a bridge cannot forward VGA accesses.
> 
> The smallest possible change is to read back the bridge control 
> register, and if the bit is still unset, terminate the loop and return 
> an error.
> 
> I'm trying to find out now if this is a good idea, and am a bit puzzled:
> 
> It appears the only place that can generate errors is 
> pci_set_vga_state_arch, which is a dispatch mechanism with a single 
> user, uv_bios_set_legacy_vga_target.
> 
> According to the comment, the errors generated here are -EINVAL, -ENOSYS 
> and -EBUSY. The implementation returns the return value from an EFI 
> call, which appears to be defined to be a negated POSIX errno number, so 
> this has a high chance of being correct.
> 
> If pci_set_vga_state_arch does not return an error, the normal 
> pci_set_vga_state function runs, and there is no way for the arch 
> specific function to indicate that the rest of the function should not 
> run. Is that intentional, or should the BIOS call replace the normal 
> implementation?
> 
> It also seems that vgaarb is the only caller, and it does not look at 
> the return value from this function. So if I fix that, and propagate the 
> error upwards, I first need a way to indicate that __try_vga_get failed 
> without there being a conflict, and then I need all vga_get callers to 
> have error handling. The latter sounds doable.
> 
> What I'm unsure about:
> 
> 1. Does this seem like a viable approach?
> 
> 2. What interface makes sense for returning an error here? This function 
> is supposed to return the conflicting device -- should I just make this 
> a PTR_ERR?
> 
> 3. What error would be appropriate for a bridge refusing to activate the 
> VGA bit? It's not EIO, it's not EACCES, it's not EINVAL, the closest is 
> probably ENOSYS, but even that feels wrong.
> 
> 4. How likely is it for this to break something else? Are there PCI 
> bridges that forward VGA but implement the bridge control register 
> incorrectly?
> 
>     Simon

I wonder how bad it would be to just tickle the VGA bit
when the bridge is added. Basically something like:

if (BRIDGECTL.VGA) {
        bridge.has_vga = true;
} else {
        BRIDGECTL.VGA = 1
        if (BRIDGECTL.VGA)
                bridge.has_vga = true;
        BRIDGECTL.VGA = 0
}

Obviously there is a tiny race there where the bridge might
temporarily forward VGA accesses to the wrong device. But
maybe the race is small enough to not really matter? Or I
guess if you're really worried about the race you could do
this check under stop_machine() *shudder*.

BTW I recently posted this:
https://lore.kernel.org/intel-gfx/[email protected]/
where I'm trying to fix the mess around VGA accesses in i915/xe.
Some of the more hacky things there should probably be properly
fixed in vgaarb, but I don't really have the time/energy to
deal with that right now.

-- 
Ville Syrjälä
Intel

Reply via email to