On Thu, 2026-05-14 at 19:13 -0400, Marek Olšák wrote:
> Here's a more detailed description of the problem and a possible
> solution.
>
> First, the worst case scenario: A small one-line commit that’s
> correct and trivial causes a test failure in the CI. The maintainer
> of the affected driver is asked for help, concludes that it’s likely
> a HW bug, and is forwarded to the HW team of the corresponding GPU
> company. Now the management of the GPU company has to allocate staff
> to investigate the failure. 3 months later, we may have a workaround.
> Or not.
>
> Second, the scale: The CI has lots of undocumented devices with
> undocumented errata and drivers with hacks and incomplete
> implementations. (that’s normal for any project) Any of those devices
> can fail at any time for reasons that might not make sense, and any
> of the drivers can fail for random reasons too. It’s not fair to ask
> the contributor to keep everything conformant at every MR. Even if
> the devices were documented with open source implementations (e.g.
> uarch specs, HDL, RTL) and well documented drivers, it’s not
> reasonable to ask the contributor to study them all.
>
> Thus, we can’t expect the contributor to be solely responsible for
> conformance of all devices at every MR in main.
>
> It’s useful to keep drivers that have regular contributors conformant
> at most commits in main, but why do we need to keep drivers without
> contributors conformant? If somebody cares about those drivers but
> not enough to contribute in main, they can contribute fixes during
> the RC window or on their own schedule.
>
> We need a two-tier system:
>
> Tier 1:
> - Devices are tested by the CI pre-merge.
> - A contact person is required for CI failure assessment and closure
> within a reasonable time. (if the person is on leave, a backup person
> must be available, or else the device is moved to Tier 2)

I think this makes sense, but we need to agree on what "reasonable
time" means to make sure everyone is on the same page.

Iago

> - Highly recommended: A fully functional drm-shim for each CI job
> with a user guide, how to print compiled shaders, etc.
> - Links to HW documentation if available.
> - If maintainers end up xfailing a significant number of failures
> regularly, the device is moved to Tier 2. (due to not using the CI to
> maintain conformance)
>
> Tier 2:
> - Pre-merge CI can’t run on the target devices / implementations.
> main doesn’t have to work. The quality of release branches is up to
> maintainers. The RC window can be extended.
> - Only unit tests can run pre-merge, as well as any deviceless driver
> tests, like the following.
> - Optionally develop deviceless driver validation tests that verify
> driver output (shader instructions, command buffers). LLVM LIT tests
> are the perfect example - they validate all LLVM backends and prevent
> regressions without any physical devices.
>
>
> Marek
>
> On Fri, May 1, 2026 at 5:21 AM Daniel Stone <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, 30 Apr 2026 at 23:34, Timur Kristóf <[email protected]> wrote:
> > > On Thursday, 30 April 2026 at 23:07:12 Central European Summer
> > > Time, Marek Olšák wrote:
> > > > First of all, no contributor to shared code is required to fix
> > > > issues in all drivers that their commit breaks. The goal is to
> > > > stop using the pre-merge CI as a justification to force
> > > > unrelated contributors to work on all drivers just because they
> > > > are contributors. It would be a bit exploitative to assume that
> > > > every contributor must debug all drivers that turn red due to a
> > > > change. I think I understand that well because I have debugged
> > > > 5+ drivers by myself in the past that are not my responsibility
> > > > to maintain, and it does feel exploitative.
> >
> > There's a bit more nuance in this though. If one set of people is
> > breaking 17 drivers every day because they can't be bothered to do
> > the basics to keep things working and just want to yolo whatever
> > they just thought of into the tree, it's 'unethical' and unfair on
> > the rest of the people who then spend their entire time bisecting
> > and fixing up what the others broke. (Those people then probably
> > get accused of being freeloaders and exploiting the labour of the
> > people breaking everything, because they don't get to spend any
> > time on fun new stuff, given all their time is spent fixing what
> > the others broke.)
> >
> > I think we've all taken it as axiomatic that there's a balance to
> > be struck there: don't make others miserable because you can't be
> > bothered spending five minutes thinking about why your new code
> > breaks existing users, but on the other hand you absolutely should
> > expect support from the relevant people to help work it out and
> > resolve it.
> >
> > I'm pretty sure no-one is suggesting ripping up that social
> > contract, but we should be clear about what we mean.
> >
> > > > Therefore, we could establish that each driver/HW combo in
> > > > pre-merge CI has the following options:
> > > > 1) a contact person for prompt CI issue resolution
> > > > 2) unconditional xfail by the author (or removal from pre-merge
> > > > CI if logs lack the information necessary to add xfail)
> > >
> > > I think we should establish both of those, in that order.
> > > That is, if the contact person does not reply promptly, just
> > > let's add the expected failure.
> >
> > Yeah, that's a pretty obvious baseline.
> > So far it seems to have worked out in the usual way (people know
> > who works on what so it's easy to ping them however), but if that's
> > not working out, maybe someone could suggest a more formal document
> > along the lines of MAINTAINERS or CODEOWNERS or whatever?
> >
> > Cheers,
> > Daniel
>
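
For illustration only, here is a minimal sketch of how the two-tier rules
proposed in the quoted message could be written down as a decision
function. It is not taken from any existing CI tooling. The Device
fields, the assign_tier helper and the xfail threshold are all
hypothetical; the thread only says "a significant number" and does not
define one, just as "reasonable time" is still open.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TIER_1 = 1  # tested pre-merge on the real devices
    TIER_2 = 2  # only unit tests and deviceless tests run pre-merge


@dataclass
class Device:
    name: str
    # True if the contact person, or a backup, can assess CI failures.
    contact_available: bool
    # Number of failures the maintainers xfail on a regular basis.
    regular_xfail_count: int


# Hypothetical threshold; the proposal leaves the actual number open.
XFAIL_LIMIT = 20


def assign_tier(device: Device, xfail_limit: int = XFAIL_LIMIT) -> Tier:
    """Return the tier a device would land in under the proposed rules."""
    # No contact person and no backup: the device moves to Tier 2.
    if not device.contact_available:
        return Tier.TIER_2
    # Regularly xfailing many failures means the CI is not being used
    # to maintain conformance, so the device also moves to Tier 2.
    if device.regular_xfail_count >= xfail_limit:
        return Tier.TIER_2
    return Tier.TIER_1
```

Writing it this way makes the open questions explicit: both the
threshold and the definition of an available contact would need to be
agreed on before anything like this could be enforced.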
