I'd say we'll deal with any CI issues discretely and individually once we
have the list of driver maintainers and CI job maintainers. There may be a
sense of urgency, but I don't think we need a specific time limit.

Marek

On Wed, May 20, 2026, 01:36 Iago Toral <[email protected]> wrote:

> On Thu, 14-05-2026 at 19:13 -0400, Marek Olšák wrote:
> > Here's a more detailed description of the problem and a possible
> > solution.
> >
> > First, the worst-case scenario: A small one-line commit that’s correct
> > and trivial causes a test failure in the CI. The maintainer of the
> > affected driver is asked for help and concludes that it’s likely a HW
> > bug, so the issue is forwarded to the HW team of the corresponding GPU
> > company. Now the management of the GPU company has to allocate staff to
> > investigate the failure. 3 months later, we may have a workaround. Or
> > not.
> >
> > Second, the scale: The CI has lots of undocumented devices with
> > undocumented errata and drivers with hacks and incomplete
> > implementations. (that’s normal for any project) Any of those devices
> > can fail at any time for reasons that might not make sense, and any
> > of
> > the drivers can fail for random reasons too. It’s not fair to ask the
> > contributor to keep everything conformant at every MR. Even if the
> > devices were documented with open source implementations (e.g. uarch
> > specs, HDL, RTL) and well documented drivers, it’s not reasonable to
> > ask the contributor to study them all.
> >
> > Thus, we can’t expect the contributor to be solely responsible for
> > conformance of all devices at every MR in main.
> >
> > It’s useful to keep drivers that have regular contributors conformant
> > at most commits in main, but why do we need to keep drivers without
> > contributors conformant? If somebody cares about those drivers but
> > not
> > enough to contribute in main, they can contribute fixes during the RC
> > window or on their own schedule.
> >
> > We need a two-tier system:
> >
> > Tier 1:
> > - Devices are tested by the CI pre-merge.
> > - A contact person is required for CI failure assessment and closure
> > within a reasonable time. (if the person is on leave, a backup person
> > must be available, or else the device is moved to Tier 2)
>
>
> I think this makes sense, but we need to agree on what "reasonable
> time" means to make sure everyone is on the same page.
>
> Iago
>
> > - Highly recommended: A fully functional drm-shim for each CI job
> > with
> > a user guide, how to print compiled shaders, etc.
> > - Links to HW documentation if available.
> > - If maintainers end up xfailing a significant number of failures
> > regularly, the device is moved to Tier 2. (due to not using the CI to
> > maintain conformance)
> >
> > Tier 2:
> > - Pre-merge CI can’t run on the target devices / implementations.
> > main
> > doesn’t have to work. The quality of release branches is up to
> > maintainers. The RC window can be extended.
> > - Only unit tests can run pre-merge, as well as any deviceless driver
> > tests, like the following.
> > - Optionally develop deviceless driver validation tests that verify
> > driver output (shader instructions, command buffers). LLVM LIT tests
> > are the perfect example - they validate all LLVM backends and prevent
> > regressions without any physical devices.
> >
> >
> > Marek
> >
> > On Fri, May 1, 2026 at 5:21 AM Daniel Stone <[email protected]>
> > wrote:
> > >
> > > Hi,
> > >
> > > On Thu, 30 Apr 2026 at 23:34, Timur Kristóf
> > > <[email protected]> wrote:
> > > > On Thursday, April 30, 2026, 23:07:12 Central European Summer Time,
> > > > Marek Olšák wrote:
> > > > > First of all, no contributor to shared code is required to fix
> > > > > issues
> > > > > in all drivers that their commit breaks. The goal is to stop
> > > > > using the
> > > > > pre-merge CI as a justification to force unrelated contributors
> > > > > to
> > > > > work on all drivers just because they are contributors. It
> > > > > would be a
> > > > > bit exploitative to assume that every contributor must debug
> > > > > all
> > > > > drivers that turn red due to a change. I think I understand
> > > > > that well
> > > > > because I have debugged 5+ drivers by myself in the past that
> > > > > are not
> > > > > my responsibility to maintain, and it does feel exploitative.
> > >
> > > There's a bit more nuance in this though. If one set of people is
> > > breaking 17 drivers every day because they can't be bothered to do
> > > the
> > > basics to keep things working and just want to yolo whatever they
> > > just
> > > thought of into the tree, it's 'unethical' and unfair on the rest
> > > of
> > > the people who then spend their entire time bisecting and fixing up
> > > what the others broke. (Those people then probably get accused of
> > > being freeloaders and exploiting the labour of the people breaking
> > > everything, because they don't get to spend any time on fun new
> > > stuff,
> > > given all their time is spent fixing what the others broke.)
> > >
> > > I think we've all taken it as axiomatic that there's a balance to
> > > be
> > > struck there: don't make others miserable because you can't be
> > > bothered spending five minutes thinking about why your new code
> > > breaks
> > > existing users, but on the other hand you absolutely should expect
> > > support from the relevant people to help work it out and resolve
> > > it.
> > >
> > > I'm pretty sure no-one is suggesting ripping up that social
> > > contract,
> > > but we should be clear about what we mean.
> > >
> > > > > Therefore, we could establish that each driver/HW combo in pre-
> > > > > merge
> > > > > CI has the following options:
> > > > > 1) a contact person for prompt CI issue resolution
> > > > > 2) unconditional xfail by the author (or removal from pre-merge
> > > > > CI if
> > > > > logs lack the information necessary to add xfail)
> > > >
> > > > I think we should establish both of those, in that order.
> > > > That is, if the contact person does not reply promptly, let's
> > > > just add the expected failure.
> > >
> > > Yeah, that's a pretty obvious baseline. So far it seems to have
> > > worked
> > > out in the usual way (people know who works on what so it's easy to
> > > ping them however), but if that's not working out, maybe someone
> > > could
> > > suggest a more formal document along the lines of MAINTAINERS or
> > > CODEOWNERS or whatever?
> > >
> > > Cheers,
> > > Daniel
> >
>
>
