On 2/16/26 11:58, Matt Coster wrote:
> On 16/02/2026 10:11, Thorsten Leemhuis wrote:
>
> We're currently trying to force this issue to reproduce on hardware we
> have on hand; we'd like to see it fixed properly as much as anyone.

Yeah, no worries, I never doubted that. But getting things properly
fixed can mean "revert, fix, reapply" when it comes to regressions in
Linux -- which should not be seen as something bad, as Linus said
himself (see below)!

> From our side at least, I don't believe this is a regression at all.

In the end what matters is: some change afaics caused systems that used
to work to stop working -- that makes it a regression by the Linux
kernel's standards. And by the same standards, regressions must be
fixed, ideally quickly. Find a few quotes on that from Linus below that
explain this better.

Ciao, Thorsten

---

On how quickly regressions should be fixed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* From `2026-01-22 <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/>`_::

    But a user complaining should basically result in an immediate fix -
    possibly a "revert and rethink".

  With a later clarification on `2026-01-28 <https://lore.kernel.org/all/cahk-%3dwi86aosxs66-yi54%2bmpqjpu0upxb8zafg%[email protected]/>`_::

    It's also worth noting that "immediate" obviously doesn't mean "right
    this *second* when the problem has been reported". But if it's a
    regression with a known commit that caused it, I think the rule of
    thumb should generally be "within a week", preferably before the next
    rc.

* From `2023-04-21 <https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=r...@mail.gmail.com/>`_::

    Known-broken commits either (a) get a timely fix that doesn't have
    other questions or (b) get reverted

* From `2021-09-20(2) <https://lore.kernel.org/all/CAHk-=wgovmtrw1tnbmc1rn5yqytkyn0hz+sc4k0dgnn++u9...@mail.gmail.com/>`_::

    [...] review shouldn't hold up reported regressions of existing code.
    That's just basic _testing_ - either the fix should be applied, or -
    if the fix is too invasive or too ugly - the problematic source of
    the regression should be reverted.

    Review should be about new code, it shouldn't be holding up "there's
    a bug report, here's the obvious fix".

* From `2023-05-08 <https://lore.kernel.org/all/CAHk-=wgzU8_dGn0Yg+DyX7ammTkDUCyEJ4C=nvnhrhxkwc7...@mail.gmail.com/>`_::

    If something doesn't even build, it should damn well be fixed ASAP.

On how fixing regressions with reverts can help prevent maintainer burnout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* From `2026-01-28 <https://lore.kernel.org/all/cahk-%3dwi86aosxs66-yi54%2bmpqjpu0upxb8zafg%[email protected]/>`_::

    > So how can I/we make "immediate fixes" happen more often without
    > contributing to maintainer burnout?

    [...] the "revert and rethink" model [...] often a good idea in
    general unless there's just an obvious fix for an obvious bug [...]
    Exactly so that maintainers don't get stressed out over having a
    pending problem report that people keep pestering them about. I think
    people are sometimes a bit too bought into whatever changes they
    made, and reverting is seen as "too drastic", but I think it's often
    the quick and easy solution for when there isn't some obvious
    response to a regression report.

On why the "no regressions" rule exists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* From `2026-01-22 <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/>`_::

    But the basic rule is: be so good about backwards compatibility that
    users never have to worry about upgrading. They should absolutely
    feel confident that any kernel-reported problem will either be
    solved, or have an easy solution that is appropriate for *them* (ie a
    non-technical user shouldn't be expected to be able to do a lot).
    Because the last thing we want is people holding back from trying new
    kernels.

* From `2024-05-28 <https://lore.kernel.org/all/CAHk-=wgtb7y-beh7tpdvdwru7zkq8-kmjz53tsk37zsppdw...@mail.gmail.com/>`_::

    I introduced that "no regressions" rule something like two decades
    ago, because people need to be able to update their kernel without
    fear of something they relied on suddenly stopping to work.

* From `2018-08-03 <https://lore.kernel.org/all/CA+55aFwWZX=cxmwdtkdgb36kf12xmtehmqjbimpcqcrg2hi...@mail.gmail.com/>`_::

    The whole point of "we do not regress" is so that people can upgrade
    the kernel and never have to worry about it. [...] Because the only
    thing that matters IS THE USER.

* From `2017-10-26(1) <https://lore.kernel.org/lkml/ca+55afxw7nmamvyhkvz1upbutujewrt6yb51qax5rtrwowj...@mail.gmail.com/>`_::

    If the kernel used to work for you, the rule is that it continues to
    work for you. [...] People should basically always feel like they can
    update their kernel and simply not have to worry about it. I refuse
    to introduce "you can only update the kernel if you also update that
    other program" kind of limitations. If the kernel used to work for
    you, the rule is that it continues to work for you.

On exceptions to the "no regressions" rule
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* From `2026-01-22 <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/>`_::

    There are _very_ few exceptions to that rule, the main one being "the
    problem was a fundamental huge and gaping security issue and we *had*
    to make that change, and we couldn't even make your limited use-case
    just continue to work". The other exception is "the problem was
    reported years after it was introduced, and now most people rely on
    the new behavior". [...] Now, if it's one or two users and you can
    just get them to recompile, that's one thing. Niche hardware and odd
    use-cases can sometimes be solved that way, and regressions can
    sometimes be fixed by handholding every single reporter if the
    reporter is willing and able to change his or her workflow.

* From `2023-04-20 <https://lore.kernel.org/all/CAHk-=wis_qqy4odnynnki5b7qhosmxtoj1jxo5wmb6sruwq...@mail.gmail.com/>`_::

    And yes, I do consider "regression in an earlier release" to be a
    regression that needs fixing. There's obviously a time limit: if that
    "regression in an earlier release" was a year or more ago, and just
    took forever for people to notice, and it had semantic changes that
    now mean that fixing the regression could cause a _new_ regression,
    then that can cause me to go "Oh, now the new semantics are what we
    have to live with".

* From `2021-09-20(3) <https://lore.kernel.org/all/CAHk-=wi7db2sj-wngvvsj7ak2cm556q8437soxo4ejt2bwp...@mail.gmail.com/>`_::

    Yes, we have situations where even regressions don't matter - like
    major security issues that simply cannot be fixed other ways, because
    the regression _was_ the security hole.

* From `2017-10-26(2) <https://lore.kernel.org/lkml/ca+55afxw7nmamvyhkvz1upbutujewrt6yb51qax5rtrwowj...@mail.gmail.com/>`_::

    There have been exceptions, but they are few and far between, and
    they generally have some major and fundamental reasons for having
    happened, that were basically entirely unavoidable, and people
    _tried_hard_ to avoid them. Maybe we can't practically support the
    hardware any more after it is decades old and nobody uses it with
    modern kernels any more. Maybe there's a serious security issue with
    how we did things, and people actually depended on that fundamentally
    broken model. Maybe there was some fundamental other breakage that
    just _had_ to have a flag day for very core and fundamental reasons.

On accepting when a regression occurred
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* From `2026-01-22 <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/>`_::

    But starting to argue about users reporting breaking changes is
    basically the final line for me. I have a couple of people that I
    have in my spam block-list and refuse to have anything to do with,
    and they have generally been about exactly that. Note how it's not
    about making mistakes and _causing_ the regression. That's normal.
    That's development. But then arguing about it is a no-no.

* From `2024-06-23 <https://lore.kernel.org/all/CAHk-=wi_KMO_rJ6OCr8mAWBRg-irziM=t9wxgc+j1vvoqb3...@mail.gmail.com/>`_::

    We don't introduce regressions and then blame others. There's a very
    clear rule in kernel development: things that break other things ARE
    NOT FIXES. EVER. They get reverted, or the thing they broke gets
    fixed.

* From `2021-06-05 <https://lore.kernel.org/all/CAHk-=wiuvqhn76yuwhkjzzwtdjmmjf_zn4+u7vejjmegh3r...@mail.gmail.com/>`_::

    THERE ARE NO VALID ARGUMENTS FOR REGRESSIONS. Honestly, security
    people need to understand that "not working" is not a success case of
    security. It's a failure case. Yes, "not working" may be secure. But
    security in that case is *pointless*.

* From `2017-10-26(5) <https://lore.kernel.org/lkml/CA+55aFwiiQYJ+YoLKCXjN_beDVfu38mg=ggg5lfocqhe8qi...@mail.gmail.com/>`_::

    [...] when regressions *do* occur, we admit to them and fix them,
    instead of blaming user space. The fact that you have apparently been
    denying the regression now for three weeks means that I will revert,
    and I will stop pulling apparmor requests until the people involved
    understand how kernel development is done.

On back-and-forth
~~~~~~~~~~~~~~~~~

* From `2024-05-28 <https://lore.kernel.org/all/CAHk-=wgtb7y-beh7tpdvdwru7zkq8-kmjz53tsk37zsppdw...@mail.gmail.com/>`_::

    The "no regressions" rule is that we do not introduce NEW bugs. It
    *literally* came about because we had an endless dance of "fix two
    bugs, introduce one new one", and that then resulted in a system that
    you cannot TRUST.

* From `2021-09-20(1) <https://lore.kernel.org/all/CAHk-=wi7db2sj-wngvvsj7ak2cm556q8437soxo4ejt2bwp...@mail.gmail.com/>`_::

    And the thing that makes regressions special is that back when I
    wasn't so strict about these things, we'd end up in endless "seesaw
    situations" where somebody would fix something, it would break
    something else, then that something else would break, and it would
    never actually converge on anything reliable at all.

* From `2015-08-13 <https://lore.kernel.org/all/ca+55afxk8-bsikwr_s-c+4g6wihkpqvmle34h9wozpeua6w...@mail.gmail.com/>`_::

    The strict policy of no regressions actually originally started
    mainly wrt suspend/resume issues, where the "fix one machine, break
    another" kind of back-and-forth caused endless problems, and meant
    that we didn't actually necessarily make any forward progress, just
    moving a problem around.

On regressions caused by bugfixes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* From `2018-08-03 <https://lore.kernel.org/all/CA+55aFwWZX=cxmwdtkdgb36kf12xmtehmqjbimpcqcrg2hi...@mail.gmail.com/>`_::

    > Kernel had a bug which has been fixed

    That is *ENTIRELY* immaterial. Guys, whether something was buggy or
    not DOES NOT MATTER. [...] It's basically saying "I took something
    that worked, and I broke it, but now it's better". Do you not see how
    f*cking insane that statement is?
