Re: adsys SRU

Didier Roche Wed, 14 Jun 2023 01:04:59 -0700

Hey Chris, let me chime in.

On 14/06/2023 at 08:26, Christopher James Halse Rogers wrote:

There's a Jammy/Lunar adsys SRU¹ in the queue at the moment, and I think it needs bringing up to the list for discussion.
The changelog looks like approximately 9 months of normal feature development. The diff against Jammy is >3MB in size (due largely to significant vendored-dependency churn, it seems). The relevant part of SRU policy - “Other safe cases”² - allowing feature addition, says “If existing software needs to be modified to make use of the new feature, it must be demonstrated that these changes are unintrusive, have a minimal regression potential, and have been tested properly”. It looks like adsys is well tested, but I'm not sure about these being minimal changes or with minimal regression potential ☺.
It's true that we've done a wholesale backport of adsys 0.9.2³ to Jammy in the past; however, in that case the changes were mostly listed bugfixes or FTBFS fixes, and the feature addition was shipping a *Windows* binary.
I'm writing this to ubuntu-release@ for two main reasons:
1. It seems valuable to include adsys updates in LTS releases; however, I'm not sure that the scope of changes (and seeming criticality of the system - “failures might prevent users from logging in” seems pretty bad) falls under the existing delegation of power from the Tech Board to the SRU team.

Unfortunately, like many projects, there is a constant tension between requests to backport new features (adsys, being an enterprise product, only really makes sense in an LTS context) and bug fixes. Most of the new features are developed due to industry requirements, which are:
- evolution of their own security practices (for instance, certificate support)
- requests to support other platforms (winbind in addition to the already-existing sssd)

Due to our very limited team capacity, already maxed out and split between many projects on different themes, our only way to provide good adsys support while answering the two previous requirements is to support one single code base version, meaning shipping the same code base in all supported releases. As most of the dependencies are vendored (apart from some limited dynamic C linking or dependencies on samba/sssd, for instance), we are in control of what we ship and know exactly what our quality baseline is (more details on that in the next paragraphs).

2. There's a *lot* of vendored code churn, and from the SRU perspective I have no information as to whether that's appropriate. I understand that the Go ecosystem does not follow our ideas of stable releases and there's a real tension here - it's a huge amount of work to vet dependency updates, and such updates are *likely* to include bug fixes. I don't think “we just update all our vendored dependencies each SRU to whatever upstream is most recently shipping” is an appropriate standard, though. I'm not sure what *is* the right balance, though.


Right, but also, you need to take into consideration the following:

- as we are vendoring dependencies, accepted as part of the MIR process, it means that we, as upstream, take responsibility towards the security team for handling security fixes inside those dependencies. Most of the security fixes in the various dependencies come only with a new upstream "release" (even if in the Go ecosystem, this is mostly a tag). FYI, the Rust ecosystem follows the same pattern and the vendoring exception is allowed for it too.
- as we took on that responsibility of vendoring dependencies and updating them, it means that we need to do that work as part of the SRU process too.
- however, due to the very, very limited team capacity mentioned above, we need to pick our battles, and supporting a "single code base" (including vendored dependencies) is the only way we can go.

So, with that amount of diff, how do we ensure we can ship something we trust and that we are not impacted by any kind of regressions?


1. This can only be done by automated tests.

As of today, I count 1557 automated tests on the adsys repository alone. Those are unit/package/integration tests, using golden files that record exactly the expected state of the file system for each test: https://github.com/ubuntu/adsys/tree/main/cmd/adsysd/integration_tests/testdata/TestPolicyUpdate/golden/current_user%2C_first_time.

All of these run on our CI against the exact same versions of the vendored dependencies and of Go that the package will be built against in the distro, including whenever we automatically update one of the vendored dependencies: https://github.com/ubuntu/adsys/actions/runs/5257398861

We run those tests with **and** without the built-in Go race detector. Also, we test untrusted inputs (like the Windows Active Directory GPO UTF-16 little-endian input) with fuzz testing, and we have already fixed some crashes thanks to it, like https://github.com/ubuntu/adsys/pull/333.
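To illustrate the kind of property a fuzz target checks on such input, here is a minimal sketch in Go. It is not the adsys decoder: `decodeUTF16LE` and `checkDecode` are hypothetical names, and BOM handling is deliberately left out. In a real `_test.go` file, `checkDecode` would be called from `f.Fuzz(func(t *testing.T, b []byte) { ... })` so that the Go fuzzer generates the inputs; here a few hand-picked seeds stand in for it.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// decodeUTF16LE converts UTF-16 little-endian bytes into a Go string.
// Hypothetical decoder for illustration, not the adsys implementation.
func decodeUTF16LE(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", errors.New("odd number of bytes in UTF-16 input")
	}
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	// utf16.Decode maps unpaired surrogates to U+FFFD instead of failing.
	return string(utf16.Decode(units)), nil
}

// checkDecode is the property a fuzz target would assert for every input:
// the decoder must not panic, and whatever it accepts must be valid UTF-8.
func checkDecode(b []byte) error {
	s, err := decodeUTF16LE(b)
	if err != nil {
		return nil // rejecting malformed input is fine; crashing is not
	}
	if !utf8.ValidString(s) {
		return fmt.Errorf("decoded %x to invalid UTF-8", b)
	}
	return nil
}

func main() {
	// Seeds a fuzzer could start from: valid text, truncated input,
	// and an unpaired surrogate.
	seeds := [][]byte{
		{'h', 0, 'i', 0},
		{0x41},
		{0x00, 0xD8},
	}
	for _, seed := range seeds {
		if err := checkDecode(seed); err != nil {
			panic(err)
		}
	}
	fmt.Println("all seeds passed")
}
```

Running such tests with `go test -race` as well exercises the same code paths under the race detector.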

2. All the changes are reviewed by a peer (or developed in pair programming sessions), which ensures that everything that goes in is carefully tested and reviewed.


The only gaps I can identify right now are in the end-to-end tests:

- Maybe the Windows AD controller changes and this has an impact on us (quite unlikely for this one, as Active Directory is decades old and doesn’t seem to get major changes anymore).
- Samba/sssd/kerberos can change from one version of Ubuntu to another and impact us, as we are reusing part of their output as fixtures.

We are covering this with - unfortunately - manual end-to-end testing for every SRU or upload to the current development version. We are aiming (and have a Jira Epic we drafted this cycle) to start automating that. It’s a complex environment because we need some Windows servers alongside our Ubuntu machines, and those end-to-end tests need to reboot our machines multiple times, change some configuration on the Windows side to see it reflected on the Ubuntu one, and so on.

This is why we covered that part with manual testing as a stopgap solution, to ensure that third-party, non-vendorable components of the system are still functioning correctly. However, it doesn't protect against the opposite: an upload of samba breaking us, which happened in the development version, for instance, when 10-year-old vendored heimdal code in samba was updated in one shot during the lunar development cycle. Good luck finding the regression among thousands of commits! We lost hours on this. So, IMHO, updating vendored dependencies as early as possible, as we do in adsys, helps reduce this issue rather than increase it, as in the samba case. This is why we need automated end-to-end tests to ship with even more confidence and less manual intervention, but this also requires networking between multiple OSes and machines, and we need autopkgtest enhancements for this.

I think that should shed some light on how we ensure a high quality level. This project is shipped and used in different enterprise environments, and I can say that, given the volume of usage (including some big names) compared to the number of bugs reported (most of them are either feature requests or gardening work opened by us to keep our code base modern: https://github.com/ubuntu/adsys/issues and https://bugs.launchpad.net/ubuntu/+source/adsys), even after major SRUs like the one you mentioned, we don’t have to do emergency fixes. This gives us trust and confidence that our coding practices and processes are supporting us in delivering high-quality software despite all the constraints I mentioned above.

As a more general topic, I don’t think the SRU team (like the MIR team) is in a position, in terms of time (not being a full-time team) or even knowledge, to really understand every diff entering the distribution itself. (I have the same opinion about the distro freeze, when the release team reviews each diff.) So I see those teams' roles as being more about assessing the impact/risk of a change and how much trust there is in upstream to be proactive in terms of quality or reactive in terms of any issue that arises.

So, in summary: I have two questions - does this exceed SRU authority, and need Tech Board approval, and what level of justification is there for wide ranging vendored code updates in the SRU?

I think one way forward is for adsys to file a documented Special Case with all the information above and join the list of packages where we trust upstream and ensure that it is accountable for the SRU? https://wiki.ubuntu.com/StableReleaseUpdates#Documentation_for_Special_Cases


Thanks for considering it,
Didier
