Hey Chris, let me chime in.

On 14/06/2023 at 08:26, Christopher James Halse Rogers wrote:
There's a Jammy/Lunar adsys SRU¹ in the queue at the moment, and I think it needs bringing up to the list for discussion.

The changelog looks like approximately 9 months of normal feature development. The diff against Jammy is >3MB in size (due largely to significant vendored-dependency churn it seems). The relevant part of SRU policy - “Other safe cases”² - allowing feature addition, says “If existing software needs to be modified to make use of the new feature, it must be demonstrated that these changes are unintrusive, have a minimal regression potential, and have been tested properly”. It looks like adsys is well tested, but I'm not sure about these being minimal changes or with minimal regression potential ☺.

It's true that we've done a wholesale backport of adsys 0.9.2³ to Jammy in the past; however, in that case the changes were mostly listed bugfixes or FTBFS fixes, and the feature addition was shipping a *Windows* binary.

I'm writing this to ubuntu-release@ for two main reasons:

1. It seems valuable to include adsys updates in LTS releases; however, I'm not sure that the scope of changes (and seeming criticality of the system - “failures might prevent users from logging in” seems pretty bad) falls under the existing delegation of power from the Tech Board to the SRU team.

Unfortunately, like many projects, we face a constant tension between requests for new feature backports (adsys, being an enterprise product, only really makes sense in an LTS context) and bug fixes. Most of the new features are developed in response to industry requirements, which are:
- evolution of their own security practices (for instance, certificate support)
- requests for support of other platforms (winbind in addition to the already-existing sssd)

Due to our very limited team capacity, already maxed out and split between many projects on different themes, the only way for us to support adsys well while answering the two previous requirements is to support a single code base version, meaning shipping the same code base in all supported releases. As most of the dependencies are vendored (apart from some limited dynamic C linking or dependencies on samba/sssd, for instance), we are in control of what we ship and know exactly what its quality baseline is (more details on that in the next paragraphs).

2. There's a *lot* of vendored code churn, and from the SRU perspective I have no information as to whether that's appropriate. I understand that the Go ecosystem does not follow our ideas of stable releases and there's a real tension here - it's a huge amount of work to vet dependency updates, and such updates are *likely* to include bug fixes. I don't think “we just update all our vendored dependencies each SRU to whatever upstream is most recently shipping” is an appropriate standard, though. I'm not sure what *is* the right balance, though.

Right, but also, you need to take into consideration the following:

- As we are vendoring dependencies, which was accepted as part of the MIR process, we, as upstream, take responsibility towards the security team for handling security fixes inside those dependencies. Most of the security fixes in the various dependencies only come with a new upstream "release" (even if, in the Go ecosystem, this is mostly a tag). FYI, the Rust ecosystem follows the same pattern and the vendoring exception is allowed for it too.
- As we took on the responsibility of vendoring those dependencies and updating them, we need to do that work as part of the SRU process too.
- However, due to the very, very limited team capacity mentioned above, we need to pick our battles, and supporting a "single code base" (including vendored dependencies) is the only way we can go.

So, with that amount of diff, how do we ensure we can ship something we trust and that we are not impacted by any kind of regressions?

1. This can only be done by automated tests.
As of today, I count 1557 automated tests in the adsys repository alone. Those are unit/package/integration tests, using golden files to record exactly the expected file system output for each test: https://github.com/ubuntu/adsys/tree/main/cmd/adsysd/integration_tests/testdata/TestPolicyUpdate/golden/current_user%2C_first_time.

All of those are run on our CI against the exact same versions of the vendored dependencies and of Go that the package is going to be built against in the distro, even when we automatically update one of the vendored dependencies: https://github.com/ubuntu/adsys/actions/runs/5257398861

We run those tests with **and** without the built-in Go race detector. We are also testing untrusted inputs (like the Windows Active Directory GPO UTF-16 little-endian input) with fuzz testing, and we have already fixed some crashes thanks to it, like https://github.com/ubuntu/adsys/pull/333.
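As a rough illustration of the kind of code that benefits from fuzzing, here is a minimal sketch of a UTF-16 little-endian decoder that returns an error on malformed input instead of crashing. It is an assumption-laden example, not the adsys parser: `decodeUTF16LE` is a hypothetical name, and BOM handling is left out.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE is a hypothetical decoder (not the adsys implementation).
// It converts UTF-16 little-endian bytes to a Go string and rejects
// odd-length input rather than reading past the end of the slice.
func decodeUTF16LE(data []byte) (string, error) {
	if len(data)%2 != 0 {
		return "", errors.New("odd number of bytes in UTF-16 input")
	}
	units := make([]uint16, 0, len(data)/2)
	for i := 0; i < len(data); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(data[i:i+2]))
	}
	// utf16.Decode replaces invalid surrogate sequences with U+FFFD.
	return string(utf16.Decode(units)), nil
}

func main() {
	s, err := decodeUTF16LE([]byte{'G', 0, 'P', 0, 'O', 0})
	fmt.Println(s, err)

	// Truncated input must produce an error, not a panic.
	_, err = decodeUTF16LE([]byte{'G', 0, 'P'})
	fmt.Println(err)
}
```

With Go's native fuzzing (Go 1.18 and later), a target in a `_test.go` file would pass arbitrary byte slices to such a function through `f.Fuzz(func(t *testing.T, data []byte) { ... })` and fail on any panic. Running the test suite with `go test -race` additionally enables the race detector mentioned above.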

2. All the changes are reviewed by a peer (or developed in pair programming sessions), which ensures that everything that goes in is carefully tested and reviewed.

The only gaps I can identify right now are in the end-to-end tests:
- The Windows AD controller may change in a way that impacts us (quite unlikely, as Active Directory is decades old and doesn’t seem to receive major changes anymore).
- Samba/sssd/kerberos can change from one version of Ubuntu to another and impact us, as we are reusing part of their output as fixtures.

We are covering this with, unfortunately, manual end-to-end testing for every SRU or upload to the current development version. We are aiming (and have a Jira Epic we drafted this cycle) to start having that automated. It’s a complex environment: we need some Windows servers alongside our Ubuntu machines, and those end-to-end tests need to reboot our machines multiple times, change some configuration on the Windows side and check that it is reflected on the Ubuntu one, and so on.

This is why we covered that part with manual testing as a stopgap solution, to ensure that third-party, non-vendorable components of the system are still functioning correctly. However, it doesn't protect against the opposite: an upload of samba breaking us. This happened in the development version, for instance, when 10-year-old vendored Heimdal code in samba was updated in one shot in the lunar development release. Good luck finding the regression among thousands of commits! We lost hours on this. So, IMHO, updating vendored dependencies as quickly as possible, as we do in adsys, helps reduce this issue rather than increase it, as in the samba case. This is why we need automated end-to-end tests, to ship with even more confidence and less manual intervention, but this also requires networking between multiple OSes and machines, and we need autopkgtest enhancements for this.

I think that should shed some light on how we ensure a high quality level. This project is shipped and used in different enterprise environments, including some big names. Compared to that volume of usage, the number of bugs reported is low (most of them are either feature requests or gardening work opened by us to keep our code base modern: https://github.com/ubuntu/adsys/issues and https://bugs.launchpad.net/ubuntu/+source/adsys), and even after major SRUs like the one you mentioned, we have not had to do emergency fixes. This gives us trust and confidence that our coding practices and processes support us in delivering high-quality software despite all the constraints I mentioned above.

As a more general topic, I don’t think the SRU team (like the MIR team) is in a position, in terms of time (not being a full-time team) or even knowledge, to really understand every diff entering the distribution itself. (I have the same opinion about when we enter the distro freeze and the release team reviews each diff.) So, I see those teams’ role more as assessing the impact/risk of a change and how much trust there is in upstream to be proactive in terms of quality or reactive in terms of any issue that arises.


So, in summary: I have two questions - does this exceed SRU authority, and need Tech Board approval, and what level of justification is there for wide ranging vendored code updates in the SRU?

I think one way forward is for adsys to be filed under the documented special cases, with all the information above, and so to enter the list of packages where we trust upstream and ensure it is accountable for the SRU: https://wiki.ubuntu.com/StableReleaseUpdates#Documentation_for_Special_Cases

Thanks for considering it,
Didier

--
Ubuntu-release mailing list
[email protected]
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-release
