1. Introduction This message expands on my remarks at the Cauldron (especially the patch review and maintenance BoF, and the Sourceware infrastructure BoF) regarding desired features for a system providing pull request functionality (patch submission via creating branches that are then proposed using some kind of web interface or API, with a central database then tracking the status of each pull request and review comments thereon automatically), for use by the GNU toolchain (or one or more components thereof - there is no need for each component to make the same decision about moving to such software and workflow, and indeed we have no mechanism to make such decisions for the toolchain as a whole).
This does not advocate a particular choice of software for such functionality (though much of the discussion seemed to suggest Forgejo as the most likely starting point), or a particular choice of where to host it. Hosting would of course need to meet appropriate security requirements, and to achieve a passing grade on the GNU Ethical Repository Criteria, and the software would need to be entirely free software. Where relevant features are not already supported, it's important that the software is receptive to the addition of such features (including cases where part of the functionality is provided by software specific to the GNU toolchain or parts thereof - such as for the custom checks currently implemented with git hooks - and the underlying software provides appropriate interfaces to allow integration of such external pieces). The list of features here may be a good basis for reviewing what particular forge software supports and whether other features can be added, directly or through use of appropriate APIs. Forge software may provide other pieces such as bug tracking or wikis that we currently handle separately from git hosting. In such cases, we should be able to disable those pieces and keep using the existing bug tracking and wiki software (while having the option to decide independently to migrate those if desired). I consider the overall benefits of such a move to be having more structured data about all changes proposed for inclusion and their status (needing review, needing changes from the author, under discussion, needing merge from mainline, etc.), to help all people involved in the patch submission and review process to track such information and to find patches needing review as applicable, along with providing a more familiar workflow for many people that avoids many of the problems with email (which affect experienced contributors working around corporate email systems, not just new contributors). It would not of course by itself turn people with no interest in or understanding of systems software development into contributors (for example, people without knowledge of directories and hierarchical file storage, or people who only understand software development as web development). Nor would it prevent the accumulation of large backlogs of unreviewed patches, as is evident from many large and active projects using PR workflows with large numbers of open PRs. As Richard noted in his BoF, email sucks. As I noted in reply, so do the web and web browsers when trying to deal with large amounts of patch review state (when one wishes to apply one's own view, not the forge's, of what is resolved and what needs attention). As I also noted, in the Sourceware infrastructure BoF, tools such as patchwork and b4 are the right answer to the wrong question: trying to get structured data about patch submissions when working from the axiom that emails on a mailing list should be the primary source of truth for everything needing review, rather than starting from more structured data and generating emails as one form of output. Moving to a pull request system is not expected to change policies regarding who can approve a change for inclusion, or the technical limits on who can cause a change to enter mainline (e.g. all people with commit access would be expected to be able to use a button in a web interface to cause to PR to be merged, though policy might limit when they should do so). We can of course choose to change policies, either as part of adopting a PR system or later. 2. Key features (a) Some forges have a design that emphasises the tree you get after a proposed contribution, but not the sequence of commits to get there. For the toolchain, we care about the clean, logical sequence of commits on the mainline branch. (We also use linear history on mainline, but that's a secondary matter - though certainly we'd want any forge used to support such linear history so that property doesn't need to change as part of adopting pull request workflow.) Having a clean sequence of commits has some implications for forge support: * Support for reviewing the proposed commit message, not just the diffs, is important (and it should be clear what commit message would result from merging any pull request). * Patch series and dependencies between patches are important. In such cases, fixes for issues from review should go into the appropriate logical commit in the PR (with rebasing as necessary), and it should be possible at all times to see what the sequence of commits on mainline would look like. (E.g. people use some workarounds on GitHub to manage PR dependencies, but those result in second and subsequent PRs in a series showing the full set of diffs from a PR and those it depends on, rather than just the logical diffs for one PR.) I consider patch series and dependencies to be separate but related things: a patch series may not have strictly linear dependencies (and it's often useful to merge the more straightforward patches from a series while still discussing and revising others), while a patch may depend on other patches that it's not logically part of the same series as. They are, however, closely related, and a sufficient solution for dependencies might also be adequate for many cases of series. Note that series can sometimes have hundreds of patches; any solution for patch series and dependencies needs to scale that far. There is of course the common case of a single-patch submission, where the patch is ready for inclusion after some number of fixes. In those cases, it's probably convenient if it's not necessary to rebase - provided it's clear that a particular PR would be squash-merged, and also what the commit message would be for the final fixed commit. * Given the need for rebasing when working with patch series, it's important to have good support for rebasing. In particular, all revisions of the changes for a PR that was rebased need to remain permanently available (e.g. through appropriate documented refs to fetch to get all revisions of all PRs). (b) Various people have built efficient workflows for going through all patch submissions and comments (or all in a particular area), even when only reviewing a small proportion, and have concerns about efficiency of a web interface when working with many patches and comments. It's important to have good API support to allow people to build tools supporting their own workflow like this without needing to use the browser interface (and following their own logic, not the forge's, for what changes are of interest). Good API support might, for example, include a straightforward way to get all changes to PR and comment data and metadata since some particular point, as well as for actions such as reviewing / commenting / approving a PR. Such API support might be similar to what's needed to ensure people can readily get and maintain a local replica of all the key data and its history for all PRs. Replication like that is also important for reliably ensuring key data remains available even if the forge software subsequently ceases to be maintained. Consider, for example, the apparent disappearance of the data from www-gnats.gnu.org (we occasionally come across references to old bug reports from that system in glibc development, but don't have any way to resolve those references). Another use of such an API would be to allow maintaining local copies of all PR comments etc. in a plain text form that can be searched with grep. (c) Given that a transition would be from using mailing lists, it's important to have good plain text outward email notifications for all new merge requests, comments thereon and other significant actions (changing metadata, approvals / merges, etc.) - through whatever combination of built-in support in a forge and local implementation using the API. Such notifications would go to the existing mailing lists (note that the choice of mailing lists for a patch can depend on which parts of the tree it changes - for example, the choice between binutils and gdb-patches lists, or some GCC patches going to the fortran or libstdc++ lists) - as well as to individuals concerned with / subscribed to a given PR. Some very routine changes, such as reports of clean CI results, might be omitted by default from notifications sent to the mailing lists. "good" includes quoting appropriate diff hunks being comments on. It's OK to have an HTML multipart as well, but the quality of the plain text part matters. Diffs themselves should be included in new-PR emails (and those for changes to a PR) unless very large. I do not however suggest trying to take incoming email (unstructured) and turn it into more structured comments on particular parts of a PR in the database. Similarly, commit emails should continue to go to the existing mailing lists for those. (d) Rather than the forge owning the mainline branch in the sense of every commit having to go through an approved PR, at least initially it should be possible for people to push directly to mainline as well, so the transition doesn't have to be instantaneous (and in particular, if a change was posted before the transition and approved after it, it shouldn't be necessary to turn it into a PR). Longer term, I think it would be a good idea to move to everything going through the PR system (with people able to self-approve and merge their own PRs immediately where they can commit without review at present), so there is better structured uniform tracking of all changes and associated CI information etc. - however, direct commits to branches other than mainline (and maybe release branches) should continue to be OK long term (although it should also be possible to make a PR to such a branch if desired). Beyond putting everything through PRs, eventually I'd hope to have merges to mainline normally all go through a CI system that makes sure there are no regressions for at least one configuration before merging changes. (e) All existing pre-commit checks from hooks should be kept in some form, to maintain existing invariants on both tree and commit contents (some hook checks make sure that commits don't have commit messages that would cause other automated processes to fall over later, for example). (f) Existing cron jobs that read from or commit to repositories should use normal remote git access, not filesystem access to the repository, including using APIs to create and self-merge PRs when pushing to the repository is involved. -- Joseph S. Myers josmy...@redhat.com