On Thu, Aug 11, 2016 at 01:16:04PM +0100, Ian Jackson wrote:
> Josh Triplett writes ("[ANNOUNCE] git-series: track changes to a patch series
> over time"):
> > I'd like to announce a project I've been working on for a while:
>
> Thanks for the info. I have an interest in this kind of thing.

I thought you might; when I mentioned that git-series might work well for Debian packaging, I had things like dgit in mind. I'd love to talk with you in more detail about how something like dgit could interoperate with git-series.

> I don't mean to be discouraging, but I have some questions/concerns.

By all means. Feedback welcome; I'd find silence more discouraging.

> AFAICT from the available information, your tool is roughly speaking a
> competitor to stgit. It seems to be intended to offer a different
> (better) UI for patch stack management than raw git, and the ability
> to push a series around in a fast-forwarding way.
>
> My biggest question therefore is: how does your tool compare to
> stgit ? Why should we use your tool rather than stgit ?

While stgit does track the history of changes made to the stack, as far as I can tell, it doesn't do so in a manner meant for interchange between users. stgit works locally for one user, but doesn't seem to support multiple users. And the history of the patch stack doesn't include commit messages, nor does it group changes into logical commits. It seems more like the reflog (a tool to rescue old bits) than a historical record. stgit also doesn't track and version a cover letter.

I wanted to define a well-documented interchange format for the "history of history", so that other tools to manage and send patch series or pull requests could use that format. For instance, today if you want to rework a pull request on GitHub or GitLab in a non-fast-forwarding way, you either force-push to the branch you requested a pull from (discarding the old history), or you open a new pull request (and close the old one). And even the APIs don't make it easy to get previous versions of the pull request. I'd love to instead push and pull a fast-forwarding series branch corresponding to the history of the pull request.

stgit also fundamentally changes the patch-manipulation workflow to use stg rather than git.
For instance, if you want to reorder or edit patches, you use stg to do so, not git rebase -i; if you want to import a patch, you use stg import, not git am. Using various git commands directly will confuse stg and require running "stg repair" or similar (see https://stgit.org/stg-repair.html).

I don't actually want to change people's current git workflows for manipulating patches. I think stg has a very interesting workflow, similar to quilt, but I want to support existing workflows people already use in git today. While git-series provides a few helpers for manipulating patches when they can take advantage of additional information git-series knows (such as `git series rebase -i`, which takes advantage of already knowing the base of the series to avoid needing to specify any further arguments), you can *always* use any underlying git tool to manipulate commits, and then track the result with git-series.

One other interesting quirk: stgit can only track a linear series of patches, not merge commits. git-series can actually handle a series that includes merge commits. While `git series format` and `git series rebase` won't work with such a non-linear series, workflows based on pull, push, and `git series req` will work just fine.

(I plan to make rebase work for non-linear series, as soon as libgit2 has better support for rebasing; right now I actually write out rebase state to .git and call `git rebase --continue`. In theory format could support non-linear series, as `git format-patch` does, but `git format-patch` seems willing to throw away or lose information from merge commits, especially non-trivial merges; I don't want to do that.)

> My next question is: how do you handle merging of changes made in
> parallel in different meta-branches of the same series ?
> I don't mean just aggregating patches, but other common operations such as:
> reordering of patches; editing patch commit messages (or the cover
> letter); splitting and merging patches; git rebase --autosquash; etc.
>
> I didn't see anything in the docs about this. And I confess I didn't
> run your code to do any experiments.

git-series does support merge commits within the series branch; see the section "git-series commits" in INTERNALS. Right now, git-series doesn't create those merge commits for you, but I plan to add a mechanism to support that. That'll probably start out as "here's two patch series, tell me when you've finished creating the merged version and I'll commit it", though I could imagine handling many simple cases more automatically. I hope that building a simple tool and incrementally improving it will work.

> I found the docs were unclear about the interaction between raw git
> operations and git series operations. In particular, the interaction
> between `git checkout' (and other branch-switching operations). I
> think that some combinations of these operations could result in
> ... unexpected and undesirable results.

I've tested and thought about many of those cases in detail, and carefully avoided any case that could allow data loss. If you see any way to cause git or git-series to lose data or otherwise do the wrong thing, please let me know.

At the file level, `git series checkout` will safely avoid overwriting any local changes or untracked files. (I don't actually even *have* a `git series checkout -f`; you'd have to use `git checkout -f` or `git reset --hard` first if you want to throw away changes.)

At the commit level, git-series always treats HEAD as the working version of "series"; any git operation that changes "HEAD", including switching branches, will cause git-series to treat "series" as changed in the working version. So, if you check out a branch and thus change HEAD, `git series status` will show "series" as changed.
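
The HEAD-tracking rule described above can be illustrated with a toy model. This is a sketch under stated assumptions, not git-series code: a series version is a plain dict of the "base", "series", and "cover" entries, the commit ids ("b0", "c3", "d7") are placeholders, and the helper names working_version and unstaged_changes are hypothetical. Only the "changed but not staged" half of `git series status` is modeled.

```python
def working_version(stored_working: dict, head: str) -> dict:
    """Hypothetical helper: the effective working version of a series.

    HEAD is treated as the working value of "series", so whatever HEAD
    points at overrides the stored "series" entry.
    """
    effective = dict(stored_working)
    effective["series"] = head
    return effective


def unstaged_changes(staged: dict, working: dict) -> list:
    """Sorted keys whose working value differs from the staged value."""
    keys = set(staged) | set(working)
    return sorted(k for k in keys if staged.get(k) != working.get(k))


staged = {"base": "b0", "series": "c3", "cover": "cover-v1"}
stored_working = dict(staged)

# HEAD still points at the staged series tip: nothing has changed.
print(unstaged_changes(staged, working_version(stored_working, head="c3")))  # []

# Checking out another branch moves HEAD to d7, so "series" shows as changed.
print(unstaged_changes(staged, working_version(stored_working, head="d7")))  # ['series']
```

Because "series" is derived from HEAD rather than from whichever branch HEAD happens to name, any branch switch surfaces as an ordinary change to the series instead of silently rewriting it.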

Also, `git series rebase` or `git series rebase -i` always detaches HEAD, to avoid inadvertently changing a branch that HEAD points to. git-series generally expects you to work with a detached HEAD, letting the series track changes to HEAD. If you re-attach HEAD, then git commands like commit/rebase/revert/cherry-pick will operate on the branch you attached to, but git-series commands will always re-detach HEAD and avoid changing any underlying branch.

I also gave a great deal of thought to the case of having some changes to a series and then switching to a different series. For instance, suppose you change the cover letter, rebase -i the series, and then want to work on a different series. git generally assumes that if you have changes in your index or working tree when you run "git checkout otherbranch", you want to move them along with you to otherbranch (and complain if they conflict with otherbranch). git-series, instead, independently tracks the "staged" and "working" versions of every series; if you switch to a different series (or start a new one), it'll leave the "staged" and "working" versions of your current series untouched, and when you switch back to that series, you'll see all those same changes.

I'd absolutely welcome more review and suggestions here; if you see any corner case I've missed, please let me know.

> I did read the INTERNALS document about the data structures. I wonder
> why you rejected other possibilities. In particular, your top level
> `git series' branch data structure is not directly useable by any
> other tool; it needs to be dereferenced/converted, to produce a
> useable commit. Did you consider recording the metadata as dotfiles
> in tree objects, or some such ?

I started with a few fundamental constraints:

- The commits tracked by the series *must* remain directly usable as commits in the underlying project, whether by sending patches or by pushing/pulling.
- git must find every object in the history of a series reachable from a ref, so that fsck/repack/prune/etc. cannot discard series history.
- Similarly, `git push` and `git fetch` must work on series commits, and must transmit/receive the full series history with a series branch, without requiring any additional commands or special "series" versions of push/fetch.

These constraints limit where metadata can live. Adding any dotfiles to the commits in the patch series would mean the resulting patches would include those dotfiles. Any metadata added to commit messages would end up in patches; note that several projects, including the Linux kernel, have complained about patches that include Gerrit "Change-Id" tags. Any format that stored patches within a series commit, rather than full links to commits for the patches, would not leave the commits themselves usable by git.

Based on all of that, I settled on using a tree object as a key-value store. The special use of "parent" commits of series commits preserves reachability and the ability to transmit and receive a series using just git. (Note that git's reachability algorithms do not follow gitlinks, the commit entries within tree objects that git also uses for submodules, partly because they expect that such gitlinks may point to commits not present in the current repository's object store.)

Do you see another possible storage format that meets all the constraints above?

Also, you might find it amusing that many parts of git can handle series commits, thanks to the many syntaxes supported by `git rev-parse`. For instance, if you've created a series named "feature", you can refer to the current version of the series as git-series/feature:series. That works with git log or git diff, for instance. (Sadly, it doesn't work with "git push", because refspecs use ':' to separate the local and remote branch names.)
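
To make the reachability argument concrete, here is a toy Python model. All names are hypothetical and this is neither git-series's code nor git's real object model. It assumes, for illustration only, a parent layout in which a series commit's first parent is the previous series version and the gitlinked "series" and "base" commits are repeated as extra parents; INTERNALS documents the actual layout. `resolve` mimics only the small subset of `git rev-parse` syntax mentioned above (`REF:path`, `REF~:path`, and `A..B`).

```python
import re
from dataclasses import dataclass


@dataclass
class Commit:
    """Toy commit: a name, parent commits, and tree entries (path, kind, target)."""
    name: str
    parents: tuple = ()
    tree: tuple = ()  # kind is "gitlink" (target is a Commit) or "blob" (target is text)


def series_commit(name, previous, series_tip, base, cover=None, mirror_parents=True):
    """Build a toy series commit whose tree acts as a key-value store.

    ASSUMED layout: first parent is the previous series version; the
    gitlinked commits are repeated as parents so a parents-only walk reaches them.
    """
    tree = [("base", "gitlink", base), ("series", "gitlink", series_tip)]
    if cover is not None:
        tree.append(("cover", "blob", cover))
    parents = (previous,) if previous is not None else ()
    if mirror_parents:
        parents += (series_tip, base)
    return Commit(name, parents, tuple(tree))


def reachable(tip):
    """Names of commits reachable from tip via parents only; gitlinks are not followed."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit.name not in seen:
            seen.add(commit.name)
            stack.extend(commit.parents)
    return seen


def resolve(refs, spec):
    """Toy resolver for 'REF:path' and 'REF~N:path' (a tiny subset of rev-parse)."""
    match = re.fullmatch(r"([^:~]+)(~(\d*))?:(.+)", spec)
    if match is None:
        raise ValueError(f"unsupported spec: {spec}")
    ref, tilde, count, path = match.groups()
    commit = refs[ref]
    for _ in range(int(count or 1) if tilde else 0):
        commit = commit.parents[0]  # '~' follows first parents
    for entry_path, kind, target in commit.tree:
        if entry_path == path and kind == "gitlink":
            return target
    raise KeyError(path)


def range_commits(refs, spec):
    """Names of commits in 'A..B': reachable from B but not from A."""
    start, end = spec.split("..")
    return reachable(resolve(refs, end)) - reachable(resolve(refs, start))


# Two versions of a series; v2 reworks the top patch without fast-forwarding.
base = Commit("base")
p1 = Commit("p1", (base,))
p2 = Commit("p2", (p1,))
p2b = Commit("p2b", (p1,))
v1 = series_commit("v1", None, p2, base, cover="cover letter v1")
v2 = series_commit("v2", v1, p2b, base, cover="cover letter v2")
refs = {"git-series/feature": v2}

print(resolve(refs, "git-series/feature:series").name)   # p2b
print(resolve(refs, "git-series/feature~:series").name)  # p2
print(sorted(range_commits(refs, "git-series/feature:base..git-series/feature:series")))  # ['p1', 'p2b']
print("p2" in reachable(v2))  # True: the old version of the patch stays reachable
```

Building the same two versions with `mirror_parents=False` leaves only the series commits themselves reachable, which is exactly the situation in which fsck/repack/prune could discard old versions of the patches.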

Any other tool operating on git repositories that handles "extended sha1" syntax as `git rev-parse` does can accept something like "git-series/feature:series" in place of a commit; you can even pass something like "git-series/feature~:series" if you want to operate on an old version, or "git-series/feature:base..git-series/feature:series" if you need to specify the range of commits in the patch series.

> One problem with the approach you have chosen (which, to be fair,
> affects stgit too) is that supplying some system (a repository; an
> auto builder; ...) with the appropriate tree in the form of a git
> commit, does not provide anyone who retrieves that commit with the
> series metadata - so all these other programs need to be provided with
> what is essentially an output git branch which is the result of an
> irreversible transformation from the git series information.

I can't think of any format meeting the above constraints (particularly the first one) that would allow you to pass around the history of a series of commits within one of those same commits you want the history of. However, as mentioned above, you can pass a tool such as an autobuilder an extended sha1 such as "git-series/feature:series". For repositories, you can push the series branch directly if you want to provide the history of your series, or you can push the current version (or an older version) of the patch series if you just want to publish that version. I hope to teach some repository browsers (such as cgit) how to read git-series commits, which would make it easier to browse the patch series.

- Josh Triplett