Re: [ANNOUNCE] git-series: track changes to a patch series over time

Josh Triplett Thu, 11 Aug 2016 20:57:55 -0700

On Thu, Aug 11, 2016 at 01:16:04PM +0100, Ian Jackson wrote:
> Josh Triplett writes ("[ANNOUNCE] git-series: track changes to a patch series 
> over time"):
> > I'd like to announce a project I've been working on for a while:
> 
> Thanks for the info.  I have an interest in this kind of thing.


I thought you might; when I mentioned that git-series might work well
for Debian packaging, I had things like dgit in mind.  I'd love to talk
with you in more detail about how something like dgit could interoperate
with git-series.

> I don't mean to be discouraging, but I have some questions/concerns.

By all means.  Feedback welcome; I'd find silence more discouraging.

> AFAICT from the available information, your tool is roughly speaking a
> competitor to stgit.  It seems to be intended to offer a different
> (better) UI for patch stack management than raw git, and the ability
> to push a series around in a fast-forwarding way.
> 
> My biggest question therefore is: how does your tool compare to
> stgit ?  Why should we use your tool rather than stgit ?

While stgit does track the history of changes made to the stack, as far
as I can tell, it doesn't do so in a manner meant for interchange
between users.  stgit works locally for one user, but doesn't seem to
support multiple users.  And the history of the patch stack doesn't
include commit messages, nor does it group changes into logical commits.
It seems more like the reflog (a tool to rescue old bits) than a
historical record.

stgit also doesn't track and version a cover letter.

I wanted to define a well-documented interchange format for the "history
of history", so that other tools to manage and send patch series or pull
requests could use that format.  For instance, today if you want to
rework a pull request on GitHub or GitLab in a non-fast-forwarding way,
you either force-push to the branch you requested a pull from
(discarding the old history), or you open a new pull request (and close
the old one).  And even the APIs don't make it easy to get previous
versions of the pull request.  I'd love to instead push and pull a
fast-forwarding series branch corresponding to the history of the pull
request.

stgit also fundamentally changes the patch-manipulation workflow to use
stg rather than git.  For instance, if you want to reorder or edit
patches, you use stg to do so, not git rebase -i; if you want to import
a patch, you use stg import, not git am.  Using various git commands
directly will confuse stg and require running "stg repair" or similar
(see https://stgit.org/stg-repair.html).

I don't actually want to change people's current git workflows for
manipulating patches.  I think stg has a very interesting workflow,
similar to quilt, but I want to support existing workflows people
already use in git today.

While git-series provides a few helpers for manipulating patches when
they can take advantage of additional information git-series knows (such
as `git series rebase -i`, which takes advantage of already knowing the
base of the series to avoid needing to specify any further arguments),
you can *always* use any underlying git tool to manipulate commits, and
then track the result with git-series.

One other interesting quirk: stgit can only track a linear series of
patches, not merge commits.  git-series can actually handle a series
that includes merge commits.  While `git series format` and `git series
rebase` won't work with such a non-linear series, workflows based on
pull, push, and `git series req` will work just fine.  (I plan to make
rebase work for non-linear series, as soon as libgit2 has better support
for rebasing; right now I actually write out rebase state to .git and
call `git rebase --continue`.  In theory format could support non-linear
series, as `git format-patch` does, but `git format-patch` seems willing
to throw away or lose information from merge commits, especially
non-trivial merges; I don't want to do that.)
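
As a rough illustration of the `git format-patch` behaviour described
above, the sketch below uses only plain git (not git-series).  The
scratch repository, the branch names `work` and `side`, and the file
names are assumptions made up for the example.  It builds a range that
contains one merge commit and then counts the patch files produced.

```shell
set -eu
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work      # fixed branch name for the demo
git commit -q --allow-empty -m "base"
git tag base

# One commit on "work", one on "side", then a real merge of the two.
echo a > a.txt; git add a.txt; git commit -q -m "add a"
git checkout -q -b side base
echo b > b.txt; git add b.txt; git commit -q -m "add b"
git checkout -q work
git merge -q --no-ff -m "merge side" side

# The range holds three commits (two ordinary commits plus the merge) ...
commits=$(git rev-list --count base..work)
# ... but format-patch writes patches only for the non-merge commits.
git format-patch -o out base..work > /dev/null
patches=$(ls out | wc -l)

echo "commits in range: $commits, patches written: $patches"
```

The merge commit, and any conflict resolution it carried, has no
counterpart in the generated patches.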

> My next question is: how do you handle merging of changes made in
> parallel in different meta-branches of the same series ?  I don't mean
> just aggregating patches, but other common operations such as:
> reordering of patches; editing patch commit messages (or the cover
> letter); splitting and merging patches; git rebase --autosquash; etc.
> 
> I didn't see anything in the docs about this.  And I confess I didn't
> run your code to do any experiments.

git-series does support merge commits within the series branch; see the
section "git-series commits" in INTERNALS.  Right now, git-series
doesn't create those merge commits for you, but I plan to add a
mechanism to support that.  That'll probably start out as "here's two
patch series, tell me when you've finished creating the merged version
and I'll commit it", though I could imagine handling many simple cases
more automatically.  I hope that building a simple tool and
incrementally improving it will work.

> I found the docs were unclear about the interaction between raw git
> operations and git series operations.  In particular, the interaction
> between `git checkout' (and other branch-switching operations).  I
> think that some combinations of these operations could result in
> ... unexpected and undesirable results.

I've tested and thought about many of those cases in detail, and
carefully avoided any case that could allow data loss.  If you see any
way to cause git or git-series to lose data or otherwise do the wrong
thing, please let me know.

At the file level, `git series checkout` will safely avoid overwriting
any local changes or untracked files.  (I don't actually even *have* a
`git series checkout -f`; you'd have to use `git checkout -f` or `git
reset --hard` first if you want to throw away changes.)

At the commit level, git-series always treats HEAD as the working
version of "series"; any git operation that changes "HEAD", including
switching branches, will cause git-series to treat "series" as changed
in the working version.  So, if you check out a branch and thus change
HEAD, `git series status` will show "series" as changed.

Also, `git series rebase` or `git series rebase -i` always detaches
HEAD, to avoid inadvertently changing a branch that HEAD points to.
git-series generally expects you to work with a detached HEAD, letting
the series track changes to HEAD.  If you re-attach HEAD, then git
commands like commit/rebase/revert/cherry-pick will operate on the
branch you attached to, but git-series commands will always re-detach
HEAD and avoid changing any underlying branch.
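
The underlying git behaviour can be seen without git-series at all.
The following is a minimal sketch in plain git, assuming a scratch
repository and a branch named `work` (both invented for the example):
once HEAD is detached, new commits move HEAD but leave the branch where
it was.

```shell
set -eu
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
git commit -q --allow-empty -m "patch 1"
before=$(git rev-parse work)

# Detach: HEAD now names the commit directly instead of the branch.
git checkout -q --detach
git commit -q --allow-empty -m "patch 2"

after=$(git rev-parse work)
head=$(git rev-parse HEAD)

echo "work before: $before"
echo "work after:  $after"     # same as before: the branch did not move
echo "HEAD:        $head"      # the new commit exists only via HEAD
```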

I also gave a great deal of thought to the case of having some changes
to a series and then switching to a different series.  For instance,
suppose you change the cover letter, rebase -i the series, and then want
to work on a different series.  git generally assumes that if you have
changes in your index or working tree when you run "git checkout
otherbranch", you want to move them along with you to otherbranch (and
complains if they conflict with otherbranch).  git-series, instead,
independently tracks the "staged" and "working" versions of every
series; if you switch to a different series (or start a new one), it'll
leave the "staged" and "working" versions of your current series
untouched, and when you switch back to that series, you'll see all those
same changes.
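
For contrast, here is a small plain-git sketch of the `git checkout`
behaviour described above, where uncommitted changes travel along to
the branch you switch to.  The repository, the branch names `work` and
`other`, and the file `notes.txt` are assumptions for the example; it
does not exercise git-series itself.

```shell
set -eu
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
echo one > notes.txt
git add notes.txt
git commit -q -m "base"
git branch other

# Make an uncommitted edit on "work", then switch branches.
echo "local edit" >> notes.txt
git checkout -q other

# The edit is still in the working tree, now sitting on "other".
branch=$(git symbolic-ref --short HEAD)
status=$(git status --porcelain)
echo "on branch: $branch"
echo "status:    $status"
```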

I'd absolutely welcome more review and suggestions here; if you see any
corner case I've missed, please let me know.

> I did read the INTERNALS document about the data structures.  I wonder
> why you rejected other possibilities.  In particular, your top level
> `git series' branch data structure is not directly useable by any
> other tool; it needs to be dereferenced/converted, to produce a
> useable commit.  Did you consider recording the metadata as dotfiles
> in tree objects, or some such ?

I started with a few fundamental constraints:
- The commits tracked by the series *must* remain directly usable as
  commits in the underlying project, whether by sending patches or by
  pushing/pulling.
- Every object in the history of a series must be reachable from a
  ref, so that fsck/repack/prune/etc cannot discard series history.
- Similarly, `git push` and `git fetch` must work on series commits, and
  must transmit/receive the full series history with a series branch,
  without requiring any additional commands or special "series" versions
  of push/fetch.

These constraints limit where metadata can live.  Adding any dotfiles to
the commits in the patch series would mean the resulting patches would
include those dotfiles.  Any metadata added to commit messages would end
up in patches; note that several projects, including the Linux kernel,
have complained about patches that include Gerrit "Change-Id" tags.  Any
format that stored patches within a series commit, rather than full
links to commits for the patches, would not leave the commits themselves
usable by git.

Based on all of that, I settled on using a tree object as a key-value
store.  The special use of "parent" commits of series commits preserves
reachability and the ability to transmit and receive a series using just
git.  (Note that git's reachability algorithms do not follow gitlink
commits within tree objects, partly because they expect that such
gitlinks may point to commits not present in the current repository's
object store.)
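
The reachability point can be checked with git plumbing alone.  The
sketch below is a simplified imitation of the layout, not the exact
format git-series writes (see INTERNALS for that): the entry names
`base` and `series` come from this discussion, while the repository,
the ref name `git-series/demo`, and the commit messages are assumptions
for the example.  It compares a commit whose tree merely contains
gitlinks with one that also lists the patch commit as a parent.

```shell
set -eu
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)
git commit -q --allow-empty -m "patch"
tip=$(git rev-parse HEAD)

# A tree used as a key-value store: names map to gitlinks (mode 160000).
tree=$(printf '160000 commit %s\tbase\n160000 commit %s\tseries\n' \
        "$base" "$tip" | git mktree)

# Variant 1: the gitlinks are the only reference to the patch commit.
orphan=$(git commit-tree "$tree" -m "series (gitlinks only)")
# Variant 2: the patch commit is also recorded as a parent.
linked=$(git commit-tree "$tree" -p "$tip" -m "series (with parent)")

# Count how often the patch commit shows up when walking from each one.
reach_orphan=$(git rev-list "$orphan" | grep -c "$tip" || true)
reach_linked=$(git rev-list "$linked" | grep -c "$tip" || true)
echo "reachable via gitlink only: $reach_orphan"
echo "reachable via parent link:  $reach_linked"

# Fetching only the series ref into an empty repository brings the
# patch commit along, because it is an ancestor of the series commit.
git update-ref refs/heads/git-series/demo "$linked"
other=$(mktemp -d)
git init -q "$other"
git -C "$other" fetch -q "$repo" \
    refs/heads/git-series/demo:refs/heads/git-series/demo
git -C "$other" cat-file -e "$tip" && echo "patch commit was transferred"
```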

Do you see another possible storage format that meets all the
constraints above?

Also, you might find it amusing that many parts of git can handle series
commits, thanks to the many syntaxes supported by `git rev-parse`.  For
instance, if you've created a series named "feature", you can refer to
the current version of the series as git-series/feature:series.  That
works with git log or git diff, for instance.  (Sadly, it doesn't work
with "git push", because refspecs use ':' to separate the local and
remote branch names.)

Any other tool operating on git repositories that handles "extended
sha1" syntax as `git rev-parse` does can accept something like
"git-series/feature:series" in place of a commit; you can even pass
something like "git-series/feature~:series" if you want to operate on an
old version, or "git-series/feature:base..git-series/feature:series" if
you need to specify the range of commits in the patch series.
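
To make those three forms concrete, here is a sketch that hand-builds
two versions of a series named "feature" with git plumbing and then
resolves them with `git rev-parse`.  The layout is a simplified
assumption (a tree with `base` and `series` gitlinks, and the previous
series commit as first parent); real series commits are created by
git-series and may carry more than this.

```shell
set -eu
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
git commit -q --allow-empty -m "base";    base=$(git rev-parse HEAD)
git commit -q --allow-empty -m "patch 1"; p1=$(git rev-parse HEAD)
git commit -q --allow-empty -m "patch 2"; p2=$(git rev-parse HEAD)

# Helper (hypothetical, for this demo only): tree with base/series gitlinks.
series_tree() {
    printf '160000 commit %s\tbase\n160000 commit %s\tseries\n' "$1" "$2" \
        | git mktree
}

# Version 1 of the series ends at patch 1; version 2 ends at patch 2.
v1=$(git commit-tree "$(series_tree "$base" "$p1")" -p "$p1" -m "series v1")
v2=$(git commit-tree "$(series_tree "$base" "$p2")" -p "$v1" -p "$p2" \
        -m "series v2")
git update-ref refs/heads/git-series/feature "$v2"

# <rev>:<path> resolves the gitlink entry to the commit it names.
current=$(git rev-parse git-series/feature:series)
previous=$(git rev-parse git-series/feature~:series)
count=$(git rev-list --count \
        git-series/feature:base..git-series/feature:series)

echo "current tip:  $current"
echo "previous tip: $previous"
echo "commits in the series: $count"
git log --oneline git-series/feature:base..git-series/feature:series
```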

> One problem with the approach you have chosen (which, to be fair,
> affects stgit too) is that supplying some system (a repository; an
> auto builder; ...) with the appropriate tree in the form of a git
> commit, does not provide anyone who retrieves that commit with the
> series metadata - so all these other programs need to be provided with
> what is essentially an output git branch which is the result of an
> irreversible transformation from the git series information.

I can't think of any format meeting the above constraints (particularly
the first one) that would let you carry the history of a series of
commits inside one of the very commits whose history you want to
record.

However, as mentioned above, you can pass a tool such as an autobuilder
an extended sha1 such as "git-series/feature:series".

For repositories, you can push the series branch directly if you want to
provide the history of your series, or you can push the current version
(or an older version) of the patch series if you just want to publish
that version.
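
Both options can be sketched with plain `git push`.  Since a refspec
cannot contain the extended "feature:series" form directly, one
workaround (an assumption of this example, not a git-series feature) is
to resolve it with `git rev-parse` first and push the resulting commit
id.  The series commit is hand-built with plumbing in a simplified
layout, and the remote is a local bare repository; the branch name
`feature` on the remote is likewise made up.

```shell
set -eu
# Throwaway identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/work
git commit -q --allow-empty -m "base";  base=$(git rev-parse HEAD)
git commit -q --allow-empty -m "patch"; tip=$(git rev-parse HEAD)

# Simplified stand-in for a series commit (see INTERNALS for the real one).
tree=$(printf '160000 commit %s\tbase\n160000 commit %s\tseries\n' \
        "$base" "$tip" | git mktree)
series=$(git commit-tree "$tree" -p "$tip" -m "series v1")
git update-ref refs/heads/git-series/feature "$series"

remote=$(mktemp -d)
git init -q --bare "$remote"

# Option 1: publish the series branch, history of the series included.
git push -q "$remote" git-series/feature

# Option 2: publish only the current version as an ordinary branch.
git push -q "$remote" \
    "$(git rev-parse git-series/feature:series):refs/heads/feature"

git -C "$remote" for-each-ref --format='%(refname) %(objectname)'
```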

I hope to teach some repository browsers (such as cgit) how to read
git-series commits, which would make it easier to browse the patch
series.

- Josh Triplett
