As mentioned at the Cauldron, I'm looking at finding better branchpoints
for the cases in the GCC repository where cvs2svn messed up identifying
the parent branch and commit on which a branch was based, so that affected
branches can be reparented as part of moving to git, since messed-up
branchpoints are actually confusing in practice when looking at old
branches.
An idiomatic branch in SVN would start with a commit that just copies one
commit of one branch to another branch, with no further changes. In many
cases it's not possible to achieve that through reparenting because there
is no commit on any parent branch exactly corresponding to the first
commit on the cvs2svn-generated branch. However, it's still possible to
find a much better approximation than cvs2svn did in some cases. (There
are also cases where cvs2svn found a good branchpoint, but represented the
branch-creation commit in a superfluously complicated way, replacing lots
of files and subdirectories by copies of different revisions. That
doesn't really matter for conversion to git, however, since git's data
structures don't say anything about where a particular subdirectory was
copied from, just the tree hash and the parent commit.)
I'm using heuristics to see if a particular branch has a suspicious
branchpoint. First, if there is a branchpoint tag I take that as the best
estimate of what the tree should look like at the branchpoint commit on
the parent branch; otherwise, I take the first commit on the branch as the
best estimate of that. Then, I consider a branchpoint not to be
suspicious if the only diffs between the tree at the parent commit and the
tree estimated to start the branch are file deletions, and, if there was
no branchpoint commit, file additions.
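The check described above can be sketched roughly as follows. This is only
an illustration, not the script I'm actually using: it assumes each tree is
modelled as a dict from file path to content hash, and the names Tree,
TreeDiff, diff_trees, is_suspicious and have_branchpoint_tag are all made up
for the example. It also reads the rule as tolerating file additions only
when the estimate comes from the first commit on the branch rather than
from a branchpoint tag.

```python
from typing import Dict, NamedTuple, Set

# Hypothetical model of a tree: file path -> content hash.
Tree = Dict[str, str]


class TreeDiff(NamedTuple):
    deleted: Set[str]  # in the parent tree, missing from the branch-start estimate
    added: Set[str]    # in the branch-start estimate, missing from the parent tree
    changed: Set[str]  # present in both trees, with different content


def diff_trees(parent: Tree, branch_start: Tree) -> TreeDiff:
    """Classify the differences between the parent tree and the estimated branch start."""
    parent_paths = set(parent)
    branch_paths = set(branch_start)
    changed = {p for p in parent_paths & branch_paths if parent[p] != branch_start[p]}
    return TreeDiff(parent_paths - branch_paths, branch_paths - parent_paths, changed)


def is_suspicious(parent: Tree, branch_start: Tree, have_branchpoint_tag: bool) -> bool:
    """Return True if the branchpoint looks wrong under the heuristic above."""
    diff = diff_trees(parent, branch_start)
    if diff.changed:
        # Any file content difference makes the branchpoint suspicious.
        return True
    if diff.added and have_branchpoint_tag:
        # Additions are only tolerated when the estimate is the first
        # commit on the branch (assumption: no branchpoint tag exists).
        return True
    # Only deletions (and possibly tolerated additions) remain.
    return False
```

Deletions never make a branchpoint suspicious in this sketch, for the
reasons given in the next paragraph.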
(There are several reasons why the creation of a branch might involve file
deletions. Some look like CVS glitches where it simply failed to create
the branch in particular ,v files; some may be cases where the person
created the branch only for certain subdirectories, deliberately; some
look like cases where ,v files for separately developed subdirectories,
e.g. libjava, got moved into the GCC CVS repository at some point, which
makes those subdirectories appear to be deleted on creation of branches
that predate the move. File additions at
branch creation look more like an artifact of how cvs2svn handles cases of
a file first added on trunk after a branch was created, then backported to
that branch.)
If the branchpoint is suspicious (54 are, out of 135 branches in /branches
as of r105925, the last cvs2svn-generated commit), I then look for an
alternative non-suspicious branchpoint, which might be either on the same
parent branch currently used, or on a different one chosen by some
heuristics. Because pretty much all normal GCC commits change file
contents (modifying a ChangeLog file, if nothing else), any candidate
parent that is non-suspicious, and thus does not involve any file content
differences when compared with the branchpoint commit or first commit on
the branch, should be very close to being the right parent commit.
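As a rough sketch of that search (again not the actual script, and with the
same assumed model of a tree as a dict from file path to content hash), one
could filter a list of candidate parents down to those with no file content
differences. The names Candidate, content_matches and
non_suspicious_parents are hypothetical, and how the candidate list is
chosen is left out entirely.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical model of a tree: file path -> content hash.
Tree = Dict[str, str]
# A candidate parent: (branch path, revision, tree at that revision).
Candidate = Tuple[str, int, Tree]


def content_matches(parent: Tree, target: Tree, allow_additions: bool) -> bool:
    """True if going from parent to target needs no file content changes.

    Files present only in the parent (deletions) are always accepted;
    files present only in the target (additions) are accepted only when
    allow_additions is set.
    """
    for path, content_hash in target.items():
        if path not in parent:
            if not allow_additions:
                return False
        elif parent[path] != content_hash:
            return False
    return True


def non_suspicious_parents(
    target: Tree, candidates: Iterable[Candidate], allow_additions: bool
) -> List[Tuple[str, int]]:
    """Return (branch, revision) for every candidate that is non-suspicious."""
    return [
        (branch, rev)
        for branch, rev, tree in candidates
        if content_matches(tree, target, allow_additions)
    ]
```

Because nearly every commit touches a ChangeLog, a list like this would be
expected to contain very few revisions for any given branch.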
Here is a list of reparentings I suggest for 16 of those 54 branches,
including in particular the cases of egcs_1_00_branch and gcc-3_2-branch
that were noted on IRC to have bad branchpoints at present; some are only
small changes, some are much more major fixes. I expect I can find
reparentings for some of the rest with more investigation and improved
heuristics or hints for those heuristics, while others may well already be
essentially the right branchpoint despite file content changes being
present in the first commit. (Two of the rest do have reparentings
suggested by my script, but they need more careful investigation because
of file content mismatches between the branchpoint tags and the first
commit on the branch.)
The first two columns after REPARENT: list the SVN path of the branch, and
the revision number of the first commit on it (the one that should be
reparented). The next two list the suspicious parent (that is, the branch
and revision from which cvs2svn generated the copy that created the
top-level /branches/whatever directory for the branch, along with further
changes in the commit to fix up files and subdirectories in that copy to
have the right tree contents). The final two columns list the proposed
new parent branch and revision on that branch. In all cases, the tree
content is expected to be left as generated by cvs2svn; it's simply the
commit parent that should be changed in git.
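For anyone wanting to consume the list mechanically, a minimal parser for
one record might look like the following. It assumes each record has been
joined onto a single line (some get wrapped in mail), and the names Reparent
and parse_reparent are made up for this example.

```python
from typing import NamedTuple, Optional


class Reparent(NamedTuple):
    branch: str      # SVN path of the branch
    first_rev: int   # first commit on the branch (the one to reparent)
    old_parent: str  # suspicious parent branch chosen by cvs2svn
    old_rev: int     # suspicious parent revision
    new_parent: str  # proposed new parent branch
    new_rev: int     # proposed revision on that branch


def parse_reparent(line: str) -> Optional[Reparent]:
    """Parse one 'REPARENT:' record; return None for any other line."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "REPARENT:":
        return None
    _, branch, first_rev, old_parent, old_rev, new_parent, new_rev = fields
    return Reparent(
        branch, int(first_rev), old_parent, int(old_rev), new_parent, int(new_rev)
    )
```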
REPARENT: /branches/GC_5_0_ALPHA_1 27860 /trunk 27852 /trunk 27855
REPARENT: /branches/csl-3_3_1-branch 70143 /trunk 60111 /branches/gcc-3_3-branch 70142
REPARENT: /branches/csl-3_4-linux-branch 90110 /trunk 75991 /branches/gcc-3_4-branch 90109
REPARENT: /branches/csl-3_4_0-hp-branch 80843 /trunk 75991 /branches/gcc-3_4-branch 80842
REPARENT: /branches/csl-