Last time I did a comparison between SVN head and the git conversion tip they matched exactly. This time I have mismatches in the following files.

libtool.m4
libvtv/ChangeLog
libvtv/configure
libvtv/testsuite/lib/libvtv.exp
ltmain.sh
lto-plugin/ChangeLog
lto-plugin/configure
lto-plugin/lto-plugin.c
MAINTAINERS
maintainer-scripts/ChangeLog
maintainer-scripts/crontab
maintainer-scripts/gcc_release
Makefile.def
Makefile.in
Makefile.tpl
zlib/configure
zlib/configure.ac

Now I'll explain what this means and why it's a serious problem.

Reposurgeon is never confused by linear history, branching, or tagging; I have lots of regression tests for those cases. When it screws up it is invariably around branch copy operations, because there are cases near those where the data model of Subversion stream files is underspecified.

That model was in fact entirely undocumented before I reverse-engineered it and wrote the description that now lives in the Subversion source tree. But that description is not complete; nobody, not even Subversion's designers, knows how to fill in all the corner cases.

Thus, a content mismatch like this means there was some recent branch merge to trunk in the gcc history that reposurgeon is not interpreting as intended, or more likely an operator error such as a non-Subversion directory copy followed by a commit - my analyzer can recover from most such cases but not all.

There are brute-force ways to pin down such malformations, but none of them are practical at the huge scale of this repository. The main problem here wouldn't be reposurgeon itself but the fact that Subversion checkouts on a repo this large are very slow. I've seen a single one take 12 hours; an attempt at a whole bisection run to pin down the divergence point on trunk would therefore probably cost log2 of the number of commits times that, or about 18 days.

So...does that list of changed files look familiar to anyone? If we can identify the revision number of the bad commit, the odds of being able to unscramble this mess go way up.
They still aren't good, not when merely loading the repository for examination takes over four hours, but they would be way better than if I were starting from zero.

This is serious. I have produced demonstrably correct history conversions of the gcc repo in the past. We may now be in a situation where I will never again be able to do that.

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

The real point of audits is to instill fear, not to extract revenue; the IRS aims at winning through intimidation and (thereby) getting maximum voluntary compliance
	-- Paul Strassel, former IRS Headquarters Agent, Wall St. Journal 1980