On July 9, 2018 9:19:11 PM GMT+02:00, e...@thyrsus.com wrote:
>Last time I did a comparison between SVN head and the git conversion
>tip they matched exactly. This time I have mismatches in the following
>files.
>
>libtool.m4
>libvtv/ChangeLog
>libvtv/configure
>libvtv/testsuite/lib/libvtv.exp
>ltmain.sh
>lto-plugin/ChangeLog
>lto-plugin/configure
>lto-plugin/lto-plugin.c
>MAINTAINERS
>maintainer-scripts/ChangeLog
>maintainer-scripts/crontab
>maintainer-scripts/gcc_release
>Makefile.def
>Makefile.in
>Makefile.tpl
>zlib/configure
>zlib/configure.ac
>
>Now I'll explain what this means and why it's a serious problem.
>
>Reposurgeon is never confused by linear history, branching, or
>tagging; I have lots of regression tests for those cases. When it
>screws up it is invariably around branch copy operations, because
>there are cases near those where the data model of Subversion stream
>files is underspecified. That model was in fact entirely undocumented
>before I reverse-engineered it and wrote the description that now
>lives in the Subversion source tree. But that description is not
>complete; nobody, not even Subversion's designers, knows how to fill
>in all the corner cases.
>
>Thus, a content mismatch like this means there was some recent branch
>merge to trunk in the gcc history that reposurgeon is not interpreting
>as intended, or more likely an operator error such as a non-Subversion
>directory copy followed by a commit - my analyzer can recover from
>most such cases but not all.
>
>There are brute-force ways to pin down such malformations, but none of
>them are practical at the huge scale of this repository. The main
>problem here wouldn't be reposurgeon itself but the fact that Subversion
>checkouts on a repo this large are very slow. I've seen a single one
>take 12 hours; an attempt at a whole bisection run to pin down the
>divergence point on trunk would therefore probably cost log2 of the
>commit length times that, or about 18 days.

12 hours from remote, I guess? The Subversion repository is available
through rsync, so you can create a local mirror to work from (we've been
doing that at SUSE for years).

Richard.

>
>So...does that list of changed files look familiar to anyone? If we can
>identify the revision number of the bad commit, the odds of being able
>to unscramble this mess go way up. They still aren't good, not when
>merely loading the repository for examination takes over four hours,
>but they would be way better than if I were starting from zero.
>
>This is serious. I have produced demonstrably correct history
>conversions of the gcc repo in the past. We may now be in a situation
>where I will never again be able to do that.
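
For reference, the bisection run mentioned in the quoted text can be sketched roughly as below. This is only an illustration, not reposurgeon's code or anyone's actual tooling: `first_mismatch` and the `matches` callback are hypothetical names, and the sketch assumes that once the conversion diverges from Subversion it stays diverged at every later revision. In practice `matches(r)` would have to check out revision `r` from both sides and compare the trees, which is the slow part.

```python
from typing import Callable, Tuple


def first_mismatch(lo: int, hi: int,
                   matches: Callable[[int], bool]) -> Tuple[int, int]:
    """Bisect revisions lo..hi for the first one where matches() is False.

    Assumptions (hypothetical, for illustration only):
      - matches(r) is True for every revision before the divergence
        and False from the divergence onward;
      - matches(hi) is False, i.e. the tip is known to be bad.
    Returns (first bad revision, number of probes made).
    """
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1              # each probe stands for one slow checkout
        if matches(mid):
            lo = mid + 1         # divergence lies after mid
        else:
            hi = mid             # divergence is at mid or earlier
    return lo, probes


if __name__ == "__main__":
    # Stand-in predicate: pretend the trees stop matching at revision 637.
    rev, probes = first_mismatch(1, 1000, lambda r: r < 637)
    print(rev, probes)
```

The number of probes grows with the base-2 logarithm of the revision range, so the per-checkout time dominates the total; a local mirror would shorten each probe, not reduce their number.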