> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists)
> <richard.earns...@arm.com> wrote:
>
> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>> Below are several more issues I found in reposurgeon-6a conversion comparing
>> it against gcc-reparent conversion.
>>
>> I am sure, these and whatever other problems I may find in the reposurgeon
>> conversion can be fixed in time. However, I don't see why should bother.
>> My conversion has been available since summer 2019, I made it ready in time
>> for GCC Cauldron 2019, and it didn't change in any significant way since
>> then.
>>
>> With the "Missed merges" problem (see below) I don't see how reposurgeon
>> conversion can be considered "ready". Also, I expected a diligent developer
>> to compare new conversion (aka reposurgeon's) against existing conversion
>> (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better"
>> or even "ready". The data I'm seeing in differences between my and
>> reposurgeon conversions shows that gcc-reparent conversion is /better/.
>>
>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent
>> conversion. I welcome Richard E. to modify his summary scripts to work with
>> svn-git scripts, which should be straightforward, and I'm ready to help.
>>
>
> I don't think either of these conversions are any more ready to use than
> the reposurgeon one, possibly less so. In fact, there are still some
> major issues to resolve first before they can be considered.
>
> gcc-pretty has completely wrong parent information for the gcc-3 era
> release tags, showing the tags as being made directly from trunk with
> massive deltas representing the roll-up of all the commits that were
> made on the gcc-3 release branch.
I will clarify the above statement, and please correct me where you think I'm wrong. Gcc-pretty conversion has the exact right parent information for the gcc-3 era release tags as recorded in SVN version history. Gcc-pretty conversion aims to produce an exact copy of SVN history in git. IMO, it manages to do so just fine.

It is a different thing that SVN history has a screwed up record of gcc-3 era tags.

>
> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

I agree with you here.

>
> Both of these factually misrepresent the history at the time of the
> release tag being made.

Yes and no. Gcc-pretty repository mirrors SVN history. And regarding the need for reparenting -- we lived with current history for gcc-3 release tags for a long time. I would argue their continued brokenness is not a show-stopper.

Looking at this from a different perspective, when I posted the initial svn-git scripts back in Summer, the community roughly agreed on a plan to
1. Convert entire SVN history to git.
2. Use the stock git history rewrite tools (git filter-branch) to fixup what we want, e.g., reparent tags and branches or set better author/committer entries.

Gcc-pretty does (1) in entirety. For reparenting, I tried a 15min fix to my scripts to enable reparenting, which worked, but with artifacts like the merge commit from old and new parents. I will drop this and instead use tried-and-true "git filter-branch" to reparent those tags and branches, thus producing gcc-reparent from gcc-pretty.

>
> As for converting my script to work with your tools, I'm afraid I don't
> have time to work on that right now. I'm still bogged down validating
> the incorrect bug ids that the script has identified for some commits.
> I'm making good progress (we're down to 160 unreviewed commits now), but
> it is still going to take what time I have over the next week to
> complete that task.
>
> Furthermore, there is no documentation on how your conversion scripts
> work, so it is not possible for me to test any work I might do in order
> to validate such changes. Not being able to run the script locally to
> test change would be a non-starter.
>
> You are welcome, of course, to clone the script I have and attempt to
> modify it yourself, it's reasonably well documented. The sources can be
> found in esr's gcc-conversion repository here:
> https://gitlab.com/esr/gcc-conversion.git

--
Maxim Kuvyrkov
https://www.linaro.org

>
>
>> Meanwhile, I'm going to add additional root commits to my gcc-reparent
>> conversion to bring in "missing" branches (the ones, which don't share
>> history with trunk@1) and restart daily updates of gcc-reparent conversion.
>>
>> Finally, with the comparison data I have, I consider statements about
>> git-svn's poor quality to be very misleading. Git-svn may have had serious
>> bugs years ago when Eric R. evaluated it and started his work on
>> reposurgeon. But a lot of development has happened and many problems have
>> been fixed since then. At the moment it is reposurgeon that is producing
>> conversions with obscure mistakes in repository metadata.
>>
>>
>> === Missed merges ===
>>
>> Reposurgeon misses merges from trunk on 130+ branches. I've spot-checked
>> ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane
>> merges were omitted. Below is analysis for ARM/hard_vfp_branch.
>>
>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>> ----
>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>> Author: Richard Earnshaw <rearn...@gcc.gnu.org>
>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>
>>     Merge trunk through to r149768
>>
>>     Legacy-ID: 149804
>>
>>  COPYING.RUNTIME | 73 +
>>  ChangeLog | 270 +-
>>  MAINTAINERS | 19 +-
>>  <MANY OTHER FILES>
>> ----
>>
>> at the same time for svn-git scripts we have:
>>
>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>> ----
>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>> Merge: 4970119c20da 3a69b1e566a7
>> Author: Richard Earnshaw <rearn...@arm.com>
>> Date:   Mon Jul 20 08:15:51 2009 +0000
>>
>>     Merge trunk through to r149768
>>
>>     git-svn-id: https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 138bc75d-0d04-0410-961f-82ee72b054a4
>> ----
>>
>> ... which agrees with
>> $ svn propget svn:mergeinfo file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>> /trunk:142588-149768
>>
>> === Bad author entries ===
>>
>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and
>> "2005-03-18 Kazu Hirata". It is rather obvious that person's name is
>> unlikely to start with a digit.
>>
>> === Missed authors ===
>>
>> Reposurgeon-6a conversion misses many authors, below is a list of people
>> with names starting with "A".
>>
>> Akos Kiss
>> Anders Bertelrud
>> Andrew Pochinsky
>> Anton Hartl
>> Arthur Norman
>> Aymeric Vincent
>>
>> === Conservative author entries ===
>>
>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many
>> commits where svn-git conversion manages to extract valid email from commit
>> data. This happens for hundreds of author entries.
>>
>> Regards,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>
>>
>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org>
>>> wrote:
>>>
>>>
>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>>
>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>> spelling and other mistakes in the commit authors?
>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>> Jakub Jakub Jelinek (1):
>>>> Jakub Jeilnek (1):
>>>> Jelinek (1):
>>>> entries next to the expected one with most of the commits.
>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances
>>>> from other names and if we have one with many commits and then one with
>>>> very few with small edit distance from those, flag it for human review.
>>>
>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and
>>> gcc-reparent conversions. It ignores 1-3 character differences in
>>> author/committer names and email addresses. I've audited results for all
>>> branches and didn't spot any mistakes.
>>>
>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and
>>> gcc-reposurgeon-5a repos among themselves. Below are current notes for
>>> comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>>
>>> == Merges on trunk ==
>>>
>>> Reposurgeon creates merge entries on trunk when changes from a branch are
>>> merged into trunk. This brings entire development history from the branch
>>> to trunk, which is both good and bad. The good part is that we get more
>>> visibility into how the code evolved. The bad part is that we get many
>>> "noisy" commits from merged branch (e.g., "Merge in trunk" every few
>>> revisions) and that our SVN branches are work-in-progress quality, not
>>> ready for review/commit quality. It's common for files to be re-written in
>>> large chunks on branches.
>>>
>>> Also, reposurgeon's commit logs don't have information on SVN path from
>>> which the change came, so there is no easy way to determine that a given
>>> commit is from a merged branch, not an original trunk commit. Git-svn, on
>>> the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit
>>> logs.
>>>
>>> My conversion follows current GCC development policy that trunk history
>>> should be linear. Branch merges to trunk are squashed. Merges between
>>> non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>>
>>> == Differences in trees ==
>>>
>>> Git trees (aka filesystem content) match between pretty/trunk and
>>> reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>>> Here is SVN log of that revision (restoration of deleted trunk):
>>> ------------------------------------------------------------------------
>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>> Changed paths:
>>>    A /trunk (from /trunk:130802)
>>> ------------------------------------------------------------------------
>>>
>>> Reposurgeon conversion has:
>>> -------------
>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>> Author: Daniel Berlin <dber...@gcc.gnu.org>
>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>
>>>     Readd trunk
>>>
>>>     Legacy-ID: 130805
>>>
>>>  .gitignore | 17 -----------------
>>>  1 file changed, 17 deletions(-)
>>> -------------
>>> and my conversion has:
>>> -------------
>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>> Author: Daniel Berlin <dber...@dbrelin.org>
>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>>
>>>     Readd trunk
>>>
>>>
>>>     git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 138bc75d-0d04-0410-961f-82ee72b054a4
>>> -------------
>>>
>>> It appears that .gitignore has been added in r1 by reposurgeon and then
>>> deleted at r130805. In SVN repository .gitignore was added in r195087.
>>> I speculate that addition of .gitignore at r1 is expected, but its deletion
>>> at r130805 is highly suspicious.
>>>
>>> == Committer entries ==
>>>
>>> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even when
>>> it correctly detects author name from ChangeLog.
>>>
>>> reposurgeon-5a:
>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mar...@gcc.gnu.org>
>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joz...@gcc.gnu.org>
>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@gcc.gnu.org>
>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <g...@gcc.gnu.org>
>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener <rgue...@gcc.gnu.org>
>>>
>>> pretty:
>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mli...@suse.cz>
>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz <joze...@mittosystems.com>
>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath <frede...@codesourcery.com>
>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <a...@gjlay.de>
>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener <rguent...@suse.de>
>>>
>>> == Bad summary line ==
>>>
>>> While looking around r138087, below caught my eye. Is the contents of
>>> summary line as expected?
>>>
>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>> Author: Chris Fairles <chris.fair...@gmail.com>
>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>>
>>>     acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>>
>>>     2008-07-23 Chris Fairles <chris.fair...@gmail.com>
>>>
>>>         * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define
>>>         GLIBCXX_LIBS.
>>>         Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>         * src/Makefile.am: Use it.
>>>         * configure: Regenerate.
>>>         * configure.in: Likewise.
>>>         * Makefile.in: Likewise.
>>>         * src/Makefile.in: Likewise.
>>>         * libsup++/Makefile.in: Likewise.
>>>         * po/Makefile.in: Likewise.
>>>         * doc/Makefile.in: Likewise.
>>>
>>>     Legacy-ID: 138087
>>>
>>>
>>> --
>>> Maxim Kuvyrkov
>>> https://www.linaro.org
>>>
>>
>
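
For reference, the check Jakub suggests in the quoted Dec 26 message (take an author spelling with very few commits, and flag it for human review if it sits within a small edit distance of a spelling with many commits) could look roughly like the Python sketch below. This is not taken from svn-git-author.sh or from reposurgeon. The function names, the commit-count thresholds and the commit counts in the example are assumptions made up for illustration; only the 1-3 character tolerance and the three misspellings come from the thread.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def flag_suspect_authors(
    counts: Dict[str, int],
    max_distance: int = 3,        # tolerance mentioned in the thread (1-3 chars)
    rare_at_most: int = 2,        # hypothetical: "very few" commits
    frequent_at_least: int = 50,  # hypothetical: "many" commits
) -> List[Tuple[str, str, int]]:
    """Return (rare spelling, likely intended spelling, distance) triples."""
    frequent = sorted(n for n, c in counts.items() if c >= frequent_at_least)
    flagged = []
    for name, count in sorted(counts.items()):
        if count > rare_at_most:
            continue
        for candidate in frequent:
            distance = levenshtein(name, candidate)
            if 0 < distance <= max_distance:
                flagged.append((name, candidate, distance))
    return flagged


if __name__ == "__main__":
    # Commit counts for the frequent spellings are invented for the example.
    example = {
        "Jakub Jelinek": 5000,
        "Jakub Jeilnek": 1,
        "Jakub Jakub Jelinek": 1,
        "Jelinek": 1,
        "Richard Biener": 4000,
    }
    for rare, intended, distance in flag_suspect_authors(example):
        print(f"review: {rare!r} looks like {intended!r} (distance {distance})")
```

With a tolerance of 3, only "Jakub Jeilnek" is flagged (distance 2). "Jakub Jakub Jelinek" and "Jelinek" are each 6 edits away from "Jakub Jelinek", so a pure character-distance check would need a larger threshold or a separate rule (for example, comparing surname tokens) to catch them.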