> On Dec 30, 2019, at 1:24 AM, Richard Earnshaw (lists) 
> <richard.earns...@arm.com> wrote:
> 
> On 29/12/2019 18:30, Maxim Kuvyrkov wrote:
>> Below are several more issues I found in reposurgeon-6a conversion comparing 
>> it against gcc-reparent conversion.
>> 
>> I am sure these and whatever other problems I may find in the reposurgeon 
>> conversion can be fixed in time.  However, I don't see why I should bother.  
>> My conversion has been available since summer 2019, I made it ready in time 
>> for GCC Cauldron 2019, and it didn't change in any significant way since 
>> then.
>> 
>> With the "Missed merges" problem (see below) I don't see how reposurgeon 
>> conversion can be considered "ready".  Also, I expected a diligent developer 
>> to compare new conversion (aka reposurgeon's) against existing conversion 
>> (aka gcc-pretty / gcc-reparent) before declaring the new conversion "better" 
>> or even "ready".  The data I'm seeing in differences between my and 
>> reposurgeon conversions shows that gcc-reparent conversion is /better/.
>> 
>> I suggest that GCC community adopts either gcc-pretty or gcc-reparent 
>> conversion.  I welcome Richard E. to modify his summary scripts to work with 
>> svn-git scripts, which should be straightforward, and I'm ready to help.
>> 
> 
> I don't think either of these conversions are any more ready to use than
> the reposurgeon one, possibly less so.  In fact, there are still some
> major issues to resolve first before they can be considered.
> 
> gcc-pretty has completely wrong parent information for the gcc-3 era
> release tags, showing the tags as being made directly from trunk with
> massive deltas representing the roll-up of all the commits that were
> made on the gcc-3 release branch.

I will clarify the above statement, and please correct me where you think I'm 
wrong.  Gcc-pretty conversion has exactly the parent information for the gcc-3 
era release tags that is recorded in SVN version history.  Gcc-pretty 
conversion aims to produce an exact copy of SVN history in git.  IMO, it 
manages to do so just fine.

That SVN history itself has a screwed-up record of gcc-3 era tags is a 
separate matter.

> 
> gcc-reparent is better, but many (most?) of the release tags are shown
> as merge commits with a fake parent back to the gcc-3 branch point,
> which is certainly not what happened when the tagging was done at that
> time.

I agree with you here.

> 
> Both of these factually misrepresent the history at the time of the
> release tag being made.

Yes and no.  Gcc-pretty repository mirrors SVN history.  And regarding the need 
for reparenting -- we have lived with the current history for gcc-3 release tags 
for a long time.  I would argue their continued brokenness is not a show-stopper.

Looking at this from a different perspective, when I posted the initial svn-git 
scripts back in summer, the community roughly agreed on a plan to
1. Convert entire SVN history to git.
2. Use the stock git history rewrite tools (git filter-branch) to fix up what we 
want, e.g., reparent tags and branches or set better author/committer entries.

Gcc-pretty does (1) in its entirety.

For reparenting, I tried a 15min fix to my scripts to enable reparenting, which 
worked, but with artifacts like the merge commit from old and new parents.  I 
will drop this and instead use tried-and-true "git filter-branch" to reparent 
those tags and branches, thus producing gcc-reparent from gcc-pretty.

> 
> As for converting my script to work with your tools, I'm afraid I don't
> have time to work on that right now.  I'm still bogged down validating
> the incorrect bug ids that the script has identified for some commits.
> I'm making good progress (we're down to 160 unreviewed commits now), but
> it is still going to take what time I have over the next week to
> complete that task.
> 
> Furthermore, there is no documentation on how your conversion scripts
> work, so it is not possible for me to test any work I might do in order
> to validate such changes.  Not being able to run the script locally to
> test changes would be a non-starter.
> 
> You are welcome, of course, to clone the script I have and attempt to
> modify it yourself, it's reasonably well documented.  The sources can be
> found in esr's gcc-conversion repository here:
> https://gitlab.com/esr/gcc-conversion.git

--
Maxim Kuvyrkov
https://www.linaro.org

> 
> 
>> Meanwhile, I'm going to add additional root commits to my gcc-reparent 
>> conversion to bring in "missing" branches (the ones, which don't share 
>> history with trunk@1) and restart daily updates of gcc-reparent conversion.
>> 
>> Finally, with the comparison data I have, I consider statements about 
>> git-svn's poor quality to be very misleading.  Git-svn may have had serious 
>> bugs years ago when Eric R. evaluated it and started his work on 
>> reposurgeon.  But a lot of development has happened and many problems have 
>> been fixed since then.  At the moment it is reposurgeon that is producing 
>> conversions with obscure mistakes in repository metadata.
>> 
>> 
>> === Missed merges ===
>> 
>> Reposurgeon misses merges from trunk on 130+ branches.  I've spot-checked 
>> ARM/hard_vfp_branch and redhat/gcc-9-branch and, indeed, rather mundane 
>> merges were omitted.  Below is analysis for ARM/hard_vfp_branch.
>> 
>> $ git log --stat refs/remotes/gcc-reposurgeon-6a/ARM/hard_vfp_branch~4
>> ----
>> commit ef92c24b042965dfef982349cd5994a2e0ff5fde
>> Author: Richard Earnshaw <rearn...@gcc.gnu.org>
>> Date:   Mon Jul 20 08:15:51 2009 +0000
>> 
>>    Merge trunk through to r149768
>> 
>>    Legacy-ID: 149804
>> 
>> COPYING.RUNTIME                                     |    73 +
>> ChangeLog                                           |   270 +-
>> MAINTAINERS                                         |    19 +-
>> <MANY OTHER FILES>
>> ----
>> 
>> at the same time for svn-git scripts we have:
>> 
>> $ git log --stat refs/remotes/gcc-reparent/ARM/hard_vfp_branch~4
>> ----
>> commit ce7d5c8df673a7a561c29f095869f20567a7c598
>> Merge: 4970119c20da 3a69b1e566a7
>> Author: Richard Earnshaw <rearn...@arm.com>
>> Date:   Mon Jul 20 08:15:51 2009 +0000
>> 
>>    Merge trunk through to r149768
>> 
>>    git-svn-id: 
>> https://gcc.gnu.org/svn/gcc/branches/ARM/hard_vfp_branch@149804 
>> 138bc75d-0d04-0410-961f-82ee72b054a4
>> ----
>> 
>> ... which agrees with
>> $ svn propget svn:mergeinfo 
>> file:///home/maxim.kuvyrkov/tmpfs-stuff/svnrepo/branches/ARM/hard_vfp_branch@149804
>> /trunk:142588-149768
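
The cross-check above (commit message versus svn:mergeinfo) can be sketched 
in a few lines of Python.  This is only an illustration: the function names 
are hypothetical and are not part of either conversion toolchain.  It assumes 
the usual mergeinfo text form of "/path:rev-list", where the list holds 
comma-separated single revisions or "first-last" ranges.

```python
def parse_mergeinfo(text):
    """Parse svn:mergeinfo text into {source_path: [(first_rev, last_rev), ...]}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path is everything before the last ':'; the rest is the rev list.
        path, _, revs = line.rpartition(":")
        ranges = []
        for item in revs.split(","):
            item = item.rstrip("*")  # a trailing '*' marks a non-inheritable range
            first, _, last = item.partition("-")
            ranges.append((int(first), int(last or first)))
        result[path] = ranges
    return result


def merged_through(mergeinfo, source):
    """Highest revision of `source` recorded as merged, or None if absent."""
    ranges = mergeinfo.get(source)
    return max(last for _, last in ranges) if ranges else None


# Property value quoted above for ARM/hard_vfp_branch@149804.
info = parse_mergeinfo("/trunk:142588-149768")
print(merged_through(info, "/trunk"))  # 149768
```

The upper bound, 149768, is the revision named in the log message "Merge 
trunk through to r149768", so a converter that honours svn:mergeinfo has 
what it needs to record trunk as a second parent of that commit.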
>> 
>> === Bad author entries ===
>> 
>> Reposurgeon-6a conversion has authors "12:46:56 1998 Jim Wilson" and 
>> "2005-03-18 Kazu Hirata".  It is rather obvious that a person's name is 
>> unlikely to start with a digit.
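
A check of this kind is cheap to automate.  The sketch below is a 
hypothetical sanity filter, not something either conversion is known to run; 
it simply flags author names that begin with a digit, which usually means a 
date or time fragment leaked into the name field.

```python
import re

# Hypothetical heuristic: a leading digit suggests a ChangeLog date or
# timestamp was parsed as part of the author's name.
LEADING_DIGIT = re.compile(r"^\s*\d")


def suspicious_authors(names):
    """Return the names that start with a digit, in input order."""
    return [name for name in names if LEADING_DIGIT.match(name)]


authors = ["12:46:56 1998 Jim Wilson", "2005-03-18 Kazu Hirata", "Akos Kiss"]
print(suspicious_authors(authors))
```

With the three sample names, the two entries quoted above are flagged and 
"Akos Kiss" passes.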
>> 
>> === Missed authors ===
>> 
>> Reposurgeon-6a conversion misses many authors; below is a list of people 
>> with names starting with "A".
>> 
>> Akos Kiss
>> Anders Bertelrud
>> Andrew Pochinsky
>> Anton Hartl
>> Arthur Norman
>> Aymeric Vincent
>> 
>> === Conservative author entries ===
>> 
>> Reposurgeon-6a conversion uses default "@gcc.gnu.org" emails for many 
>> commits where svn-git conversion manages to extract valid email from commit 
>> data.  This happens for hundreds of author entries.
>> 
>> Regards,
>> 
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>> 
>> 
>>> On Dec 26, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
>>> wrote:
>>> 
>>> 
>>>> On Dec 26, 2019, at 2:16 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>> 
>>>> On Thu, Dec 26, 2019 at 11:04:29AM +0000, Joseph Myers wrote:
>>>> Is there some easy way (e.g. file in the conversion scripts) to correct
>>>> spelling and other mistakes in the commit authors?
>>>> E.g. there are misspelled surnames, etc. (e.g. looking at my name, I see
>>>> Jakub Jakub Jelinek (1):
>>>> Jakub Jeilnek (1):
>>>> Jelinek (1):
>>>> entries next to the expected one with most of the commits.
>>>> For the misspellings, wonder if e.g. we couldn't compute edit distances 
>>>> from
>>>> other names and if we have one with many commits and then one with very few
>>>> with small edit distance from those, flag it for human review.
>>> 
>>> This is close to what svn-git-author.sh script is doing in gcc-pretty and 
>>> gcc-reparent conversions.  It ignores 1-3 character differences in 
>>> author/committer names and email addresses.  I've audited results for all 
>>> branches and didn't spot any mistakes.
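
The edit-distance idea discussed above can be sketched as follows.  This is 
not the svn-git-author.sh implementation: the function names, the commit 
counts and the rare/common thresholds are hypothetical, and only the 
3-character tolerance is taken from the description above.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def flag_suspect_authors(commit_counts, max_distance=3, rare=2, common=50):
    """Pair rarely-seen names with frequent names within max_distance edits."""
    frequent = [name for name, n in commit_counts.items() if n >= common]
    suspects = []
    for name, n in sorted(commit_counts.items()):
        if n > rare:
            continue
        for target in frequent:
            if 0 < edit_distance(name, target) <= max_distance:
                suspects.append((name, target))
    return suspects


# Hypothetical commit counts, for illustration only.
counts = Counter({"Jakub Jelinek": 5000, "Jakub Jeilnek": 1,
                  "Richard Biener": 4000})
print(flag_suspect_authors(counts))
```

With these sample counts, "Jakub Jeilnek" is paired with "Jakub Jelinek" 
(distance 2) for human review.  A bare "Jelinek" is six edits away from the 
full name, so a 3-character tolerance alone would not catch it.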
>>> 
>>> In other news, I'm working on comparison of gcc-pretty, gcc-reparent and 
>>> gcc-reposurgeon-5a repos among themselves.  Below are current notes for 
>>> comparison of gcc-pretty/trunk and gcc-reposurgeon-5a/trunk.
>>> 
>>> == Merges on trunk ==
>>> 
>>> Reposurgeon creates merge entries on trunk when changes from a branch are 
>>> merged into trunk.  This brings entire development history from the branch 
>>> to trunk, which is both good and bad.  The good part is that we get more 
>>> visibility into how the code evolved.  The bad part is that we get many 
>>> "noisy" commits from merged branch (e.g., "Merge in trunk" every few 
>>> revisions) and that our SVN branches are work-in-progress quality, not 
>>> ready for review/commit quality.  It's common for files to be re-written in 
>>> large chunks on branches.
>>> 
>>> Also, reposurgeon's commit logs don't have information on SVN path from 
>>> which the change came, so there is no easy way to determine that a given 
>>> commit is from a merged branch, not an original trunk commit.  Git-svn, on 
>>> the other hand, provides "git-svn-id: <path>@<revision>" tags in its commit 
>>> logs.
>>> 
>>> My conversion follows current GCC development policy that trunk history 
>>> should be linear.  Branch merges to trunk are squashed.  Merges between 
>>> non-trunk branches are handled as specified by svn:mergeinfo SVN properties.
>>> 
>>> == Differences in trees ==
>>> 
>>> Git trees (aka filesystem content) match between pretty/trunk and 
>>> reposurgeon-5a/trunk from current tip and up to SVN's r130805.
>>> Here is SVN log of that revision (restoration of deleted trunk):
>>> ------------------------------------------------------------------------
>>> r130805 | dberlin | 2007-12-13 01:53:37 +0000 (Thu, 13 Dec 2007)
>>> Changed paths:
>>>  A /trunk (from /trunk:130802)
>>> ------------------------------------------------------------------------
>>> 
>>> Reposurgeon conversion has:
>>> -------------
>>> commit 7e6f2a96e89d96c2418482788f94155d87791f0a
>>> Author: Daniel Berlin <dber...@gcc.gnu.org>
>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>> 
>>>   Readd trunk
>>> 
>>>   Legacy-ID: 130805
>>> 
>>> .gitignore | 17 -----------------
>>> 1 file changed, 17 deletions(-)
>>> -------------
>>> and my conversion has:
>>> -------------
>>> commit fb128f3970789ce094c798945b4fa20eceb84cc7
>>> Author: Daniel Berlin <dber...@dbrelin.org>
>>> Date:   Thu Dec 13 01:53:37 2007 +0000
>>> 
>>>   Readd trunk
>>> 
>>> 
>>>   git-svn-id: https://gcc.gnu.org/svn/gcc/trunk@130805 
>>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>> -------------
>>> 
>>> It appears that .gitignore has been added in r1 by reposurgeon and then 
>>> deleted at r130805.  In SVN repository .gitignore was added in r195087.  I 
>>> speculate that addition of .gitignore at r1 is expected, but its deletion 
>>> at r130805 is highly suspicious.
>>> 
>>> == Committer entries ==
>>> 
>>> Reposurgeon uses $u...@gcc.gnu.org for committer email addresses even when 
>>> it correctly detects author name from ChangeLog.
>>> 
>>> reposurgeon-5a:
>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mar...@gcc.gnu.org>
>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz 
>>> <joz...@gcc.gnu.org>
>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath 
>>> <frede...@gcc.gnu.org>
>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <g...@gcc.gnu.org>
>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener 
>>> <rgue...@gcc.gnu.org>
>>> 
>>> pretty:
>>> r278995 Martin Liska <mli...@suse.cz> Martin Liska <mli...@suse.cz>
>>> r278994 Jozef Lawrynowicz <joze...@mittosystems.com> Jozef Lawrynowicz 
>>> <joze...@mittosystems.com>
>>> r278993 Frederik Harwath <frede...@codesourcery.com> Frederik Harwath 
>>> <frede...@codesourcery.com>
>>> r278992 Georg-Johann Lay <a...@gjlay.de> Georg-Johann Lay <a...@gjlay.de>
>>> r278991 Richard Biener <rguent...@suse.de> Richard Biener 
>>> <rguent...@suse.de>
>>> 
>>> == Bad summary line ==
>>> 
>>> While looking around r138087, the commit below caught my eye.  Are the 
>>> contents of the summary line as expected?
>>> 
>>> commit cc2726884d56995c514d8171cc4a03657851657e
>>> Author: Chris Fairles <chris.fair...@gmail.com>
>>> Date:   Wed Jul 23 14:49:00 2008 +0000
>>> 
>>>   acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define GLIBCXX_LIBS.
>>> 
>>>   2008-07-23  Chris Fairles <chris.fair...@gmail.com>
>>> 
>>>           * acinclude.m4 ([GLIBCXX_CHECK_CLOCK_GETTIME]): Define 
>>> GLIBCXX_LIBS.
>>>           Holds the lib that defines clock_gettime (-lrt or -lposix4).
>>>           * src/Makefile.am: Use it.
>>>           * configure: Regenerate.
>>>           * configure.in: Likewise.
>>>           * Makefile.in: Likewise.
>>>           * src/Makefile.in: Likewise.
>>>           * libsup++/Makefile.in: Likewise.
>>>           * po/Makefile.in: Likewise.
>>>           * doc/Makefile.in: Likewise.
>>> 
>>>   Legacy-ID: 138087
>>> 
>>> 
>>> --
>>> Maxim Kuvyrkov
>>> https://www.linaro.org
>>> 
>> 
> 
