Re: Commit messages and the move to git

2019-12-02 Thread Richard Earnshaw (lists)

On 19/11/2019 14:56, Jason Merrill wrote:

On Mon, Nov 18, 2019 at 4:38 PM Richard Earnshaw (lists) <
richard.earns...@arm.com> wrote:


On 18/11/2019 20:53, Jason Merrill wrote:

On Mon, Nov 18, 2019 at 2:51 PM Segher Boessenkool <
seg...@kernel.crashing.org> wrote:


On Mon, Nov 18, 2019 at 07:21:22PM +, Richard Earnshaw (lists) wrote:

On 18/11/2019 18:53, Segher Boessenkool wrote:

PR target/92140: clang vs gcc optimizing with adc/sbb
PR fortran/91926: assumed rank optional
PR tree-optimization/91532: [SVE] Redundant predicated store in gcc.target/aarch64/fmla_2.c
PR tree-optimization/92161: ICE in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1687
PR tree-optimization/92162: ICE in vect_create_epilog_for_reduction, at tree-vect-loop.c:4252
PR c++/92015: internal compiler error: in cxx_eval_array_reference, at cp/constexpr.c:2568
PR tree-optimization/92173: ICE in optab_for_tree_code, at optabs-tree.c:81
PR tree-optimization/92173: ICE in optab_for_tree_code, at optabs-tree.c:81
PR fortran/92174: runtime error: index 15 out of bounds for type 'gfc_expr *[15]


Most of these aren't helpful at all, and none of these are good commit
summaries.  The PR92173 one actually has identical commit messages btw,
huh.  Ah, the second one (r277288) has the wrong changelog, but in the
actual changelog file as well, not something any tool could fix up (or
have we reached the singularity?)


Identical commits are normally from where the same commit is made to
multiple branches.  It's not uncommon to see this when bugs are fixed.


This is an actual mistake.  The commits are not identical at all, just
the commit messages are (and the changelog entries, too).  Not something
that happens too often, but of course I hit it in the first random thing I
picked :-)


Ultimately the question here is whether something like the above is more
or less useful than what we have today, which is summary lines of the
form:


  


I already said I would prefer things like
   Patch related to PR323
as the patch subject lines.  No one argues that the current state of
affairs is good.  I argue that replacing this with often wrong and
irrelevant information isn't the best we can do.



How about using the first line that isn't a ChangeLog date/author line,
without trying to rewrite/augment it?

Jason



It would certainly be another way of doing it.  Sometimes it would
produce almost the same as an unadulterated PR; sometimes it would
produce something more meaningful and sometimes it would be pretty
useless.  It probably would hit more cases than my current script in
that it wouldn't require the commit to mention a PR in it.
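For illustration, Jason's suggestion (take the first line that isn't a
ChangeLog date/author line, unmodified) fits in a few lines of Python.
This is a hypothetical sketch, not the conversion script: it assumes a
header line starts with an ISO date (YYYY-MM-DD) followed by a name, and
it ignores older ChangeLog date styles and multi-author continuation
lines.  Fed the i960 entry quoted in this message, it returns only the
first physical line, which shows the truncation problem.

```python
import re

# Assumed header form: "1998-12-17  Vladimir N. Makarov  <email>".
# Only ISO dates are recognised; older ChangeLog date styles are not.
HEADER_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\s+\S")


def first_non_header_line(message: str) -> str:
    """Return the first non-blank line that is not a ChangeLog date/author line."""
    for line in message.splitlines():
        text = line.strip()
        if not text or HEADER_RE.match(text):
            continue  # skip blank lines and date/author headers
        return text
    return ""  # nothing usable found


if __name__ == "__main__":
    entry = (
        "1998-12-17  Vladimir N. Makarov\n"
        "\t* config/i60/i960.md (extendqihi2): Fix typo (usage ',' instead of\n"
        "\t';').\n"
    )
    print(first_non_header_line(entry))
```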

The main problem is that the first line is often incomplete, and much of
it is also wasted with elements like the full path to a file that is
quite deep in the tree.  Let's take a quick example (the first I found in
the dump I have).

1998-12-17  Vladimir N. Makarov  
 * config/i60/i960.md (extendqihi2): Fix typo (usage ',' instead of
 ';').
1998-12-17  Michael Tiemann  
 * i960.md (extend*, zero_extend*): Don't generate rtl that looks
 like (subreg:SI (reg:SI N) 0), because it's wrong, and it hides
 optimizations from the combiner.

Firstly, this example is missing a blank line between the author and the
change message itself, which makes it harder to distinguish from the
multiple-authors case.  Secondly, the entry really spans two
lines, and cutting it off at the end of the first line would be, well, a
bit odd.  We could try to piece things together more, by combining lines
until we find a sentence end ( \.$ or \.\s\s ), and we could also strip
off more of the path components to leave just the name of the file
committed.  For example,

i960.md (extendqihi2): Fix typo (usage ',' instead of ';').

That might work better, but obviously it's harder to handle and thus
more likely to create erroneous summaries.
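The joining and path-stripping idea above might look something like the
following Python sketch.  It is not the actual script: it takes the lines
of one ChangeLog entry, joins them until the first sentence end (a full
stop at end of line, or a full stop followed by two whitespace
characters), cuts the text there, and then drops the leading "* " bullet
and the directory part of the first file name.

```python
import re
from typing import Iterable

# Sentence end as described above: "." at end of line, or "." plus two spaces.
SENTENCE_END = re.compile(r"\.$|\.\s\s")


def summarize_entry(lines: Iterable[str]) -> str:
    """Build a one-line summary from the lines of a single ChangeLog entry."""
    parts = []
    for line in lines:
        text = line.strip()
        parts.append(text)
        if SENTENCE_END.search(text):
            break  # stop joining once a sentence has ended
    summary = " ".join(parts)
    # If the sentence ended mid-line, drop whatever follows it.
    m = SENTENCE_END.search(summary)
    if m:
        summary = summary[: m.start() + 1]
    summary = re.sub(r"^\*\s*", "", summary)  # remove the "* " bullet
    summary = re.sub(r"^\S*/", "", summary)   # "config/i60/i960.md" -> "i960.md"
    return summary


if __name__ == "__main__":
    entry = [
        "\t* config/i60/i960.md (extendqihi2): Fix typo (usage ',' instead of",
        "\t';').",
    ]
    print(summarize_entry(entry))
```

On the first i960 entry this prints the hand-written summary shown above;
the second entry (extend*, zero_extend*) comes out as one long line,
since it is a single three-line sentence.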



Yep. I don't think we need to worry about getting optimal one-line
summaries for ancient commits; something reasonably unique should be plenty.



Attached is the latest version of my script.  I used (very nearly) this 
to produce a conversion over the weekend and I've uploaded that here:


https://gitlab.com/rearnsha/gcc-trial-20191130

Note that I might blow this away at any time.  IT IS NOT A FINAL
CONVERSION.


Some other things to note:
- there are a number of known issues with the version of reposurgeon 
used for this that are being worked on

  - emptycommit-* tags - my control script was out-of-date
  - *deleted* branches - this is being worked on
  - weird dependencies around merges - this is being worked on
  - author attributions are sometimes incorrect - reported

The main difference between the attached script and the one I used for
this conversion is that ChangeLog entries that contain :: inside a
function list are now handled correctly, so a number of cases that were
previously being missed are now picked up.
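A function list such as "(cp_parser::foo, A::B::bar)" defeats a naive
split on the first ':' of an entry.  One way to handle it (not
necessarily what the script does) is to track parenthesis depth and
treat only the first colon outside parentheses as the separator between
the file/function part and the description.  The entry used below is
made up for illustration.

```python
from typing import Tuple


def split_entry(entry: str) -> Tuple[str, str]:
    """Split "* file (func, ...): text" into ("* file (func, ...)", "text").

    Colons inside the parenthesised function list, such as the "::" in
    C++ names, are ignored by tracking parenthesis depth; the first colon
    at depth zero is taken as the separator."""
    depth = 0
    for i, ch in enumerate(entry):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        elif ch == ":" and depth == 0:
            return entry[:i].strip(), entry[i + 1:].strip()
    return entry.strip(), ""  # no separator found


if __name__ == "__main__":
    line = "* parser.c (cp_parser::foo, A::B::bar): Handle nested names."
    print(line.split(":", 1)[0])  # naive split stops inside the list
    print(split_entry(line))
```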

Should libstdc++ (and libgcc_s) be NODELETE?

2019-12-02 Thread Florian Weimer
libstdc++ recently made an appearance in a glibc bug, which made me
realize that libstdc++ and libgcc_s aren't marked as NODELETE (on GNU
ELF platforms).  (The bug in question was purely a glibc one, though.)

Are these shared objects really expected to be unloaded?  Given that
libstdc++ contains many destructors, I expect that unloading will be
problematic in most cases.  Unloading libgcc_s also seems a bit over the
top to me.

This only matters if any of the shared objects is loaded late via
dlopen.  If the main program is linked against them, they are implicitly
NODELETE anyway.

Thoughts?  Should these shared objects be linked with -z nodelete?

Thanks,
Florian



Re: Commit messages and the move to git

2019-12-02 Thread Segher Boessenkool
Hi,

On Mon, Dec 02, 2019 at 10:54:17AM +, Richard Earnshaw (lists) wrote:
>   - author attributions are sometimes incorrect - reported

This would disqualify that "conversion", for me at least.  Keeping all
warts we had in SVN is better than adding new lies, lies about important
matters even.

> - certain key words in otherwise not very useful summary lines are
>   also spotted and used to add [revert] or [backport] annotations to
>   the summary.

You won't see tags like that from anyone who uses the normal git commit
flows: the piece of the mail subject between [] is deleted.

> No changes are made to the main commit log, if we add a new summary 
> line, the entire original text is kept.

That is good (an important requirement even).


Segher


Re: Commit messages and the move to git

2019-12-02 Thread Richard Earnshaw (lists)
On 02/12/2019 15:35, Segher Boessenkool wrote:
> Hi,
> 
> On Mon, Dec 02, 2019 at 10:54:17AM +, Richard Earnshaw (lists) wrote:
>>   - author attributions are sometimes incorrect - reported
> 
> This would disqualify that "conversion", for me at least.  Keeping all
> warts we had in SVN is better than adding new lies, lies about important
> matters even.
Indeed, but it's easy to turn off the option that tries to do this, if
it can't be made to work correctly.  We'd then be back with the existing
'author == committer' situation.

> 
>> - certain key words in otherwise not very useful summary lines are
>>   also spotted and used to add [revert] or [backport] annotations to
>>   the summary.
> 
> You won't see tags like that from anyone who uses the normal git commit
> flows: the piece of the mail subject between [] is deleted.

Well, true if you use "git am" without the -k or -b options; false
otherwise.  We have plenty of existing patches in the repo that have
tags like this, though it doesn't appear to be the 'git way' I grant you.
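For reference, here is a rough Python model of the subject clean-up
being discussed: by default "git am" strips leading "Re:" prefixes and
every leading [...] group, -k keeps the subject as it is, and -b only
strips bracketed groups that contain the word PATCH.  It is a simplified
approximation of what git's mailinfo does, written for illustration; it
ignores details such as exactly how whitespace after a kept bracket is
normalised.

```python
def am_subject(subject: str, keep: bool = False, keep_non_patch: bool = False) -> str:
    """Approximate the subject line `git am` records for a mailed patch.

    keep=True models `git am -k`; keep_non_patch=True models `git am -b`."""
    if keep:
        return subject  # -k: leave the subject untouched
    kept = ""  # bracketed groups that -b leaves in place
    s = subject
    while s:
        if s[0] in " \t:":
            s = s[1:]  # leading whitespace and colons
        elif s[:3].lower() == "re:":
            s = s[3:]  # "Re:" reply prefixes
        elif s[0] == "[" and "]" in s:
            end = s.index("]") + 1
            group, s = s[:end], s[end:]
            if keep_non_patch and "PATCH" not in group:
                kept += group + " "  # -b keeps e.g. "[revert]"
        else:
            break  # first real character of the subject
    return (kept + s).strip()


if __name__ == "__main__":
    subj = "[PATCH v2] [revert] Fix foo"
    print(am_subject(subj))                       # Fix foo
    print(am_subject(subj, keep_non_patch=True))  # [revert] Fix foo
    print(am_subject(subj, keep=True))            # [PATCH v2] [revert] Fix foo
```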

We could extend the script to rewrite all [tag] attributions in tag:
form, but I'm not really sure it's worth it.
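The keyword annotation quoted above, and the "[tag]" versus "tag:"
question, could be handled along these lines.  The keyword table and the
function are hypothetical (the real script's keyword list isn't shown in
this thread); the point of the sketch is that the output format lives in
one place, so switching styles is a one-argument change.

```python
import re

# Hypothetical keyword patterns; not the real script's list.
KEYWORDS = {
    "revert": re.compile(r"\brevert", re.IGNORECASE),
    "backport": re.compile(r"\bbackport", re.IGNORECASE),
}


def annotate(summary: str, original: str, style: str = "bracket") -> str:
    """Prefix a generated summary with a tag if the original summary line
    mentions one of the keywords.

    style="bracket" gives "[revert] ...", style="colon" gives "revert: ..."."""
    for tag, pattern in KEYWORDS.items():
        if pattern.search(original):
            prefix = f"[{tag}] " if style == "bracket" else f"{tag}: "
            return prefix + summary
    return summary  # no keyword: summary unchanged
```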

> 
>> No changes are made to the main commit log, if we add a new summary 
>> line, the entire original text is kept.
> 
> That is good (an important requirement even).
> 

Yes, I even steer clear of trimming blank lines at the head or tail of
the message, but it's possible that reposurgeon might do that itself later.

> 
> Segher
> 

The real question at this point is whether or not these commit summaries
are better than the existing ones.  Personally, I think they are (or I
wouldn't have spent the time working on this), but I'm not the only
person with an interest here...
R


Re: Commit messages and the move to git

2019-12-02 Thread Segher Boessenkool
On Mon, Dec 02, 2019 at 04:18:59PM +, Richard Earnshaw (lists) wrote:
> On 02/12/2019 15:35, Segher Boessenkool wrote:
> > On Mon, Dec 02, 2019 at 10:54:17AM +, Richard Earnshaw (lists) wrote:
> >>   - author attributions are sometimes incorrect - reported
> > 
> > This would disqualify that "conversion", for me at least.  Keeping all
> > warts we had in SVN is better than adding new lies, lies about important
> > matters even.
> Indeed, but it's easy to turn off the option that tries to do this, if
> it can't be made to work correctly.  We'd then be back with the existing
> 'author == committer' situation.

But we need to be *sure* this is done correctly.  The only safe thing
to do is to turn off all such options, if we cannot trust them.

> >> - certain key words in otherwise not very useful summary lines are
> >>   also spotted and used to add [revert] or [backport] annotations to
> >>   the summary.
> > 
> > You won't see tags like that from anyone who uses the normal git commit
> > flows: the piece of the mail subject between [] is deleted.
> 
> Well, true if you use "git am" without the -k or -b options; false
> otherwise.  We have plenty of existing patches in the repo that have
> tags like this, though it doesn't appear to be the 'git way' I grant you.

Yes, "the normal commit flows" :-)

> We could extend the script to rewrite all [tag] attributions in tag:
> form, but I'm not really sure it's worth it.

Sure; I'm just saying rewriting old commit messages in such a style that
they keep standing out from new ones is a bit of a weird choice.

> >> No changes are made to the main commit log, if we add a new summary 
> >> line, the entire original text is kept.
> > 
> > That is good (an important requirement even).
> 
> Yes, I even steer clear of trimming blank lines at the head or tail of
> the message, but it's possible that reposurgeon might do that itself later.

> The real question at this point is whether or not these commit summaries
> are better than the existing ones.  Personally, I think they are (or I
> wouldn't have spent the time working on this), but I'm not the only
> person with an interest here...

Thanks for the effort, regardless of the outcome!


Segher


Re: Branch and tag deletions

2019-12-02 Thread Segher Boessenkool
On Fri, Nov 29, 2019 at 10:31:22PM +, Joseph Myers wrote:
> On Fri, 29 Nov 2019, Segher Boessenkool wrote:
> > Please post the full names of all the tags you want to delete?
> 
> Here is a list of 645 tags proposed for removal, in the various
> categories I gave.  Vendor tags are only included where they also fall
> into one of the other categories (e.g. tags that appear to be purely
> for merge tracking and so would not idiomatically exist in git at
> all).

[ snip ]

Thanks for the list.  As far as I can see all of those are no longer
useful, so they could be just deleted from the SVN repo (if everyone
else agrees!)  It is much safer to delete tags after the conversion to
git, because that way it is much easier to get things back if something
is lost after all, in general.


Segher


Re: Commit messages and the move to git

2019-12-02 Thread Richard Earnshaw (lists)
On 02/12/2019 17:25, Segher Boessenkool wrote:
> On Mon, Dec 02, 2019 at 04:18:59PM +, Richard Earnshaw (lists) wrote:
>> On 02/12/2019 15:35, Segher Boessenkool wrote:
>>> On Mon, Dec 02, 2019 at 10:54:17AM +, Richard Earnshaw (lists) wrote:
>>>>   - author attributions are sometimes incorrect - reported
>>>
>>> This would disqualify that "conversion", for me at least.  Keeping all
>>> warts we had in SVN is better than adding new lies, lies about important
>>> matters even.
>> Indeed, but it's easy to turn off the option that tries to do this, if
>> it can't be made to work correctly.  We'd then be back with the existing
>> 'author == committer' situation.
> 
> But we need to be *sure* this is done correctly.  The only safe thing
> to do is to turn off all such options, if we cannot trust them.

Of course.  But that's a decision that can be made quite late, because
we know we *can* turn them off if we want to.

> 
>>>> - certain key words in otherwise not very useful summary lines are
>>>>   also spotted and used to add [revert] or [backport] annotations to
>>>>   the summary.
>>>
>>> You won't see tags like that from anyone who uses the normal git commit
>>> flows: the piece of the mail subject between [] is deleted.
>>
>> Well, true if you use "git am" without the -k or -b options; false
>> otherwise.  We have plenty of existing patches in the repo that have
>> tags like this, though it doesn't appear to be the 'git way' I grant you.
> 
> Yes, "the normal commit flows" :-)
> 

Well, my normal commit flow these days is to use -b, because that only
removes "[PATCH...]" annotations.

Nevertheless, we will most likely keep any existing "[...]" tags.

>> We could extend the script to rewrite all [tag] attributions in tag:
>> form, but I'm not really sure it's worth it.
> 
> Sure; I'm just saying rewriting old commit messages in such a style that
> they keep standing out from new ones is a bit of a weird choice.
> 

One of the advantages of doing this in a script is that we have exactly
three places in the script to change, and that's a trivial operation to
do.  Tweaking the logic overall is much harder as it can have surprising
effects at times.

>>>> No changes are made to the main commit log, if we add a new summary
>>>> line, the entire original text is kept.
>>>
>>> That is good (an important requirement even).
>>
>> Yes, I even steer clear of trimming blank lines at the head or tail of
>> the message, but it's possible that reposurgeon might do that itself later.
> 
>> The real question at this point is whether or not these commit summaries
>> are better than the existing ones.  Personally, I think they are (or I
>> wouldn't have spent the time working on this), but I'm not the only
>> person with an interest here...
> 
> Thanks for the effort, regardless of the outcome!
> 
> 
> Segher
> 

R.


Re: Commit messages and the move to git

2019-12-02 Thread Segher Boessenkool
On Mon, Dec 02, 2019 at 05:47:08PM +, Richard Earnshaw (lists) wrote:
> On 02/12/2019 17:25, Segher Boessenkool wrote:
> > On Mon, Dec 02, 2019 at 04:18:59PM +, Richard Earnshaw (lists) wrote:
> >> On 02/12/2019 15:35, Segher Boessenkool wrote:
> >>> On Mon, Dec 02, 2019 at 10:54:17AM +, Richard Earnshaw (lists) wrote:
> >>>>   - author attributions are sometimes incorrect - reported
> >>>
> >>> This would disqualify that "conversion", for me at least.  Keeping all
> >>> warts we had in SVN is better than adding new lies, lies about important
> >>> matters even.
> >> Indeed, but it's easy to turn off the option that tries to do this, if
> >> it can't be made to work correctly.  We'd then be back with the existing
> >> 'author == committer' situation.
> > 
> > But we need to be *sure* this is done correctly.  The only safe thing
> > to do is to turn off all such options, if we cannot trust them.
> 
> Of course.  But that's a decision that can be made quite late, because
> we know we *can* turn them off if we want to.

Do we postpone the transition another few months because we have to check
all commits for mistakes the conversion tool made because it tried to be
"smart"?

Or will we rush in these changes, unnecessary errors and all, because
people have invested time in doing this?

It is not a decision that can be made late.  It is a *design decision*.


Segher


Re: Commit messages and the move to git

2019-12-02 Thread Richard Earnshaw (lists)
On 02/12/2019 18:00, Segher Boessenkool wrote:
> On Mon, Dec 02, 2019 at 05:47:08PM +, Richard Earnshaw (lists) wrote:
>> On 02/12/2019 17:25, Segher Boessenkool wrote:
>>> On Mon, Dec 02, 2019 at 04:18:59PM +, Richard Earnshaw (lists) wrote:
>>>> On 02/12/2019 15:35, Segher Boessenkool wrote:
>>>>> On Mon, Dec 02, 2019 at 10:54:17AM +, Richard Earnshaw (lists) wrote:
>>>>>>   - author attributions are sometimes incorrect - reported
>>>>>
>>>>> This would disqualify that "conversion", for me at least.  Keeping all
>>>>> warts we had in SVN is better than adding new lies, lies about important
>>>>> matters even.
>>>> Indeed, but it's easy to turn off the option that tries to do this, if
>>>> it can't be made to work correctly.  We'd then be back with the existing
>>>> 'author == committer' situation.
>>>
>>> But we need to be *sure* this is done correctly.  The only safe thing
>>> to do is to turn off all such options, if we cannot trust them.
>>
>> Of course.  But that's a decision that can be made quite late, because
>> we know we *can* turn them off if we want to.
> 
> Do we postpone the transition another few months because we have to check
> all commits for mistakes the conversion tool made because it tried to be
> "smart"?
> 
> Or will we rush in these changes, unnecessary errors and all, because
> people have invested time in doing this?
> 
> It is not a decision that can be made late.  It is a *design decision*.
> 
> 

It's a one-line edit to the lift script.  So it's a conversion *choice*.

R.


Re: Commit messages and the move to git

2019-12-02 Thread Eric S. Raymond
Segher Boessenkool :
> Do we postpone the transition another few months because we have to check
> all commits for mistakes the conversion tool made because it tried to be
> "smart"?
> 
> Or will we rush in these changes, unnecessary errors and all, because
> people have invested time in doing this?
> 
> It is not a decision that can be made late.  It is a *design decision*.

Bear in mind that the tool is continuing to improve.  There are now three
people working on it effectively full-time in response to this conversion.

We will fix the attribution bug. Compared to dealing with dumpfile
malformations that sort of thing is a pretty easy problem once we have
a way to reproduce it.

At this point my only serious worry is what kinds of contortions we'll 
need to go through to get around the effects of the GCC/EGCS merge.
I'll be concentrating on that once I finish debugging the analyzer
rewrite.
-- 
Eric S. Raymond <http://www.catb.org/~esr/>




Re: Commit messages and the move to git

2019-12-02 Thread Richard Sandiford
"Richard Earnshaw (lists)"  writes:
> The real question at this point is whether or not these commit summaries
> are better than the existing ones.  Personally, I think they are (or I
> wouldn't have spent the time working on this), but I'm not the only
> person with an interest here...

+1 for having this (not that it's a vote).  Of the two extremes,
the git-svn squashed-clog summaries aren't readable and something
ultra-conservative like "SVN commit rN" wouldn't be useful in
--oneline output.  The scripted summaries seem like a nice compromise
between the two.  I don't think it matters if the script happens to
generate a few misleading summaries here and there, given that it
preserves the original message as well.

Thanks,
Richard



Re: Commit messages and the move to git

2019-12-02 Thread Joseph Myers
On Mon, 2 Dec 2019, Segher Boessenkool wrote:

> Sure; I'm just saying rewriting old commit messages in such a style that
> they keep standing out from new ones is a bit of a weird choice.

I'd say the rewrites make them stand out *less* (if people avoid having 
new commit messages whose summary line is just the ChangeLog header line).

Simply having the Legacy-ID in the commit message will be a visible 
difference from new commit messages.  But I'm happy it's desirable to have 
it there, because references to SVN revisions in list archives are so 
common and having it in the commit messages makes it very quick and easy 
to map to a git commit id, without needing any on-the-side lists of commit 
mappings or other tools.  (While reviewing conversions to find and fix 
issues, it's *extremely* useful to have it there to help find 
corresponding commits to compare commit contents, parents, authors and 
other data between SVN and git.)
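As an illustration of how direct that lookup is, the sketch below builds
an SVN-revision-to-commit map straight from the commit messages.  It
assumes each converted commit carries a line of the form
"Legacy-ID: <revision>" with a plain integer revision; the exact format
in the final conversion may differ.  read_log shells out to "git log"
with a NUL-separated format, while build_svn_map only parses text, so it
can be tried without a repository.  The commit ids in any example are
made up.

```python
import re
import subprocess
from typing import Dict, Iterable, Iterator, Tuple

# Assumed trailer form: a line "Legacy-ID: 277288" somewhere in the message.
LEGACY_RE = re.compile(r"^Legacy-ID:\s*(\d+)\s*$", re.MULTILINE)


def build_svn_map(commits: Iterable[Tuple[str, str]]) -> Dict[int, str]:
    """Map SVN revision numbers to git commit ids from (commit_id, message) pairs."""
    mapping = {}
    for commit_id, message in commits:
        for m in LEGACY_RE.finditer(message):
            mapping[int(m.group(1))] = commit_id
    return mapping


def read_log(repo: str) -> Iterator[Tuple[str, str]]:
    """Yield (commit_id, message) for every commit reachable from any ref."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%H%x00%B%x00"],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = out.split("\0")  # id, body, id, body, ... (plus a trailing "\n")
    for commit_id, message in zip(fields[0::2], fields[1::2]):
        yield commit_id.strip(), message
```

Usage would be something like build_svn_map(read_log("gcc"))[277288],
with "gcc" standing in for the path to a local clone.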

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Branch and tag deletions

2019-12-02 Thread Joseph Myers
On Mon, 2 Dec 2019, Segher Boessenkool wrote:

> On Fri, Nov 29, 2019 at 10:31:22PM +, Joseph Myers wrote:
> > On Fri, 29 Nov 2019, Segher Boessenkool wrote:
> > > Please post the full names of all the tags you want to delete?
> > 
> > Here is a list of 645 tags proposed for removal, in the various
> > categories I gave.  Vendor tags are only included where they also fall
> > into one of the other categories (e.g. tags that appear to be purely
> > for merge tracking and so would not idiomatically exist in git at
> > all).
> 
> [ snip ]
> 
> Thanks for the list.  As far as I can see all of those are no longer
> useful, so they could be just deleted from the SVN repo (if everyone
> else agrees!)  It is much safer to delete tags after the conversion to
> git, because that way it is much easier to get things back if something
> is lost after all, in general.

One suggestion made in a comment on 
 was making reposurgeon put 
deleted tags and branches in refs/deleted/ so a converted version of the 
data would be available without being fetched by default.  If that were 
done, the data would be in git even for tags deleted before the 
conversion.

I should note that my fixes for parents of branch creation commits that
cvs2svn messed up were based on manual review of all the cases my script
identified as suspicious and couldn't find an automated fix for, covering
every branch that existed in SVN at the time of the conversion to SVN
(r105925), even if the branch was deleted or renamed in SVN after that.
My fixes for parents of tag creation commits were less exhaustive.

My fixes for tag parents should cover all the official release tags from 
the CVS period, and some others, but did not try to cover any tags 
currently suggested to be deleted, or any vendor tags.  This doesn't 
matter so much if you're only concerned about the contents of a tag, not 
its ancestry, but you should not expect that commits generated for tags in 
the CVS period have sensible parents except where fixed manually, because 
cvs2svn tended to mess up identifying parents for tags at least as much as 
it did for branches.  (Where there are bugs affecting *contents* of a tag, 
e.g. issue 167, those are of course critical bugs needing fixing to 
consider the conversion viable.)

(The typical form of bad tag parent identification is that, when the tag 
was of a point on a non-trunk branch, cvs2svn treated it as a copy of 
trunk from somewhere around the time that non-trunk branch was created 
from trunk, and then put a large set of changes in the tag-creation commit 
to give it the right contents.  So it won't have much effect on the 
results of "git tag --contains" for commits on master, which is one thing 
for which tag ancestry does matter.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Keeping the existing git mirror history available

2019-12-02 Thread Joseph Myers
I've previously noted that when we move to git, while we should use a 
clean conversion with proper author attributions, we should also keep the 
commits from the existing git mirror available somewhere as there are 
various git-only branches there and lots of references to git commit ids 
in the list archives.

This can be done either by renaming the existing mirror and keeping it 
available read-only in some public location, or by having both sets of 
objects in one repository.  Given a conversion with reposurgeon, I've now 
tested the following command as a way to get the objects from the existing 
mirror into the same repository:

git fetch --no-tags \
git://gcc.gnu.org/git/gcc.git \
'refs/heads/*:refs/heads/git-old/*' \
'refs/remotes/*:refs/heads/git-svn-old/*' \
'refs/tags/*:refs/tags/git-old/*'

Doing this increases the size of the repository (after git gc 
--aggressive) from 1.4 GB to 1.7 GB (most blob and tree objects being the 
same between the two versions of the history).  It's also possible to use 
different ref names that aren't fetched by default such as refs/git-old/*.
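To check where each remote ref ends up under the refspecs in the command
above, here is a small Python sketch of glob refspec mapping.  It handles
only a single '*' per side and ignores the '+' force marker and other
refspec features; it is meant for sanity-checking a naming plan, not as
a reimplementation of git's rules.  A destination such as
refs/git-old/heads/* would be one hypothetical layout for the "not
fetched by default" variant.

```python
from typing import Optional


def map_ref(refspec: str, remote_ref: str) -> Optional[str]:
    """Return the local ref that remote_ref is stored under for a fetch
    refspec such as "refs/heads/*:refs/heads/git-old/*", or None if the
    refspec does not match.  Supports at most one "*" per side."""
    src, dst = refspec.lstrip("+").split(":", 1)
    prefix, star, suffix = src.partition("*")
    if not star:
        return dst if remote_ref == src else None  # exact, non-glob refspec
    if (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)
            and len(remote_ref) >= len(prefix) + len(suffix)):
        middle = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
        return dst.replace("*", middle, 1)
    return None


if __name__ == "__main__":
    refspecs = [
        "refs/heads/*:refs/heads/git-old/*",
        "refs/remotes/*:refs/heads/git-svn-old/*",
        "refs/tags/*:refs/tags/git-old/*",
    ]
    for ref in ["refs/heads/master", "refs/remotes/trunk"]:
        for spec in refspecs:
            local = map_ref(spec, ref)
            if local:
                print(ref, "->", local)
```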

If someone wishes to move an existing git-only branch to be based on the 
new version of the history, rebasing is probably better than merging, to 
avoid confusing effects of a commit having the whole of both old and new 
versions of master in its ancestry.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Commit messages and the move to git

2019-12-02 Thread Segher Boessenkool
On Mon, Dec 02, 2019 at 08:24:47PM +, Joseph Myers wrote:
> On Mon, 2 Dec 2019, Segher Boessenkool wrote:
> 
> > Sure; I'm just saying rewriting old commit messages in such a style that
> > they keep standing out from new ones is a bit of a weird choice.
> 
> I'd say the rewrites make them stand out *less* (if people avoid having 
> new commit messages whose summary line is just the ChangeLog header line).

New commits will not start with [smth] in general.  Of course you *can*
do that, with enough effort.  You can also have two consecutive empty
lines in your commit messages just fine, but git won't let you without
a fight.  This is similar.

> Simply having the Legacy-ID in the commit message will be a visible 
> difference from new commit messages.  But I'm happy it's desirable to have 
> it there, because references to SVN revisions in list archives are so 
> common and having it in the commit messages makes it very quick and easy 
> to map to a git commit id, without needing any on-the-side lists of commit 
> mappings or other tools.

Yes.  Either in the subject line, or later in the commit message (as
with git-svn).  We can quibble about where is best, but (hopefully)
everyone agrees we need the SVN id *somewhere* :-)


Segher


Re: Branch and tag deletions

2019-12-02 Thread Segher Boessenkool
On Mon, Dec 02, 2019 at 08:37:14PM +, Joseph Myers wrote:
> On Mon, 2 Dec 2019, Segher Boessenkool wrote:
> > Thanks for the list.  As far as I can see all of those are no longer
> > useful, so they could be just deleted from the SVN repo (if everyone
> > else agrees!)  It is much safer to delete tags after the conversion to
> > git, because that way it is much easier to get things back if something
> > is lost after all, in general.
> 
> One suggestion made in a comment on 
>  was making reposurgeon put 
> deleted tags and branches in refs/deleted/ so a converted version of the 
> data would be available without being fetched by default.  If that were 
> done, the data would be in git even for tags deleted before the 
> conversion.

That sounds simpler than it is...  After using this for a while you'll
get names that you want to delete, but that name *already* is in
refs/deleted/.  So what will you name it then?  People will still need
to be able to find it.

But we could make an "old-svn" hierarchy or similar that just has
everything svn has now (and will never change, so it will never cause
conflicts).
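One hypothetical way around that name clash is to file each deleted ref
under a label saying when it was deleted (an SVN revision such as
r123456, or a date for deletions made after the conversion) instead of
directly under refs/deleted/.  The scheme and the function below are
illustrative only, not something reposurgeon is known to implement.

```python
from typing import Set


def deleted_ref_name(name: str, stamp: str, existing: Set[str]) -> str:
    """Return a ref name for a deleted branch or tag, keyed by `stamp`,
    e.g. ("tags/foo", "r123456") -> "refs/deleted/r123456/tags/foo".

    Keying by when the ref was deleted lets the same name be deleted
    again later without a clash, and listing everything under
    refs/deleted/ still shows every copy.  Raises ValueError if the
    exact name is already taken."""
    ref = f"refs/deleted/{stamp}/{name}"
    if ref in existing:
        raise ValueError(f"{ref} already exists")
    existing.add(ref)
    return ref
```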


Segher