On 02/12/2019 10:54, Richard Earnshaw (lists) wrote:
> On 19/11/2019 14:56, Jason Merrill wrote:
>> On Mon, Nov 18, 2019 at 4:38 PM Richard Earnshaw (lists) <richard.earns...@arm.com> wrote:
>>
>>> On 18/11/2019 20:53, Jason Merrill wrote:
>>>> On Mon, Nov 18, 2019 at 2:51 PM Segher Boessenkool <seg...@kernel.crashing.org> wrote:
>>>>
>>>>> On Mon, Nov 18, 2019 at 07:21:22PM +0000, Richard Earnshaw (lists) wrote:
>>>>>> On 18/11/2019 18:53, Segher Boessenkool wrote:
>>>>>>> PR target/92140: clang vs gcc optimizing with adc/sbb
>>>>>>> PR fortran/91926: assumed rank optional
>>>>>>> PR tree-optimization/91532: [SVE] Redundant predicated store in gcc.target/aarch64/fmla_2.c
>>>>>>> PR tree-optimization/92161: ICE in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1687
>>>>>>> PR tree-optimization/92162: ICE in vect_create_epilog_for_reduction, at tree-vect-loop.c:4252
>>>>>>> PR c++/92015: internal compiler error: in cxx_eval_array_reference, at cp/constexpr.c:2568
>>>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at optabs-tree.c:81
>>>>>>> PR tree-optimization/92173: ICE in optab_for_tree_code, at optabs-tree.c:81
>>>>>>> PR fortran/92174: runtime error: index 15 out of bounds for type 'gfc_expr *[15]
>>>>>>>
>>>>>>> Most of these aren't helpful at all, and none of these are good commit
>>>>>>> summaries. The PR92173 one actually has identical commit messages btw,
>>>>>>> huh. Ah, the second one (r277288) has the wrong changelog, but in the
>>>>>>> actual changelog file as well, not something any tool could fix up (or
>>>>>>> have we reached the singularity?)
>>>>>>
>>>>>> Identical commits are normally from where the same commit is made to
>>>>>> multiple branches. It's not uncommon to see this when bugs are fixed.
>>>>>
>>>>> This is an actual mistake.
>>>>> The commits are not identical at all, just the commit messages are
>>>>> (and the changelog entries, too). Not something that happens too
>>>>> often, but of course I hit it in the first random thing I pick :-)
>>>>>
>>>>>> Ultimately the question here is whether something like the above is more
>>>>>> or less useful than what we have today, which is summary lines of the form:
>>>>>>
>>>>>> <date> <user> <email>
>>>>>
>>>>> I already said I would prefer things like
>>>>>   Patch related to PR323
>>>>> as the patch subject lines. No one argues that the current state of
>>>>> affairs is good. I argue that replacing this with often wrong and
>>>>> irrelevant information isn't the best we can do.
>>>>>
>>>>
>>>> How about using the first line that isn't a ChangeLog date/author line,
>>>> without trying to rewrite/augment it?
>>>>
>>>> Jason
>>>>
>>>
>>> It would certainly be another way of doing it. Sometimes it would
>>> produce almost the same as an unadulterated PR; sometimes it would
>>> produce something more meaningful and sometimes it would be pretty
>>> useless. It probably would hit more cases than my current script in
>>> that it wouldn't require the commit to mention a PR in it.
>>>
>>> The main problem is that the first line is often incomplete, and much of
>>> it is also wasted with elements like the full path to a file that is
>>> quite deep in the tree. Let's take a quick example (the first I found in
>>> the dump I have).
>>>
>>> 1998-12-17 Vladimir N. Makarov <vmaka...@cygnus.com>
>>>     * config/i60/i960.md (extendqihi2): Fix typo (usage ','
>>>     instead of ';').
>>> 1998-12-17 Michael Tiemann <tiem...@axon.cygnus.com>
>>>     * i960.md (extend*, zero_extend*): Don't generate rtl that looks
>>>     like (subreg:SI (reg:SI N) 0), because it's wrong, and it hides
>>>     optimizations from the combiner.
>>>
>>> Firstly, this example misses a blank line between the author and the
>>> change message itself, which makes distinguishing between this and the
>>> multiple authors case harder. Secondly, the entry really spans two
>>> lines and cutting it off at the end of the first line would be, well a
>>> bit odd. We could try to piece things together more, by combining lines
>>> until we find a sentence end ( \.$ or \.\s\s ), and we could also strip
>>> off more of the path components to just leave the name of the file
>>> committed. For example,
>>>
>>> i960.md (extendqihi2): Fix typo (usage ',' instead of ';').
>>>
>>> That might work better, but obviously it's harder to handle and thus
>>> more likely to create erroneous summaries.
>>>
>>
>> Yep. I don't think we need to worry about getting optimal one-line
>> summaries for ancient commits; something reasonably unique should be
>> plenty.
>>
>
> Attached is the latest version of my script. I used (very nearly) this
> to produce a conversion over the weekend and I've uploaded that here:
>
> https://gitlab.com/rearnsha/gcc-trial-20191130
>
> Note, that I might blow this away at any time. IT IS NOT A FINAL
> CONVERSION.
>
> Some other things to note:
> - there are a number of known issues with the version of reposurgeon
>   used for this that are being worked on
>   - emptycommit-* tags - my control script was out-of-date
>   - *deleted* branches - this is being worked on
>   - weird dependencies around merges - this is being worked on
>   - author attributions are sometimes incorrect - reported
>
I've just pushed a new trial conversion:

https://gitlab.com/rearnsha/gcc-trial2-20191130

The main differences between this and the previous trial are:
- The author attributions should now be fixed; please let me know if you
  see any anomalies in this respect.
- the emptycommit-* tags/branches are now gone.
- the 'tags' used for revert and backport now use more gittish style
  revert: and backport:
- the log entries for c++ style functions containing :: are now handled
  correctly by my summary generation script.

Other issues are still being worked on.

R.

> The main difference between the attached script and the one I used for
> this conversion is that ChangeLog changes that contain :: inside a
> function list are now handled correctly, resulting in a number of cases
> that were previously being missed now being correctly handled.
>
> Choices I made:
> - When a PR is used to derive the summary, I prefix this with 're' (as
>   in the Latin 'in re').
> - long change hunks produce poor summaries. To reduce the overhead:
>   - path names are removed, leaving just the final file name
>   - multiple files are replaced with [...] after the first filename
>   - similarly, multiple function names are replaced with [...]
>   - very long comments are truncated, preferably at the strongest
>     punctuation mark, but sometimes after key words, such as 'if',
>     'when', 'unless' and a few more. Ultimately, if the line is
>     still too long, we just break after an arbitrary space.
> - Where possible, useful summary lines that appear after an author
>   attribution are hoisted as a summary.
> - certain key words in otherwise not very useful summary lines are
>   also spotted and used to add [revert] or [backport] annotations to
>   the summary.
>
> No changes are made to the main commit log; if we add a new summary
> line, the entire original text is kept.
>
> An example of a summary produced by this is for the commit to r278572,
> where the original log entry is:
>
> Backported from mainline
> 2019-08-02 Jakub Jelinek <ja...@redhat.com>
>
> * quadmath.h (M_Eq, M_LOG2Eq, M_LOG10Eq, M_LN2q, M_LN10q, M_PIq,
> M_PI_2q, M_PI_4q, M_1_PIq, M_2_PIq, M_2_SQRTPIq, M_SQRT2q,
> M_SQRT1_2q): Use two more decimal places.
>
> And the script then generates:
>
> [backport] quadmath.h (M_Eq, [...]): Use two more decimal places.
>
> as the summary.
>
> R.
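
For readers who want to experiment, the summary rules discussed in this thread can be roughly sketched in Python. This is not the attached script, which is not reproduced here; it is a minimal illustration that assumes a ChangeLog hunk starts with "* ", joins its lines until a sentence end (the \.$ or \.\s\s test mentioned earlier), keeps only the final file name, collapses extra function names to [...], and adds a [backport] or [revert] tag when that word appears before the first hunk. The names ENTRY, SENTENCE_END, first_entry and summarise are hypothetical, as is the ", [...]" rendering for multiple files; truncation of over-long comments and the 're' PR prefix are left out.

```python
import re

# Hypothetical pattern for one joined hunk: "files (funcs): text".
ENTRY = re.compile(
    r"^(?P<files>[^(:]+?)\s*(?:\((?P<funcs>[^)]*)\))?\s*:\s*(?P<text>.*)$"
)
# A sentence end: a full stop at the end of the text or before two spaces.
SENTENCE_END = re.compile(r"\.(?=$|\s\s)")


def first_entry(log: str) -> str:
    """Join the lines of the first '* ...' hunk up to its first sentence end."""
    lines = [ln.strip() for ln in log.splitlines()]
    start = next((i for i, ln in enumerate(lines) if ln.startswith("* ")), None)
    if start is None:
        return ""
    joined = ""
    for ln in lines[start:]:
        if not ln or (joined and ln.startswith("* ")):
            break  # blank line or next hunk: stop collecting
        joined = f"{joined} {ln}".strip()
        m = SENTENCE_END.search(joined)
        if m:
            joined = joined[: m.end()]
            break
    return joined[2:]  # drop the leading "* "


def summarise(log: str) -> str:
    """Build a one-line summary from a ChangeLog-style commit log."""
    entry = first_entry(log)
    m = ENTRY.match(entry)
    if not m:
        return entry
    files = [f.strip() for f in m["files"].split(",")]
    name = files[0].rsplit("/", 1)[-1]  # keep only the final file name
    if len(files) > 1:
        name += ", [...]"  # assumed rendering for multiple files
    if m["funcs"] is not None:
        funcs = [f.strip() for f in m["funcs"].split(",")]
        name += f" ({funcs[0]}{', [...]' if len(funcs) > 1 else ''})"
    summary = f"{name}: {m['text']}"
    # Assumed key words; only the text before the first hunk is inspected.
    header = log.split("* ", 1)[0].lower()
    for tag in ("backport", "revert"):
        if tag in header:
            return f"[{tag}] {summary}"
    return summary


log = """Backported from mainline
2019-08-02 Jakub Jelinek <ja...@redhat.com>

* quadmath.h (M_Eq, M_LOG2Eq, M_LOG10Eq, M_LN2q, M_LN10q, M_PIq,
M_PI_2q, M_PI_4q, M_1_PIq, M_2_PIq, M_2_SQRTPIq, M_SQRT2q,
M_SQRT1_2q): Use two more decimal places.
"""
print(summarise(log))
# prints: [backport] quadmath.h (M_Eq, [...]): Use two more decimal places.
```

Run on the 1998 i960.md entry quoted earlier, the same sketch yields "i960.md (extendqihi2): Fix typo (usage ',' instead of ';').", matching the hand-written example in the thread. Entries that do not fit the assumed "files (funcs): text" shape are returned as joined, without rewriting.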