Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-30 Thread Richard Earnshaw (lists)
On 29/12/2019 22:56, Eric S. Raymond wrote: > Richard Earnshaw (lists): >> Weak in the sense that it isn't proof given that the user name is >> partially redacted. There's nothing in the gcc archives that gives a >> full name either, unfortunately. >> >> Yes, it's the most likely match, but there

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Eric S. Raymond
Richard Earnshaw (lists): > Weak in the sense that it isn't proof given that the user name is > partially redacted. There's nothing in the gcc archives that gives a > full name either, unfortunately. > > Yes, it's the most likely match, but there's still an element of doubt. > > R. https://gro

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Joseph Myers
On Sun, 29 Dec 2019, Joseph Myers wrote: > I've now made those changes to the checked-in list so it's pure UTF-8, and > thus easier to review and edit. We still need to implement code in > bugdb.py to use that list to pick the preferred form from each list of > variants (and people may wish to
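A minimal sketch of what "pick the preferred form from each list of variants" could look like. This is not the code in bugdb.py; the function name `pick_preferred` and the sample names are hypothetical, and the fallback to the first variant is an assumption made only for illustration.

```python
def pick_preferred(variants, preferred):
    """Return the first variant that appears in the preferred set.

    `variants` is a non-empty list of spellings seen for one person;
    `preferred` is the set of spellings taken from a reviewed list.
    Falling back to the first variant is an assumption of this sketch.
    """
    for variant in variants:
        if variant in preferred:
            return variant
    return variants[0]


# Hypothetical data: two spellings of one name, one of them preferred.
preferred_forms = {"Jörg Example"}
print(pick_preferred(["Jorg Example", "Jörg Example"], preferred_forms))
```

Keeping the preferred list as plain UTF-8 text means the membership test is an ordinary string comparison, which is one reason a single encoding is easier to review and edit.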

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Richard Earnshaw (lists)
On 29/12/2019 22:24, Eric S. Raymond wrote: > Richard Earnshaw (lists): >> Also, for this one: >> >> # "47044": "", >> >> There's some (relatively weak) evidence that this is Bjørn Wennberg (eg >> https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60J), >> but in the

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Eric S. Raymond
Richard Earnshaw (lists): > Also, for this one: > > # "47044": "", > > There's some (relatively weak) evidence that this is Bjørn Wennberg (eg > https://groups.google.com/forum/#!msg/comp.databases.sybase/Uz8ICef9Qr8/uPwanH6is60J), > but in the absence of stronger evidence, I'm going to just pu

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Joseph Myers
On Sat, 28 Dec 2019, Joseph Myers wrote: > Concretely, what I'd suggest is: convert ISO-8859-1 entries in the > checked-in list to UTF-8, removing anything that thereby becomes a > duplicate or unnecessary; handle anything whose encoding isn't simply > ISO-8859-1 or UTF-8 via a hardcoded entry

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Jeff Law
On Sun, 2019-12-29 at 07:32 -0500, Eric S. Raymond wrote: > Richard Earnshaw (lists): > > I've just commented that one out for now; if anybody knows the correct > > addresses, please let me know. Also, there's one joint list that I've > > not attempted to fix at this time. > > # "28488": "Jim Ki

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Richard Earnshaw (lists)
On 29/12/2019 12:32, Eric S. Raymond wrote: > Richard Earnshaw (lists): >> I've just commented that one out for now; if anybody knows the correct >> addresses, please let me know. Also, there's one joint list that I've >> not attempted to fix at this time. > >> # "28488": "Jim Kingdon

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-29 Thread Eric S. Raymond
Richard Earnshaw (lists): > I've just commented that one out for now; if anybody knows the correct > addresses, please let me know. Also, there's one joint list that I've > not attempted to fix at this time. > # "28488": "Jim Kingdon ", That's Jim Kingdon the forme

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Richard Earnshaw (lists)
On 27/12/2019 19:47, Richard Earnshaw (lists) wrote: > Email addresses from the ChangeLog files are not validated during > commits, so a number of typos exist in the extracted data. I've > extracted the 'Author:' entry from a prototype conversion and then piped > that through sort and uniq -c. Su

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Richard Earnshaw (lists)
On 28/12/2019 20:11, Segher Boessenkool wrote: > On Sat, Dec 28, 2019 at 04:34:20PM +, Richard Earnshaw (lists) wrote: >> On 28/12/2019 14:54, Segher Boessenkool wrote: >>> On Sat, Dec 28, 2019 at 01:05:13PM +, Joseph Myers wrote: On Sat, 28 Dec 2019, Segher Boessenkool wrote: >>>

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Segher Boessenkool
On Sat, Dec 28, 2019 at 04:34:20PM +, Richard Earnshaw (lists) wrote: > On 28/12/2019 14:54, Segher Boessenkool wrote: > > On Sat, Dec 28, 2019 at 01:05:13PM +, Joseph Myers wrote: > >> On Sat, 28 Dec 2019, Segher Boessenkool wrote: > >> > >>> On Fri, Dec 27, 2019 at 07:47:02PM +, Richa

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Eric S. Raymond
Joseph Myers: > Concretely, what I'd suggest is: convert ISO-8859-1 entries in the > checked-in list to UTF-8, removing anything that thereby becomes a > duplicate or unnecessary; handle anything whose encoding isn't simply > ISO-8859-1 or UTF-8 via a hardcoded entry in bugdb.py using hex escap
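The suggestion quoted above (convert ISO-8859-1 entries to UTF-8, then drop whatever becomes a duplicate) can be sketched roughly as follows. The helper names `to_utf8` and `dedupe` and the sample name are hypothetical, and entries in any third encoding are deliberately not handled here, since the proposal treats those as hardcoded special cases.

```python
def to_utf8(raw: bytes) -> str:
    """Decode an entry as UTF-8, falling back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Any byte sequence decodes as ISO-8859-1, so this cannot raise,
        # but the result is wrong if the real encoding was something else.
        return raw.decode("iso-8859-1")


def dedupe(entries):
    """Decode every entry and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(to_utf8(entry) for entry in entries))


# Hypothetical data: the same name stored once per encoding.
raw_entries = [
    "Jörg Example".encode("iso-8859-1"),
    "Jörg Example".encode("utf-8"),
]
print(dedupe(raw_entries))
```

Trying UTF-8 first matters: valid UTF-8 text would also decode as ISO-8859-1 without error, just incorrectly, so the stricter decoder has to get the first attempt.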

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Joseph Myers
On Sat, 28 Dec 2019, Joseph Myers wrote: > On Sat, 28 Dec 2019, Richard Earnshaw (lists) wrote: > > > I've added the list of emails that I posted yesterday to the conversion > > scripts. I've not written anything to reprocess that yet. I want to > > leave that until we've completed the general

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Andreas Schwab
On Dec 28 2019, Richard Earnshaw (lists) wrote: > I don't know whether tools that analyse git repos to generate statistics > about users contributions care about canonicalization of names; they may > just key off email addresses. git shortlog supports that via .mailmap. Andreas. -- Andreas Sch
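To illustrate the kind of canonicalization a .mailmap file expresses, here is a rough Python sketch. It is not git's implementation: it handles only entries keyed on the commit email (for example `Proper Name <proper@email> <commit@email>`), ignores the variant that also matches on the commit name, and lowercases emails as a simplification. The function names and addresses are hypothetical.

```python
import re

_ADDR = re.compile(r"<([^>]*)>")


def parse_mailmap(text):
    """Return {commit_email: (proper_name, proper_email)} for simple entries."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        emails = _ADDR.findall(line)
        if not emails:
            continue
        # Text before the first "<" is the proper name (may be empty).
        name = line[: line.index("<")].strip()
        proper_email, commit_email = emails[0], emails[-1]
        mapping[commit_email.lower()] = (name, proper_email)
    return mapping


def canonicalise(name, email, mapping):
    """Map a (name, email) pair to its canonical form, if one is known."""
    proper_name, proper_email = mapping.get(email.lower(), (name, email))
    return (proper_name or name, proper_email)


mailmap = parse_mailmap(
    "# hypothetical entry\n"
    "A Developer <adev@example.org> <adev@old.example.org>\n"
)
print(canonicalise("A Develper", "adev@old.example.org", mailmap))
```

A mapping like this is applied when history is displayed, so it leaves the stored commits unchanged, whereas the fixes discussed in this thread are applied during the conversion itself.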

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Joseph Myers
On Sat, 28 Dec 2019, Richard Earnshaw (lists) wrote: > I've added the list of emails that I posted yesterday to the conversion > scripts. I've not written anything to reprocess that yet. I want to > leave that until we've completed the general review of the preferred > changes we want. Auto-gen

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Richard Earnshaw (lists)
On 28/12/2019 17:14, Joseph Myers wrote: > On Sat, 28 Dec 2019, Richard Earnshaw (lists) wrote: > >> My suggestion would be that we try to canonicalize all the author >> entries to UTF-8 as that avoids the limitations of ISO-8859-1, but that >> would probably need further fixups to detect the addi

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Joseph Myers
On Sat, 28 Dec 2019, Richard Earnshaw (lists) wrote: > My suggestion would be that we try to canonicalize all the author > entries to UTF-8 as that avoids the limitations of ISO-8859-1, but that > would probably need further fixups to detect the additional names that > need rewriting. What I've i

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Richard Earnshaw (lists)
On 28/12/2019 12:04, Jakub Jelinek wrote: > On Fri, Dec 27, 2019 at 07:47:02PM +, Richard Earnshaw (lists) wrote: >> Email addresses from the ChangeLog files are not validated during >> commits, so a number of typos exist in the extracted data. I've >> extracted the 'Author:' entry from a prot

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Joseph Myers
On Sat, 28 Dec 2019, Segher Boessenkool wrote: > > This is about extracting attributions from changelogs when unambiguous > > there, and then correcting mistakes or otherwise making minor variants > > more uniform. > > Yes, and I'm saying you probably shouldn't do that. Extracting attributions

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Richard Earnshaw (lists)
On 28/12/2019 14:54, Segher Boessenkool wrote: > On Sat, Dec 28, 2019 at 01:05:13PM +, Joseph Myers wrote: >> On Sat, 28 Dec 2019, Segher Boessenkool wrote: >> >>> On Fri, Dec 27, 2019 at 07:47:02PM +, Richard Earnshaw (lists) wrote: 1 Author: Segher Boessenkool *730 Au

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Segher Boessenkool
On Sat, Dec 28, 2019 at 01:05:13PM +, Joseph Myers wrote: > On Sat, 28 Dec 2019, Segher Boessenkool wrote: > > > On Fri, Dec 27, 2019 at 07:47:02PM +, Richard Earnshaw (lists) wrote: > > > 1 Author: Segher Boessenkool > > > *730 Author: Segher Boessenkool > > > 2 Author:

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Joseph Myers
On Sat, 28 Dec 2019, Segher Boessenkool wrote: > On Fri, Dec 27, 2019 at 07:47:02PM +, Richard Earnshaw (lists) wrote: > > 1 Author: Segher Boessenkool > > *730 Author: Segher Boessenkool > > 2 Author: Segher Boesssenkool > > The first and third are only in changelogs. The

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Jakub Jelinek
On Fri, Dec 27, 2019 at 07:47:02PM +, Richard Earnshaw (lists) wrote: > Email addresses from the ChangeLog files are not validated during > commits, so a number of typos exist in the extracted data. I've > extracted the 'Author:' entry from a prototype conversion and then piped > that through

Re: Git conversion: fixing email addresses from ChangeLog files

2019-12-28 Thread Segher Boessenkool
On Fri, Dec 27, 2019 at 07:47:02PM +, Richard Earnshaw (lists) wrote: > 1 Author: Segher Boessenkool > *730 Author: Segher Boessenkool > 2 Author: Segher Boesssenkool The first and third are only in changelogs. The second even happened only once, afaics? These errors only

Git conversion: fixing email addresses from ChangeLog files

2019-12-27 Thread Richard Earnshaw (lists)
Email addresses from the ChangeLog files are not validated during commits, so a number of typos exist in the extracted data. I've extracted the 'Author:' entry from a prototype conversion and then piped that through sort and uniq -c. Subsequent analysis shows the following addresses/names that ar
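The extraction step described above (collect the 'Author:' entries, then `sort` and `uniq -c`) can be approximated in Python as below. This is a sketch with made-up sample data; the function name `count_authors` and the addresses are hypothetical, and it assumes the author lines start at the beginning of the line.

```python
from collections import Counter


def count_authors(log_lines):
    """Count each distinct 'Author:' line, similar to `sort | uniq -c`."""
    counts = Counter(
        line.strip() for line in log_lines if line.startswith("Author:")
    )
    # Sort by the author string, as `sort` would before `uniq -c`.
    return sorted(counts.items())


# Hypothetical log excerpt containing one misspelt author name.
sample = [
    "Author: A Developer <adev@example.org>",
    "Author: A Developer <adev@example.org>",
    "Author: A Develper <adev@example.org>",
    "Date:   Fri Dec 27 2019",
]
for author, n in count_authors(sample):
    print(f"{n:7d} {author}")
```

Spellings with a very low count that sit next to a high-count near-match are the likely typos, which is the pattern the counts in this thread expose.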