I'm working on a script to track contributors so that (A) we can track project health for ASF board report purposes and (B) we can possibly share a nice "Thank you" listing contributors in release announcements. Other purposes might crop up. GitHub's contributors report has serious shortcomings[1] so I'm not using that.
So far I have something like this:

    git log main --since="3 months ago" --pretty="Author: %an <%ae>%n%B" | awk -F': ' '/^(Author|Co-authored-by): / {print $2}' | sort | uniq -c

But it needs deduplication, because most people have multiple entries. Given the complexity of deduplication, I'd convert this to Python, put it in dev-tools/scripts, and create a "contributors.txt" file somewhere that contains a full name, primary email, and email aliases for each contributor.

I'm sure it's debatable whether to go this route vs. CHANGES.txt, but the latter is harder to parse and ... I dunno; I don't like that it's so custom compared to a generic Git metadata approach. On the other hand, with CHANGES.txt the dedupe might not be necessary (just fix CHANGES.txt for dups), and it wouldn't include trivial edits (for better or worse). CHANGES.txt would also be more accurate for version-specific contribution attribution (since CHANGES.txt is organized this way), but it's harder to use between arbitrary commits/dates.

[1] https://docs.github.com/en/repositories/viewing-activity-and-data-for-your-repository/viewing-a-projects-contributors#troubleshooting-contributors

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley
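
For illustration, here is a rough sketch of what the dedupe step could look like in Python. Nothing here is decided: the contributors.txt layout (one line per person, "Full Name <primary email>" followed by space-separated alias emails) and the function names load_aliases and count_contributors are assumptions made up for this example. It expects the same "Author: ..." / "Co-authored-by: ..." lines that the git log command above prints.

```python
import re
from collections import Counter

# Hypothetical contributors.txt line format (an assumption, not an agreed layout):
#   Full Name <primary@example.org> alias1@example.org alias2@example.org
ENTRY_RE = re.compile(r"^(?P<name>[^<]+?)\s*<(?P<primary>[^>]+)>(?P<aliases>.*)$")

# Matches the "Author: Name <email>" lines emitted by --pretty and
# "Co-authored-by: Name <email>" trailers found in commit bodies.
# Unlike the awk version, this is case-insensitive, to also catch
# variants such as "Co-Authored-By".
AUTHOR_RE = re.compile(
    r"^(?:Author|Co-authored-by): (?P<name>.*?)\s*<(?P<email>[^>]+)>\s*$",
    re.IGNORECASE,
)


def load_aliases(lines):
    """Map every known email (lowercased) to a canonical 'Name <primary>' label."""
    canonical = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = ENTRY_RE.match(line)
        if not m:
            continue  # ignore lines that do not follow the assumed format
        label = f"{m['name']} <{m['primary']}>"
        for email in [m["primary"], *m["aliases"].split()]:
            canonical[email.lower()] = label
    return canonical


def count_contributors(log_lines, canonical):
    """Count author/co-author credits, folding known aliases into one entry."""
    counts = Counter()
    for line in log_lines:
        m = AUTHOR_RE.match(line.strip())
        if not m:
            continue  # ordinary commit-message text
        # Unknown emails fall back to the name/email exactly as seen in the log.
        fallback = f"{m['name']} <{m['email']}>"
        counts[canonical.get(m["email"].lower(), fallback)] += 1
    return counts
```

To use it, one would read contributors.txt into load_aliases, pass the lines of the git log output to count_contributors, and print counts.most_common(). Anyone who shows up under an unlisted email appears as a separate entry, which doubles as a to-do list for updating the aliases file.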