[
https://issues.apache.org/jira/browse/KAFKA-14995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880049#comment-17880049
]
Josep Prat commented on KAFKA-14995:
------------------------------------
Hi [~joaopedrofonseca]
Usually, when someone takes over a PR, they need to keep partial
attribution to the original author. This is typically done via a
`Co-authored-by: Name <email>` line at the end of the commit message.
If most of the work you need is already done in that PR, it might be worth
taking it over; otherwise, you might want to start from scratch.
Regarding the language, I would refrain from adding another language to the
project and I'd try to stick with Java, Python or Bash.
Thanks for taking this task!
> Automate asf.yaml collaborators refresh
> ---------------------------------------
>
> Key: KAFKA-14995
> URL: https://issues.apache.org/jira/browse/KAFKA-14995
> Project: Kafka
> Issue Type: Improvement
> Reporter: John Roesler
> Assignee: João Pedro Fonseca
> Priority: Minor
> Labels: newbie
>
> We have added a policy to use the asf.yaml Github Collaborators:
> [https://github.com/apache/kafka-site/pull/510]
> The policy states that we set this list to be the top 20 commit authors who
> are not Kafka committers. Unfortunately, it's not trivial to compute this
> list.
> Here is the process I followed to generate the list the first time (note that
> I generated this list on 2023-04-28, so the lookback is one year):
> 1. List authors by commit volume in the last year:
> {code:java}
> $ git shortlog --email --numbered --summary --since=2022-04-28 | vim - {code}
> 2. Manually filter out the authors who are committers, based on
> [https://kafka.apache.org/committers]
> 3. Truncate the list to 20 authors
> 4. For each author:
> 4a. Find a commit in the `git log` that they were the author on:
> {code:java}
> commit 440bed2391338dc10fe4d36ab17dc104b61b85e8
> Author: hudeqi <[email protected]>
> Date: Fri May 12 14:03:17 2023 +0800
> ...{code}
> 4b. Look up that commit in Github:
> [https://github.com/apache/kafka/commit/440bed2391338dc10fe4d36ab17dc104b61b85e8]
> 4c. Copy their Github username into .asf.yaml under both the PR whitelist and
> the Collaborators lists.
> 5. Send a PR to update .asf.yaml: [https://github.com/apache/kafka/pull/13713]
>
> This is pretty time-consuming and very scriptable. Two complications:
> * To do the filtering, we need to map from Git log "Author" to documented
> Kafka "Committer" that we can use to perform the filter. Suggestion: just
> update the structure of the "Committers" page to include their Git "Author"
> name and email
> ([https://github.com/apache/kafka-site/blob/asf-site/committers.html])
> * To generate the YAML lists, we need to map from Git log "Author" to Github
> username. There's presumably some way to do this in the Github REST API (the
> mapping is based on the email, IIUC), or we could also just update the
> Committers page to also document each committer's Github username.
>
> Ideally, we would write this script (to be stored in the Apache Kafka repo)
> and create a Github Action to run it every three months.
>
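A minimal Python sketch of how steps 1 to 4 of the process quoted above might be scripted. It is not an existing Kafka script, and all function names here (`parse_shortlog`, `top_non_committers`, `run_shortlog`, `github_login`, `fetch_commit`) are hypothetical. It assumes the caller already has a list of committer emails, which is exactly the mapping the issue says does not exist yet, and it assumes the GitHub REST endpoint `GET /repos/{owner}/{repo}/commits/{sha}` is used for the username lookup, where the top-level `author` object is null when the commit email is not linked to a GitHub account.

```python
import json
import re
import subprocess
import urllib.request
from typing import Iterable, List, NamedTuple, Optional


class Author(NamedTuple):
    commits: int
    name: str
    email: str


# One line of `git shortlog --email --numbered --summary` output:
# "<count><whitespace><name> <email>"
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.*?)\s*<([^>]*)>\s*$")


def parse_shortlog(text: str) -> List[Author]:
    """Step 1: turn shortlog summary output into Author records."""
    authors = []
    for line in text.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:  # lines that do not match (e.g. blank lines) are skipped
            count, name, email = match.groups()
            authors.append(Author(int(count), name, email.lower()))
    return authors


def top_non_committers(
    authors: Iterable[Author], committer_emails: Iterable[str], limit: int = 20
) -> List[Author]:
    """Steps 2 and 3: drop committers (matched by email), keep the top `limit`."""
    excluded = {email.lower() for email in committer_emails}
    candidates = [a for a in authors if a.email not in excluded]
    candidates.sort(key=lambda a: a.commits, reverse=True)
    return candidates[:limit]


def run_shortlog(since: str, repo_dir: str = ".") -> str:
    """Run the shortlog command from step 1 and return its output.

    HEAD is passed explicitly because git shortlog reads the log from
    standard input when no revision is given and stdin is not a terminal.
    """
    result = subprocess.run(
        ["git", "shortlog", "--email", "--numbered", "--summary",
         f"--since={since}", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout


def github_login(commit_payload: dict) -> Optional[str]:
    """Step 4c: extract the GitHub username from a commit API response."""
    author = commit_payload.get("author")
    return author["login"] if author else None


def fetch_commit(sha: str, repo: str = "apache/kafka") -> dict:
    """Step 4b: fetch one commit from the GitHub REST API (unauthenticated)."""
    url = f"https://api.github.com/repos/{repo}/commits/{sha}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The parsing and filtering functions are pure, so they can be checked without a repository or network access. Authors with several emails would still need manual merging, and unauthenticated API requests are rate-limited, so a scheduled job would likely need a token.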