Hiho good people,

I am currently developing a connector for Maven repositories for the Software Heritage Foundation [1]. In a nutshell, Software Heritage (SWH) aims to archive all existing source code in the world, and provides useful, publicly available services and related tools (unique IDs/DOIs, search, datasets, graph tools...). It's all open source, and many large forges and software ecosystems have already been archived (GitHub, GitLab, npm, PyPI, Debian packages, CRAN...) [2]. Now we would like to archive the Maven ecosystem.

[1] https://www.softwareheritage.org/
[2] https://archive.softwareheritage.org/

I'm reaching out to ask for advice and to start a discussion about how this could be achieved without impacting anybody, i.e. neither Maven repository maintainers nor users. Our current plan is to use the Maven Indexer indexes for the listing, and then download POMs and source jars in the way we see as the most efficient and fair. We of course respect all rate-limiting policies (and HTTP error codes), and we are polite and patient (although tenacious).
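To make the download step concrete, here is a minimal, hypothetical sketch (not the connector's actual code) of the two pieces it relies on: mapping a groupId/artifactId/version coordinate from the index to the POM and sources-jar URLs under the standard Maven 2 repository layout, and deciding how long to wait after a rate-limiting response. The Maven Central base URL is used only as an example, and the `polite_delay` helper, its defaults and its retry policy are assumptions for illustration.

```python
from typing import Dict, Optional

# Example base URL; any repository using the standard Maven 2 layout works the same way.
CENTRAL = "https://repo.maven.apache.org/maven2"


def artifact_urls(group_id: str, artifact_id: str, version: str,
                  base: str = CENTRAL) -> Dict[str, str]:
    """Return the POM and sources-jar URLs for one coordinate.

    Standard layout: the dots in groupId become path segments, followed by
    artifactId/version/artifactId-version[-classifier].extension.
    """
    path = "/".join([base, group_id.replace(".", "/"), artifact_id, version])
    stem = f"{artifact_id}-{version}"
    return {
        "pom": f"{path}/{stem}.pom",
        "sources": f"{path}/{stem}-sources.jar",
    }


def polite_delay(status: int, retry_after: Optional[str], attempt: int,
                 base_delay: float = 1.0, max_delay: float = 300.0) -> Optional[float]:
    """Hypothetical helper: seconds to wait before retrying, or None to not retry.

    Only 429 (Too Many Requests) and 503 (Service Unavailable) are retried.
    A Retry-After header is honored only in its delta-seconds form (the
    HTTP-date form is not handled in this sketch); otherwise the delay
    doubles with each attempt. Every delay is capped at max_delay.
    """
    if status not in (429, 503):
        return None
    if retry_after is not None and retry_after.isdigit():
        return min(float(retry_after), max_delay)
    return min(base_delay * (2 ** attempt), max_delay)
```

For example, `artifact_urls("org.apache.commons", "commons-lang3", "3.12.0")["sources"]` points at `.../org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0-sources.jar`. A 404 on a sources jar is expected for artifacts that never published one, which is why the sketch does not retry it.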

So, here are my questions:

* Who should we talk to about this? I.e., are there Maven repository maintainers on this list, or do you know of a better place to ask?

* Although we believe the process described above is the most efficient and fair one, maybe there is a better way to list and archive artefact sources? Any feedback or thoughts are welcome.


Thanks in advance, have a wonderful day!


--
Boris Baldassari
Castalia Solutions -- Elegant Software Engineering
Web: http://castalia.solutions
Tel: +33 6 48 03 82 89
