Hi Frederik,
Thanks for the kind answer and pointers.
Yes, we know that non-consensual mirroring, as well as scraping, is
explicitly forbidden. Hence the question here. :-)
It should be noted however that our case is a bit specific: we want to
get only *some* types of artefacts (poms and src jars), but we want
*all* of them. So it's a special case of partial mirroring. Furthermore
we will only access them *once* for the archiving so the "re-use"
feature of a proxy, mirror or similar is not needed.
Maybe a repo maintainer could provide some more wisdom?
Thanks, cheers!
--
boris
On 14/06/2021 13:41, Frederik Boster wrote:
I am not in any way affiliated with Apache or Sonatype. So take my opinion
with a grain of salt :)
Trying to mirror the entire Maven Central repository will unfortunately get
you automatically banned.
To circumvent that, I would suggest you set up your own Maven Central mirror
first. [1]
[1]
https://maven.apache.org/guides/mini/guide-mirror-settings.html#creating-your-own-mirror
On Mon, Jun 14, 2021, 12:12 Boris Baldassari <[email protected]>
wrote:
Hiho good people,
I am currently developing a Maven repositories connector for the
Software Heritage Foundation [1]. In a nutshell, the SWH aims to
archive all existing source code in the world, and provides useful
publicly available services and related tools (unique IDs/DOIs, search,
datasets, graph tools..). It's all open-source, and many large forges
and software systems have already been archived (GitHub, GitLab, npm,
pypi, debian packages, CRAN..) [2]. Now we would like to archive the
Maven ecosystem.
[1] https://www.softwareheritage.org/
[2] https://archive.softwareheritage.org/
I'm reaching out to ask for wisdom and start a discussion about how this
could be achieved without impacting anybody, i.e. neither Maven
repository maintainers nor the users. Our plan for now is to use the
Maven Indexer indexes for the listing, and then download poms and source
jars, in a way that we see as the most efficient and fair. We of course
respect all rate-limiting policies (and http error codes), and we are
polite and patient (although tenacious).
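
To make the plan above concrete, here is a minimal sketch of the download
step in Python. It is an illustration under stated assumptions, not the
actual Software Heritage connector: the base URL, the function names and
the retry parameters are all hypothetical, and the coordinates are assumed
to have been read from the index beforehand. The only thing taken as given
is the standard Maven repository layout (groupId dots become slashes,
followed by artifactId, version and the file name).

```python
import time
import urllib.error
import urllib.request

# Assumption: the public Maven Central URL; any repository using the
# standard Maven layout could be substituted here.
BASE_URL = "https://repo1.maven.org/maven2"


def artifact_urls(group_id, artifact_id, version, base_url=BASE_URL):
    """Return the POM and sources-jar URLs for one set of coordinates."""
    directory = "/".join(
        [base_url.rstrip("/"), group_id.replace(".", "/"), artifact_id, version]
    )
    stem = f"{artifact_id}-{version}"
    return [f"{directory}/{stem}.pom", f"{directory}/{stem}-sources.jar"]


def retry_delay(attempt, retry_after=None, base=2.0, cap=300.0):
    """Seconds to wait before retrying; `attempt` is 0-based.

    A numeric Retry-After header from the server wins. Otherwise fall
    back to capped exponential backoff. (Retry-After given as an HTTP
    date is not handled in this sketch.)
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    return min(base * (2 ** attempt), cap)


def polite_fetch(url, max_attempts=5, pause=1.0):
    """Download one URL, backing off on 429/503 responses.

    Returns the body as bytes, or None when the file does not exist
    (for instance, an artifact published without a sources jar).
    """
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                data = response.read()
            time.sleep(pause)  # fixed pause between successful requests
            return data
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return None
            if err.code not in (429, 503):
                raise  # other errors are not retried here
            time.sleep(retry_delay(attempt, err.headers.get("Retry-After")))
    raise RuntimeError(f"gave up on {url} after {max_attempts} attempts")
```

Each artefact is fetched exactly once and sequentially, so the load on the
repository stays bounded by the pause and backoff values, which would need
to be agreed with the repository maintainers.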
So, here are my questions:
* Who should we talk to to achieve that? i.e. are there maven repository
maintainers on the list, or do you know of a better place to ask?
* Although we believe the above-mentioned process is the most efficient
and fair one, maybe there is a better way to list and archive artefact
sources? Any feedback or mere thoughts are welcome.
Thanks in advance, have a wonderful day!
--
Boris Baldassari
Castalia Solutions -- Elegant Software Engineering
Web: http://castalia.solutions
Tel: +33 6 48 03 82 89