[ https://issues.apache.org/jira/browse/MRESOLVER-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623657#comment-17623657 ]
ASF GitHub Bot commented on MRESOLVER-274:
------------------------------------------

cstamas commented on code in PR #197:
URL: https://github.com/apache/maven-resolver/pull/197#discussion_r1004161901

##########
src/site/markdown/remote-repository-filtering.md:
##########
@@ -0,0 +1,70 @@
+# Remote Repository Filtering
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+A new Maven Resolver feature that allows filtering of Artifacts by RemoteRepository based on various (extensible)
+criteria.
+
+## Why?
+
+Remote Repository Filtering (RRF) is a long-requested Maven feature, and plays a huge role when your build uses
+several remote repositories. In such cases Maven "searches" the ordered list (effective POM) of remote repositories,
+and the artifact gets resolved using a "first wins" strategy. This has several implications:
+
+* your build gets slower, as if your artifact is in the Nth repository, Maven must make N-1 requests that will result in
+  404 Not Found only to get to the Nth repository to finally get the artifact.
+* your build "leaks" artifact requests, as those repositories are asked for artifacts that they do not (or worse,
+  cannot) have. Still, those remote repository operators do get your requests in access logs.
+* to "simplify" things, users tend to use MRM "group" (or "virtual") repositories, which causes data loss on the
+  Maven Project side (the project loses artifact origin information) and ends up in disasters, as in the end these
+  "super-uber groups" grow uncontrollably: their member count becomes uncontrollable (as new members are
+  added as time passes), or the number of created groups grows uncontrollably, and projects start losing the knowledge
+  about their required remote repositories, needed to (re)build a project. Hence these projects become
+  unbuildable without the MRM; projects become bound to the MRM.
+
+So Maven by default gets slower as remote repositories are added, leaks your own build information to remote
+repository operators, and current solutions offered to solve this problem just end up in disasters (most often).
+
+## What is it?
+
+Imagine you could instruct Maven which repository can contain which artifact. Instead of "round robin" searching
+for artifacts in remote repositories, Maven could be instructed in a controlled way to directly reach only the
+needed remote repository.
+
+With RRF, a Maven build does NOT have to slow down as new remote repositories are added, and will not leak
+build information anywhere either, as it will get things from where they should be fetched.
+
+## What is it not?
+
+When it comes solely to dependencies, don't forget
+[maven-enforcer-plugin](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rules that are doing
+exactly that. RRF is NOT an alternative to these enforcer rules; it is a separate tool to make your build
+faster, more private and optimized, without losing build information (remote repositories should be in POM).
+
+## Maven Central is special
+
+The Maven Central (MC) repository is special in this respect, as Maven will always try to get things from here, as your build,
+plugins, plugin dependencies, extensions, etc. will most often come from here.
+While you CAN filter MC, filtering MC is
+most often a bad idea (filtering, as in "limiting what can come from it"). On the other hand, MC itself offers help
+to prevent request leakage to it (it publishes available prefixes, see below).
+
+So, **most often** limiting "what can be fetched" from MC is a bad idea; it **can be done**, but only in a very cautious way,
+as otherwise you risk your build. RRF does not distinguish the "context" of an artifact, it merely filters them out
+by {artifact, remoteRepository} pair, and by limiting MC you can easily get into a state where you break your build (as
+a plugin depends on a filtered artifact).

Review Comment:
I plan to extend the documentation, probably reusing this "demo" (not the code, but the text from it): https://github.com/cstamas/rrf-demo


> Introduce Remote Repository Filter feature
> ------------------------------------------
>
>                 Key: MRESOLVER-274
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-274
>             Project: Maven Resolver
>          Issue Type: New Feature
>          Components: Resolver
>            Reporter: Tamas Cservenak
>            Assignee: Tamas Cservenak
>            Priority: Major
>             Fix For: 1.9.0
>
>
> The feature, as its name says, should be able to "filter" RemoteRepositories
> by some criteria ("known bad GAVs", "allowed groupId", etc).
> In short, this feature answers the following question: "should Artifact be
> available from RemoteRepository?" and is able to employ a combination
> (via consensus, or later possibly other strategies) of several "filter
> sources" that are extensible (via adding new components).
> Filter is used in two places:
> * in connector, preventing a remote artifact from being fetched from a remote
>   repository (100% reliable)
> * in resolution, preventing a locally *cached* artifact from being resolved
>   (as reliable as your local repository is "clean", i.e. if you used Simple
>   LRM on it, it does not track remote origins and will fail to filter, while
>   EnhancedLRM does track it and will work as expected).
> By default this feature is "dormant" (resolver behaves exactly the same as before
> without it). This is intended as a "low level" feature that can later be built
> upon to implement more user friendly solutions like MNG-6763. Hence,
> this issue and the resolver code changes are NOT meant to completely implement
> MNG-6763, but rather to provide the needed (lower level) functionality to
> make it possible.
> Filters implemented in this round:
> * groupId - provide a list of groupIds per remote repository
> * prefix - use prefixes file for allowed prefixes (example central
>   [https://repo.maven.apache.org/maven2/.meta/prefixes.txt] or ASF releases
>   [https://repository.apache.org/content/repositories/releases/.meta/prefixes.txt])
> * maybe package up an artifact holding list of "known" bad artifacts and
>   consume that (and enforce it)
> * etc...
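
The issue description above boils down to one question ("should Artifact be available from RemoteRepository?") answered by several filter sources combined via consensus. A minimal sketch of that idea in plain Java follows. It is NOT the Resolver API: `RrfSketch`, `FilterSource`, `groupIdFilter`, `prefixFilter` and the repository ids are hypothetical names made up for illustration. The prefix format (one path prefix such as `/org/apache` per entry) and the rule that a repository without any configured list is left unfiltered are assumptions as well.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RrfSketch {

    /** Hypothetical filter source: may this artifact be fetched from this repository? */
    interface FilterSource {
        boolean accept(String repositoryId, String groupId, String artifactId);
    }

    /** "groupId" filter: an explicit set of allowed groupIds per remote repository. */
    static FilterSource groupIdFilter(Map<String, Set<String>> allowedGroupIds) {
        return (repositoryId, groupId, artifactId) -> {
            Set<String> allowed = allowedGroupIds.get(repositoryId);
            // Assumption: a repository without a configured list is not filtered.
            return allowed == null || allowed.contains(groupId);
        };
    }

    /** "prefix" filter: allowed path prefixes per repository (prefixes.txt style). */
    static FilterSource prefixFilter(Map<String, List<String>> allowedPrefixes) {
        return (repositoryId, groupId, artifactId) -> {
            List<String> prefixes = allowedPrefixes.get(repositoryId);
            if (prefixes == null) {
                return true; // Assumption: no prefixes published, so no filtering.
            }
            // Artifact path in Maven repository layout, e.g. /org/apache/maven/maven-core
            String path = "/" + groupId.replace('.', '/') + "/" + artifactId;
            // Match whole path segments only, so "/org/apa" does not match "/org/apache".
            return prefixes.stream().anyMatch(p -> path.equals(p) || path.startsWith(p + "/"));
        };
    }

    /** Consensus: the artifact is accepted only if every filter source accepts it. */
    static boolean accept(List<FilterSource> sources, String repositoryId,
                          String groupId, String artifactId) {
        return sources.stream().allMatch(s -> s.accept(repositoryId, groupId, artifactId));
    }

    public static void main(String[] args) {
        List<FilterSource> sources = List.of(
                groupIdFilter(Map.of("asf-releases", Set.of("org.apache.maven"))),
                prefixFilter(Map.of("central", List.of("/org/apache", "/junit"))));

        // Allowed: central publishes the /org/apache prefix.
        System.out.println(accept(sources, "central", "org.apache.maven", "maven-core"));
        // Rejected: no matching prefix, so central is never asked (no request "leak").
        System.out.println(accept(sources, "central", "com.example", "private-lib"));
        // Rejected: groupId is not on the list for asf-releases.
        System.out.println(accept(sources, "asf-releases", "com.example", "private-lib"));
    }
}
```

With consensus, adding a filter source can only narrow what a repository is asked for, which fits the "dormant by default" behaviour described above: with no sources configured, everything is accepted.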