[ https://issues.apache.org/jira/browse/MINDEXER-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718473#comment-17718473 ]
Michael Bien commented on MINDEXER-185:
---------------------------------------

proposal: https://github.com/apache/maven-indexer/pull/302

> Document filter doesn't seem to do anything
> -------------------------------------------
>
>                 Key: MINDEXER-185
>                 URL: https://issues.apache.org/jira/browse/MINDEXER-185
>             Project: Maven Indexer
>          Issue Type: Bug
>    Affects Versions: 7.0.1
>            Reporter: Michael Bien
>            Priority: Major
>
> Hello devs!
>
> I tried to filter the index during extraction using a DocumentFilter and it
> didn't appear to do anything.
> As a test, I simply set {{indexUpdateRequest.setDocumentFilter(doc -> false);}}
> before calling {{DefaultIndexUpdater.fetchAndUpdateIndex}}, and the extracted
> index had the same size (5.6 GB) as without the filter.
>
> The filter is actually called, and it also adds a few minutes to the
> extraction time.
> https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/DefaultIndexUpdater.java#L238-L241
>
> I am not sure why the implementation is filtering the index *after*
> extraction. Wouldn't it be easier and also more efficient to do it in
> IndexDataReader?
> e.g.
> https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/IndexDataReader.java#L269

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
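
To illustrate the question raised in the issue (filtering *after* extraction versus filtering while reading), here is a minimal sketch. It is not Maven Indexer code: the class `FilterWhileReadingSketch`, both method names, and the use of a plain `Map` in place of a Lucene document are hypothetical stand-ins, and the `Predicate` only assumes that a filter returning `true` means "keep the document". It does not attempt to reproduce or explain the reported bug; it only contrasts the two orderings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in, NOT Maven Indexer code: a "document" is a plain map
// and the "index" is a list, so the example runs without Lucene.
final class FilterWhileReadingSketch {

    // Ordering A: store every record first, then remove what the filter rejects.
    // Every record is written once before any of them is dropped.
    static List<Map<String, String>> extractThenFilter(
            Iterator<Map<String, String>> records,
            Predicate<Map<String, String>> keep) {
        List<Map<String, String>> index = new ArrayList<>();
        while (records.hasNext()) {
            index.add(records.next());
        }
        index.removeIf(keep.negate());
        return index;
    }

    // Ordering B: consult the filter while reading, so rejected records
    // are never stored in the first place.
    static List<Map<String, String>> filterWhileReading(
            Iterator<Map<String, String>> records,
            Predicate<Map<String, String>> keep) {
        List<Map<String, String>> index = new ArrayList<>();
        while (records.hasNext()) {
            Map<String, String> doc = records.next();
            if (keep.test(doc)) {
                index.add(doc);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Made-up sample records; the field name "u" is illustrative only.
        List<Map<String, String>> records = List.of(
                Map.of("u", "org.example|a|1.0"),
                Map.of("u", "org.example|b|2.0"));

        // Same test filter as in the issue: reject every document.
        Predicate<Map<String, String>> rejectAll = doc -> false;

        System.out.println(extractThenFilter(records.iterator(), rejectAll).size());
        System.out.println(filterWhileReading(records.iterator(), rejectAll).size());
    }
}
```

In this toy model both orderings end with the same (empty) result; the difference is that the second never stores the rejected records, which is the efficiency argument made in the issue.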