jira-importer opened a new issue, #563: URL: https://github.com/apache/maven-indexer/issues/563
**[Michael Bien](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mbien)** opened **[MINDEXER-185](https://issues.apache.org/jira/browse/MINDEXER-185?redirect=false)** and commented:

Hello devs! I tried to filter the index during extraction using a `DocumentFilter`, and it didn't appear to do anything. As a test, I simply set `indexUpdateRequest.setDocumentFilter(doc -> false);` before calling `DefaultIndexUpdater.fetchAndUpdateIndex`. The extracted index was 5.6 GB, the same size as without the filter. The filter is actually called, and it adds a few minutes to the extraction time.

https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/DefaultIndexUpdater.java#L238-L241

I am not sure why the implementation filters the index **after** extraction. Wouldn't it be easier, and also more efficient, to do it in `IndexDataReader`? For example:

https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/IndexDataReader.java#L269

---

**Affects:** 7.0.1
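To make the suggestion concrete, here is a minimal, hypothetical Java sketch contrasting the two orderings: writing every document and then removing the rejected ones, versus applying the filter to each document as it is read. `Doc`, `Result`, and the in-memory list standing in for the index are illustrative assumptions, not maven-indexer or Lucene types, and the sketch makes no claim about why the on-disk index stays at 5.6 GB.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: filter-after-extraction vs. filter-while-reading.
 * Doc and the in-memory "index" list are stand-ins, NOT maven-indexer or Lucene types.
 */
public class FilterOnReadSketch {

    /** Hypothetical stand-in for a document read from the index data. */
    record Doc(String groupId, String artifactId) {}

    /** Documents left in the "index" plus how many writes were performed. */
    record Result(List<Doc> kept, int writes) {}

    /** Ordering described in the issue: write every document, then drop rejected ones. */
    static Result filterAfterExtraction(Iterator<Doc> source, Predicate<Doc> filter) {
        List<Doc> index = new ArrayList<>();
        int writes = 0;
        while (source.hasNext()) {
            index.add(source.next()); // every document is written first
            writes++;
        }
        index.removeIf(filter.negate()); // rejected documents are removed afterwards
        return new Result(index, writes);
    }

    /** Suggested ordering: test each document before it is written. */
    static Result filterWhileReading(Iterator<Doc> source, Predicate<Doc> filter) {
        List<Doc> index = new ArrayList<>();
        int writes = 0;
        while (source.hasNext()) {
            Doc doc = source.next();
            if (filter.test(doc)) {
                index.add(doc); // only accepted documents are ever written
                writes++;
            }
        }
        return new Result(index, writes);
    }

    public static void main(String[] args) {
        List<Doc> data = List.of(
                new Doc("org.apache.maven", "maven-core"),
                new Doc("org.apache.maven.indexer", "indexer-core"),
                new Doc("com.example", "demo"));
        Predicate<Doc> rejectAll = doc -> false; // mirrors setDocumentFilter(doc -> false)

        Result after = filterAfterExtraction(data.iterator(), rejectAll);
        Result during = filterWhileReading(data.iterator(), rejectAll);
        System.out.println("after extraction: writes=" + after.writes() + ", kept=" + after.kept().size());
        // prints "after extraction: writes=3, kept=0"
        System.out.println("while reading: writes=" + during.writes() + ", kept=" + during.kept().size());
        // prints "while reading: writes=0, kept=0"
    }
}
```

With `doc -> false`, both variants end up keeping nothing, but the filter-after variant still pays for one write per document before discarding it, which is the cost that filtering inside the reader would avoid.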