Michael Bien created MINDEXER-185:
-------------------------------------

             Summary: Document filter doesn't seem to do anything
                 Key: MINDEXER-185
                 URL: https://issues.apache.org/jira/browse/MINDEXER-185
             Project: Maven Indexer
          Issue Type: Bug
    Affects Versions: 7.0.1
            Reporter: Michael Bien


Hello devs!

 

I tried to filter the index during extraction using a DocumentFilter and it 
didn't appear to do anything.

As a test, I simply set {{indexUpdateRequest.setDocumentFilter(doc -> false);}} 
before calling {{DefaultIndexUpdater.fetchAndUpdateIndex}}, and the extracted 
index was the same size (5.6 GB) as without the filter.
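For reference, the outcome this test expects can be modelled in plain Java. The sketch below is hypothetical and is not Maven Indexer code. {{Filter}} stands in for {{DocumentFilter}}, which is assumed to expose a single {{accept}} method that returns true to keep a document. Documents are modelled as {{Map<String, String>}} instead of Lucene documents, and the field names are made up. With a reject-all filter like {{doc -> false}}, no documents should survive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RejectAllDemo {

    // Hypothetical stand-in for DocumentFilter: return true to keep a document.
    @FunctionalInterface
    interface Filter {
        boolean accept(Map<String, String> doc);
    }

    // Returns only the documents the filter accepts, in their original order.
    static List<Map<String, String>> apply(List<Map<String, String>> docs, Filter filter) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (filter.accept(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample documents using hypothetical field names.
        List<Map<String, String>> docs = List.of(
                Map.of("groupId", "org.example", "artifactId", "a"),
                Map.of("groupId", "org.example", "artifactId", "b"),
                Map.of("groupId", "com.other", "artifactId", "c"));

        // Reject-all filter, as in the test above: nothing should be kept.
        System.out.println(apply(docs, doc -> false).size());
        // Selective filter: keeps only the org.example documents.
        System.out.println(apply(docs, doc -> "org.example".equals(doc.get("groupId"))).size());
    }
}
```

Running {{main}} prints 0 for the reject-all filter and 2 for the selective one, which is the kind of reduction the test was expecting to see in the extracted index.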

 

The filter is actually called, and it also adds a few minutes to the 
extraction time.

https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/DefaultIndexUpdater.java#L238-L241

 

I am not sure why the implementation filters the index *after* extraction. 
Wouldn't it be simpler, and also more efficient, to filter the documents while 
they are being read in {{IndexDataReader}}? For example, here:
https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/IndexDataReader.java#L269
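A minimal sketch of the suggested alternative, under loudly stated assumptions: it uses a simplified, hypothetical record format (each document is an int field count followed by {{writeUTF}} name/value pairs, read until end of stream), which is not the actual {{IndexDataReader}} wire format. {{FilteringReader}}, {{readFiltered}} and {{encode}} are hypothetical names, not Maven Indexer API. The point is only that the predicate runs as each document is decoded, so rejected documents are never handed to the index writer at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class FilteringReader {

    // Hypothetical reader: decodes documents until end of stream and passes
    // only accepted ones to the sink. Returns how many reached the sink.
    static int readFiltered(DataInputStream in,
                            Predicate<Map<String, String>> filter,
                            Consumer<Map<String, String>> sink) throws IOException {
        int written = 0;
        while (true) {
            int fieldCount;
            try {
                fieldCount = in.readInt();
            } catch (EOFException end) {
                return written; // no more documents in the stream
            }
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < fieldCount; i++) {
                String name = in.readUTF();
                String value = in.readUTF();
                doc.put(name, value);
            }
            // Filter here, before the document is ever added to the target index.
            if (filter.test(doc)) {
                sink.accept(doc);
                written++;
            }
        }
    }

    // Encodes documents in the same simplified (hypothetical) format.
    static byte[] encode(List<Map<String, String>> docs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map<String, String> doc : docs) {
                out.writeInt(doc.size());
                for (Map.Entry<String, String> field : doc.entrySet()) {
                    out.writeUTF(field.getKey());
                    out.writeUTF(field.getValue());
                }
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(List.of(
                Map.of("groupId", "org.example"),
                Map.of("groupId", "com.other")));

        int kept = readFiltered(
                new DataInputStream(new ByteArrayInputStream(data)),
                doc -> "org.example".equals(doc.get("groupId")),
                doc -> System.out.println("indexing " + doc));
        System.out.println(kept + " document(s) indexed");
    }
}
```

In this design, the filter cost is paid once per decoded document, and a reject-all filter means the sink (standing in for the index writer) receives nothing, so the output index has nothing to store for those documents.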



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
