[I] [MINDEXER-185] Document filter doesn't seem to do anything [maven-indexer]

via GitHub Thu, 12 Jun 2025 03:03:30 -0700


jira-importer opened a new issue, #563:
URL: https://github.com/apache/maven-indexer/issues/563


   **[Michael Bien](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mbien)** opened **[MINDEXER-185](https://issues.apache.org/jira/browse/MINDEXER-185?redirect=false)** and commented
   
   Hello devs!
   
    
   
   I tried to filter the index during extraction using a `DocumentFilter`, and it didn't appear to do anything.
   
   As a test, I simply set `indexUpdateRequest.setDocumentFilter(doc -> false);` before calling `DefaultIndexUpdater.fetchAndUpdateIndex`, and the extracted index was the same size (5.6 GB) as without the filter.
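   The sketch below shows the behaviour the test above expects: a reject-all filter should leave nothing to keep. It uses plain Java only. The `DocFilter` interface and the field-map "documents" are hypothetical stand-ins that mirror the accept/reject shape of a document filter. They are not the real maven-indexer or Lucene types.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-ins: a "document" is a field map, and DocFilter mirrors the
// accept/reject shape of a document filter. This is NOT the real maven-indexer API.
public class RejectAllFilterDemo {

    @FunctionalInterface
    interface DocFilter {
        boolean accept(Map<String, String> doc);
    }

    // Keep only the documents the filter accepts.
    static List<Map<String, String>> applyFilter(List<Map<String, String>> docs, DocFilter filter) {
        return docs.stream().filter(filter::accept).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("id", "org.example:lib-a:1.0"),
                Map.of("id", "org.example:lib-b:2.0"));

        // Analogue of setDocumentFilter(doc -> false): reject every document.
        List<Map<String, String>> kept = applyFilter(docs, doc -> false);
        System.out.println(kept.size()); // prints 0
    }
}
```

Under these assumptions, the resulting "index" would be empty. The report's question is why the on-disk index does not shrink accordingly.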
   
    
   
   The filter is actually called, and it also adds a few minutes to the extraction time.
   
   
https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/DefaultIndexUpdater.java#L238-L241
   
    
   
   I am not sure why the implementation filters the index **after** extraction. Wouldn't it be simpler, and also more efficient, to do it in `IndexDataReader`?
   For example:
https://github.com/apache/maven-indexer/blob/1cd122b1487150613005c8f9aced9aec20fded3e/indexer-core/src/main/java/org/apache/maven/index/updater/IndexDataReader.java#L269
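   The sketch below contrasts the two placements being discussed. It is a hypothetical in-memory model, not the real `DefaultIndexUpdater` or `IndexDataReader` code. "Documents" are strings, and the sketch only counts writes and deletions.

   - **Filter after extraction:** writes every document and then deletes the rejected ones.
   - **Filter while reading:** skips rejected documents before they are ever written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of two places to apply a document filter during an index update.
// Documents are plain strings; "writes" and "deletes" are just counters, no real index.
public class FilterPlacementDemo {

    // Counts how many documents were written and how many were deleted afterwards.
    static final class Stats {
        int written;
        int deleted;
    }

    // Filter after extraction: write everything first, then delete rejected documents.
    static List<String> filterAfterExtraction(List<String> source, Predicate<String> filter, Stats stats) {
        List<String> index = new ArrayList<>();
        for (String doc : source) {
            index.add(doc);
            stats.written++;
        }
        int before = index.size();
        index.removeIf(filter.negate()); // drop documents the filter rejects
        stats.deleted += before - index.size();
        return index;
    }

    // Filter while reading: rejected documents are skipped and never written.
    static List<String> filterWhileReading(List<String> source, Predicate<String> filter, Stats stats) {
        List<String> index = new ArrayList<>();
        for (String doc : source) {
            if (filter.test(doc)) {
                index.add(doc);
                stats.written++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a:1", "b:1", "c:1");
        Predicate<String> rejectAll = doc -> false;

        Stats after = new Stats();
        Stats during = new Stats();
        filterAfterExtraction(source, rejectAll, after);
        filterWhileReading(source, rejectAll, during);

        System.out.println(after.written + " written, " + after.deleted + " deleted");   // 3 written, 3 deleted
        System.out.println(during.written + " written, " + during.deleted + " deleted"); // 0 written, 0 deleted
    }
}
```

Both placements end with the same kept documents, but only the second avoids writing rejected ones. In a real Lucene index, deleted documents are only marked as deleted, and their space is reclaimed when segments are merged. That might be one reason the after-extraction approach does not reduce the on-disk size, although the issue itself does not confirm this.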
   
   
   ---
   
   **Affects:** 7.0.1
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.