jira-importer opened a new issue, #523:
URL: https://github.com/apache/maven-indexer/issues/523

   **[Tamas Cservenak](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=cstamas)** opened **[MINDEXER-151](https://issues.apache.org/jira/browse/MINDEXER-151?redirect=false)** and commented
   
   Currently, running the BasicUsageExample from the examples performs a "full" update, and a full update (going from an "empty" index to an "up to date" index) takes 15 minutes or more. Yes, the Central index is huge, but there is room for improvement.
   
   Steps performed during an update:
   * the properties file is downloaded
   * the GZ file(s) are downloaded (which ones depends on whether the update is incremental or full)
   * the GZ files are processed into a temporary Lucene index
   * the index of the target indexing context (the one being updated) is "replaced" by the temporary Lucene index (or merged with it, depending on the case)
   
   Downloading the files takes several seconds, but processing the GZIP raw records into a Lucene index takes a long time. This can be improved.
   
   `IndexUpdateRequest` gets a new field, `int threads`, with a default value of 1 (behavior stays the same as before). When it is set to a value greater than 1 (accepted values are positive numbers), `IndexDataReader` behaves slightly differently than with threads=1: it creates N (threads) "silo" indexes, spawns N threads, and processes the input file on N threads into N silos that are merged at the end. This should reduce long update times (the index itself is huge), ideally halving them, as experiments show (on my HW the ideal is 4 threads, which halves the full index update time).
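
The silo idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual `IndexDataReader` code: the class `SiloSketch`, the methods `processIntoSilos`, `nextOrNull` and `index`, and the use of in-memory lists in place of Lucene indexes are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: N workers read from one shared input, each fills its
// own "silo", and the silos are merged at the end. Lists stand in for indexes.
final class SiloSketch {

    static List<String> processIntoSilos(List<String> records, int threads) throws Exception {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be a positive number");
        }
        Iterator<String> input = records.iterator();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> silos = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                silos.add(pool.submit(() -> {
                    // Each worker owns its silo, so writes need no locking.
                    List<String> silo = new ArrayList<>();
                    String record;
                    while ((record = nextOrNull(input)) != null) {
                        silo.add(index(record));
                    }
                    return silo;
                }));
            }
            // Merge step: combine all silos into the final result.
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> silo : silos) {
                merged.addAll(silo.get());
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    // Reading the shared input is the only synchronized part.
    private static String nextOrNull(Iterator<String> input) {
        synchronized (input) {
            return input.hasNext() ? input.next() : null;
        }
    }

    // Placeholder for the expensive per-record work (assumed, not the real indexing).
    private static String index(String record) {
        return record.toUpperCase();
    }
}
```

With `threads` set to 1 the sketch degenerates to a single worker and a single silo, which mirrors the "same as before" default. The order of records in the merged result is not guaranteed when more than one thread is used.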
   
   Using very large numbers may make things worse, as time may be spent on managing/merging silos, so the "sweet spot" is probably HW dependent.
   
   
   ---
   
   **Remote Links:**
   - [GitHub Pull Request #205](https://github.com/apache/maven-indexer/pull/205)
   - [GitHub Pull Request #255](https://github.com/apache/maven-indexer/pull/255)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@maven.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
