[jira] [Updated] (MINDEXER-151) Speed up Index update from remote

Jira Thu, 28 Apr 2022 03:39:07 -0700


     [ 
https://issues.apache.org/jira/browse/MINDEXER-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Tamás Cservenák updated MINDEXER-151:
-------------------------------------
    Description: 
Currently, if you run the BasicUsageExample from the examples, it performs a
"full" update, and the full update (going from an "empty" index to an "up to
date" index) takes 15 minutes or more. Yes, the Central index is huge, but
there is room for improvement.

Steps performed during an update:
 * the properties file is downloaded
 * the GZ file(s) are downloaded (which ones depends on whether the update is
incremental or full)
 * the GZ files are processed into a temporary Lucene index
 * the index of the target indexing context (the one being updated) is
"replaced" by, or merged with, the temporary Lucene index, as the case requires
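
For illustration only (this is not code from the Indexer itself): a minimal
Java sketch of how the download step might choose between a full and an
incremental update. The property keys and GZ file names are assumptions based
on the published Central index layout, the class and method names are
hypothetical, and further checks that a real updater would need are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class UpdatePlanSketch {
    // Assumed property keys and base file name; verify against the real index.
    static final String CHAIN_ID = "nexus.index.chain-id";
    static final String LAST_INCREMENTAL = "nexus.index.last-incremental";
    static final String BASE = "nexus-maven-repository-index";

    /**
     * Returns the GZ files to download. "local" holds the properties saved by
     * the previous update (null if the local index is empty); "remote" holds
     * the freshly downloaded properties file.
     */
    static List<String> planDownloads(Properties local, Properties remote) {
        List<String> files = new ArrayList<>();
        String localChain = (local == null) ? null : local.getProperty(CHAIN_ID);
        String remoteChain = remote.getProperty(CHAIN_ID);

        // No local state, or the remote started a new chain: full update.
        if (localChain == null || !localChain.equals(remoteChain)) {
            files.add(BASE + ".gz");
            return files;
        }

        // Same chain: fetch only the chunks published since the last update.
        int localLast = Integer.parseInt(local.getProperty(LAST_INCREMENTAL, "0"));
        int remoteLast = Integer.parseInt(remote.getProperty(LAST_INCREMENTAL, "0"));
        for (int i = localLast + 1; i <= remoteLast; i++) {
            files.add(BASE + "." + i + ".gz");
        }
        return files; // an empty list means the local index is already current
    }

    public static void main(String[] args) {
        Properties local = new Properties();
        local.setProperty(CHAIN_ID, "chain-a");
        local.setProperty(LAST_INCREMENTAL, "2");

        Properties remote = new Properties();
        remote.setProperty(CHAIN_ID, "chain-a");
        remote.setProperty(LAST_INCREMENTAL, "4");

        System.out.println(planDownloads(local, remote)); // incremental case
        System.out.println(planDownloads(null, remote));  // full case
    }
}
```

Whichever files are selected, each still has to be processed into the
temporary Lucene index afterwards, so this choice only determines how much
data the later steps must consume.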

Downloading the files takes several seconds; it is the processing of the raw
GZIP records into the Lucene index that takes a long time. This can be improved.

The work done here is somewhat intertwined with MINDEXER-150, as the shared
code (incremental download handling, consuming the downloaded GZ files) should
be reused, not duplicated across modules.

  was:
Currently, if you run the BasicUsageExample from the examples, it performs a
"full" update, and the full update (going from an "empty" index to an "up to
date" index) takes 15 minutes or more. Yes, the Central index is huge, but
there is room for improvement.

Steps performed during an update:
 * the properties file is downloaded
 * the GZ file(s) are downloaded (which ones depends on whether the update is
incremental or full)
 * the GZ files are processed into a temporary Lucene index
 * the index of the target indexing context (the one being updated) is
"replaced" by, or merged with, the temporary Lucene index, as the case requires

Downloading the files takes several seconds; it is the processing of the raw
GZIP records into the Lucene index that takes a long time. This can be improved.


> Speed up Index update from remote
> ---------------------------------
>
>                 Key: MINDEXER-151
>                 URL: https://issues.apache.org/jira/browse/MINDEXER-151
>             Project: Maven Indexer
>          Issue Type: Improvement
>            Reporter: Tamás Cservenák
>            Priority: Major
>
> Currently, if you run the BasicUsageExample from the examples, it performs a
> "full" update, and the full update (going from an "empty" index to an "up to
> date" index) takes 15 minutes or more. Yes, the Central index is huge, but
> there is room for improvement.
> Steps performed during an update:
>  * the properties file is downloaded
>  * the GZ file(s) are downloaded (which ones depends on whether the update is
> incremental or full)
>  * the GZ files are processed into a temporary Lucene index
>  * the index of the target indexing context (the one being updated) is
> "replaced" by, or merged with, the temporary Lucene index, as the case requires
> Downloading the files takes several seconds; it is the processing of the raw
> GZIP records into the Lucene index that takes a long time. This can be
> improved.
> The work done here is somewhat intertwined with MINDEXER-150, as the shared
> code (incremental download handling, consuming the downloaded GZ files) should
> be reused, not duplicated across modules.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
