[ 
https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325030#comment-17325030
 ] 

Michael McCandless commented on LUCENE-9694:
--------------------------------------------

[~zhai7631] also updated {{luceneutil}} and our nightly benchmarks to use 
this new {{IndexRearranger}} class to create a deterministic index, instead of 
the crazy slow single-threaded {{SerialMergeScheduler}} approach they used 
before. This gave a massive speedup (~4.3X) in the "build the deterministic 
index for searching" step: 
[https://github.com/mikemccand/luceneutil/issues/117#issuecomment-822463187]

Thank you [~zhai7631]!

> New tool for creating a deterministic index
> -------------------------------------------
>
>                 Key: LUCENE-9694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9694
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: general/tools
>            Reporter: Haoyu Zhai
>            Priority: Minor
>             Fix For: main (9.0), 8.9
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Lucene's index is segmented, and sometimes the number of segments and the 
> arrangement of documents within them greatly impact performance.
> Given a stable index sort, our team created a tool that records the document 
> arrangement (called an index map) of one index and rearranges another index 
> (consisting of the same documents) into the same structure (the same number 
> of segments, with the same documents in each segment).
> This tool could also be used in the Lucene benchmarks for faster 
> deterministic index construction (if I understand correctly, the Lucene 
> benchmarks currently build the index with a single thread to achieve this).
>  
> We've already had some discussion by email:
> [https://markmail.org/message/lbtdntclpnocmfuf]
> I've implemented the first method, using {{IndexWriter.addIndexes}} and a 
> customized {{FilteredCodecReader}} to achieve the goal. Index construction 
> takes about 25 min, and running this tool takes about 10 min.
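
The index-map idea in the description above can be illustrated with a small, Lucene-free sketch. This is not the actual {{IndexRearranger}} code: the class name {{IndexMapSketch}} and both method names are hypothetical, plain lists of document ids stand in for segments, and sorting by id stands in for the stable index sort. The real tool, per the description, does the regrouping through {{IndexWriter.addIndexes}} and a filtered codec reader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a "segment" is just a list of stable document ids.
public class IndexMapSketch {

    /** Records, for every document id, the segment it lives in (the "index map"). */
    public static Map<String, Integer> recordIndexMap(List<List<String>> segments) {
        Map<String, Integer> indexMap = new HashMap<>();
        for (int seg = 0; seg < segments.size(); seg++) {
            for (String docId : segments.get(seg)) {
                indexMap.put(docId, seg);
            }
        }
        return indexMap;
    }

    /** Regroups the documents of another index into the recorded segment layout. */
    public static List<List<String>> rearrange(
            List<List<String>> source, Map<String, Integer> indexMap, int segmentCount) {
        List<List<String>> target = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) {
            target.add(new ArrayList<>());
        }
        for (List<String> segment : source) {
            for (String docId : segment) {
                Integer seg = indexMap.get(docId);
                if (seg == null) {
                    throw new IllegalArgumentException("Document not in index map: " + docId);
                }
                target.get(seg).add(docId);
            }
        }
        // Assumption: sorting by id plays the role of the stable index sort,
        // so the order inside each segment does not depend on arrival order.
        for (List<String> segment : target) {
            Collections.sort(segment);
        }
        return target;
    }

    public static void main(String[] args) {
        // Reference layout, e.g. from an earlier benchmark run.
        List<List<String>> reference = List.of(List.of("a", "c"), List.of("b", "d", "e"));
        // Same documents, indexed concurrently into a different layout.
        List<List<String>> concurrent = List.of(List.of("e", "a"), List.of("d"), List.of("c", "b"));

        Map<String, Integer> indexMap = recordIndexMap(reference);
        List<List<String>> rebuilt = rearrange(concurrent, indexMap, reference.size());
        System.out.println(rebuilt.equals(reference));
    }
}
```

The point of the sketch is only that the source index can be built in any order (for example with many threads), since the recorded map alone decides the final segment layout.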



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
