[ 
https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271961#comment-17271961
 ] 

Dawid Weiss commented on LUCENE-9694:
-------------------------------------

But this "index map" is not part of the pull request, right? I thought it'd be 
some form of serialized state which you could then reproduce when rebuilding 
the index (segments with their document "identifiers", perhaps)?

> New tool for creating a deterministic index
> -------------------------------------------
>
>                 Key: LUCENE-9694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9694
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: general/tools
>            Reporter: Haoyu Zhai
>            Priority: Minor
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Lucene's index is segmented, and sometimes the number of segments and the 
> arrangement of documents greatly impact performance.
> Given a stable index sort, our team created a tool that records the document 
> arrangement (called an index map) of an index and rearranges another index 
> (consisting of the same documents) into the same structure (segment count and 
> documents included in each segment).
> This tool could also be used in Lucene benchmarks for faster deterministic 
> index construction (if I understand correctly, the Lucene benchmark uses a 
> single thread to achieve this).
>  
> We've already had some discussion by email:
> [https://markmail.org/message/lbtdntclpnocmfuf]
> I've implemented the first method, using {{IndexWriter.addIndexes}} and a 
> customized {{FilteredCodecReader}} to achieve the goal. Index construction 
> takes about 25 min and running this tool takes about 10 min.
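
To make the "index map" idea from the description above concrete, here is a
minimal sketch in plain Java. It is not the code from the pull request and it
does not use any Lucene API: segments are modelled as lists of document keys,
and the class name, method names and the use of natural string order as a
stand-in for the stable index sort are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
public class IndexMapSketch {

    /** Records the layout of a reference index as: document key -> segment ordinal. */
    public static Map<String, Integer> record(List<List<String>> segments) {
        Map<String, Integer> indexMap = new LinkedHashMap<>();
        for (int seg = 0; seg < segments.size(); seg++) {
            for (String key : segments.get(seg)) {
                indexMap.put(key, seg);
            }
        }
        return indexMap;
    }

    /** Regroups the documents of another index so that it matches the recorded layout. */
    public static List<List<String>> rearrange(List<List<String>> other,
                                               Map<String, Integer> indexMap) {
        // The number of target segments is the highest recorded ordinal plus one.
        int segmentCount = 0;
        for (int seg : indexMap.values()) {
            segmentCount = Math.max(segmentCount, seg + 1);
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) {
            result.add(new ArrayList<>());
        }
        // Route every document to the segment it occupied in the reference index.
        for (List<String> segment : other) {
            for (String key : segment) {
                Integer target = indexMap.get(key);
                if (target == null) {
                    throw new IllegalArgumentException("Document not in index map: " + key);
                }
                result.get(target).add(key);
            }
        }
        // Assumption: the stable index sort fixes the order inside a segment;
        // natural string order stands in for that sort here.
        for (List<String> segment : result) {
            Collections.sort(segment);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> reference = List.of(List.of("a", "c"), List.of("b", "d", "e"));
        List<List<String>> other = List.of(List.of("e", "a"), List.of("d"), List.of("c", "b"));
        Map<String, Integer> indexMap = record(reference);
        System.out.println(rearrange(other, indexMap));
    }
}
```

The map is the only state that has to be kept between the two runs, so in this
sketch it could be serialized alongside the reference index and replayed
against any index that contains the same documents.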



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

