[ 
https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275312#comment-17275312
 ] 

ASF subversion and git services commented on LUCENE-9694:
---------------------------------------------------------

Commit 6f913a2bc7cdac73056290b8a8212c53ec9b4561 in lucene-solr's branch 
refs/heads/branch_8x from Michael McCandless
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f913a2 ]

LUCENE-9694: fix other precommit issues, including unused import, but most 
importantly that DocumentSelector was accidentally package private, making it 
difficult to use


> New tool for creating a deterministic index
> -------------------------------------------
>
>                 Key: LUCENE-9694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9694
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: general/tools
>            Reporter: Haoyu Zhai
>            Priority: Minor
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Lucene's index is segmented, and the number of segments and the arrangement 
> of documents across them can greatly impact performance.
> Given a stable index sort, our team created a tool that records the document 
> arrangement (called an index map) of one index and rearranges another index 
> (consisting of the same documents) into the same structure (the same number 
> of segments, and the same documents in each segment).
> This tool could also be used in Lucene benchmarks for faster deterministic 
> index construction (if I understand correctly, the Lucene benchmark currently 
> indexes with a single thread to achieve this).
>  
> We have already had some discussion by email:
> [https://markmail.org/message/lbtdntclpnocmfuf]
> I have implemented the first method, using {{IndexWriter.addIndexes}} and a 
> customized {{FilteredCodecReader}}, to achieve the goal. Index construction 
> takes about 25 min, and running this tool takes about 10 min.
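
The index map idea described above can be illustrated with a minimal, Lucene-free sketch. It is not the tool's actual implementation: the class and method names (IndexMapSketch, recordIndexMap, rearrange) are hypothetical, plain strings stand in for documents, and sorting each segment by key stands in for the stable index sort. The real tool, per the description, does the rearranging with IndexWriter.addIndexes and a customized reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: segments are modeled as lists of document keys.
final class IndexMapSketch {

    // Record the arrangement of a reference index: document key -> segment ordinal.
    static Map<String, Integer> recordIndexMap(List<List<String>> segments) {
        Map<String, Integer> indexMap = new LinkedHashMap<>();
        for (int seg = 0; seg < segments.size(); seg++) {
            for (String key : segments.get(seg)) {
                indexMap.put(key, seg);
            }
        }
        return indexMap;
    }

    // Rearrange another index holding the same documents into the recorded structure.
    static List<List<String>> rearrange(List<List<String>> source,
                                        Map<String, Integer> indexMap,
                                        int segmentCount) {
        List<List<String>> target = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) {
            target.add(new ArrayList<>());
        }
        for (List<String> segment : source) {
            for (String key : segment) {
                Integer seg = indexMap.get(key);
                if (seg == null) {
                    throw new IllegalArgumentException("document not in index map: " + key);
                }
                target.get(seg).add(key);
            }
        }
        // Assumption: sorting by key plays the role of the stable index sort,
        // so the order inside each segment is deterministic as well.
        for (List<String> segment : target) {
            Collections.sort(segment);
        }
        return target;
    }

    public static void main(String[] args) {
        // Reference index: two segments, already in index-sort order.
        List<List<String>> reference = Arrays.asList(
                Arrays.asList("a", "c"),
                Arrays.asList("b", "d", "e"));
        // Same documents, but split into three segments in a different order.
        List<List<String>> other = Arrays.asList(
                Arrays.asList("e", "a"),
                Arrays.asList("d"),
                Arrays.asList("c", "b"));

        Map<String, Integer> indexMap = recordIndexMap(reference);
        List<List<String>> rebuilt = rearrange(other, indexMap, reference.size());
        System.out.println(rebuilt); // prints [[a, c], [b, d, e]]
    }
}
```

Because the map is keyed by document and the order inside each segment comes from the sort, the rebuilt structure does not depend on how the second index happened to be segmented, which is what makes the result deterministic.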



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

