[ https://issues.apache.org/jira/browse/LUCENE-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271961#comment-17271961 ]
Dawid Weiss commented on LUCENE-9694: ------------------------------------- But this "index map" is not part of the pull request, right? I thought it'd some form of serialized state which you could then reproduce when rebuilding the index (segments with their document "identifiers", perhaps)? > New tool for creating a deterministic index > ------------------------------------------- > > Key: LUCENE-9694 > URL: https://issues.apache.org/jira/browse/LUCENE-9694 > Project: Lucene - Core > Issue Type: New Feature > Components: general/tools > Reporter: Haoyu Zhai > Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Lucene's index is segmented, and sometimes number of segments and documents > arrangement greatly impact performance. > Given a stable index sort, our team create a tool that records document > arrangement (called index map) of an index and rearrange another index > (consists of same documents) into the same structure (segment num, and > documents included in each segment). > This tool could be also used in lucene benchmarks for a faster deterministic > index construction (if I understand correctly lucene benchmark is using a > single thread manner to achieve this). > > We've already had some discussion in email > [https://markmail.org/message/lbtdntclpnocmfuf] > And I've implemented the first method, using {{IndexWriter.addIndexes}} and a > customized {{FilteredCodecReader}} to achieve the goal. The index > construction time is about 25min and time executing this tool is about 10min. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org