[ https://issues.apache.org/jira/browse/LUCENE-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17195912#comment-17195912 ]
Adrien Grand commented on LUCENE-9510:
--------------------------------------

And I just opened a follow-up PR that uses a format that doesn't compress data for temporary stored fields and term vectors that we write on the fly before they get sorted upon flushing. On a synthetic benchmark that doesn't index anything in order to make sure stored fields are the bottleneck, this resulted in a 3x indexing speedup.

> SortingStoredFieldsConsumer should use a format that has better random-access
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-9510
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9510
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We noticed some indexing rate regressions in Elasticsearch after upgrading to
> a new Lucene snapshot. This is due to the fact that
> SortingStoredFieldsConsumer is using the default codec to write stored fields
> on flush. Compression doesn't matter much for this case since these are
> temporary files that get removed on flush after the segment is sorted anyway
> so we could switch to a format that has faster random access.