[ https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044309#comment-17044309 ]
Bruno Roustant commented on LUCENE-9237:
----------------------------------------

I measured the term dictionary size on disk (wikimediumall):
- For Lucene84 it takes 30.6 MB of tip files (sum of multiple segment files).
- For UniformSplit it takes 19.5 MB of ustd files (sum of multiple segment files), plus 6.1 MB of Lucene84 tip files, which are for facets?

I suppose I should discount the 6.1 MB for facets from the Lucene84 figure, which gives 30.6 - 6.1 = 24.5 MB.
So in my benchmark UniformSplit has a smaller term dictionary (about -20%, as expected).

I'll run another benchmark with a block size of 26 terms for UniformSplit (instead of 32), which should give us the same term dictionary size (it is quite linear). And I'll force FST-on-heap for Lucene84.

> Faster TermsEnum intersect for UniformSplit
> -------------------------------------------
>
>                 Key: LUCENE-9237
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9237
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient
> than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
> - It is still slower for FuzzyQuery (-37%) but it is faster than the
> previous version (which was -65%).
> - It is slightly slower for WildcardQuery (-5%).
> - It is slightly faster for PrefixQuery (+5%). Sometimes benchmarks show
> more improvement (I've seen up to +17% a fourth of the time).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
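
The size arithmetic in the comment above can be checked with a short sketch. The variable and function names are hypothetical, and reading "quite linear" as "term dictionary size is inversely proportional to the block size" is an assumption made for this illustration, not something the comment states outright.

```python
# Figures quoted in the comment (MB on disk, wikimediumall).
lucene84_tip_mb = 30.6   # Lucene84 tip files, including the facet fields
facets_tip_mb = 6.1      # tip files attributed to facets (the comment is unsure)
uniformsplit_mb = 19.5   # UniformSplit ustd files


def relative_change(new, baseline):
    """Fractional change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


def projected_size(size_mb, old_block_size, new_block_size):
    """Projected dictionary size after changing the block size.

    ASSUMPTION: size scales with the number of blocks, i.e. it is
    inversely proportional to the number of terms per block.
    """
    return size_mb * old_block_size / new_block_size


# Lucene84 term dictionary once the facet share is discounted.
lucene84_terms_mb = lucene84_tip_mb - facets_tip_mb

change = relative_change(uniformsplit_mb, lucene84_terms_mb)
projected_mb = projected_size(uniformsplit_mb, 32, 26)

print(f"Lucene84 without facets: {lucene84_terms_mb:.1f} MB")
print(f"UniformSplit vs Lucene84: {change:+.1%}")
print(f"UniformSplit at 26 terms per block: {projected_mb:.1f} MB")
```

Under that assumption, a block size of 26 brings UniformSplit to roughly 24 MB, close to the 24.5 MB discounted Lucene84 figure, which matches the reasoning for choosing 26 for the follow-up benchmark.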