jpountz opened a new pull request, #13585: URL: https://github.com/apache/lucene/pull/13585
This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, but it has a few differences:

- Skip data is inlined into postings to make the access pattern more sequential.
- There are only 2 levels of skip data: on every block (128 docs) and every 32 blocks (4096 docs).

In general, I found that inlining skip data may slightly slow down queries that don't need skip data at all (e.g. `CountOrXXX` tasks that never advance or consult impacts) and slightly speed up queries that advance by small intervals. Because the highest level only allows skipping 4096 docs at once, we're slower at advancing by large intervals, but the data suggests that this doesn't significantly hurt performance. Phrase queries and term queries sorted by field are slower for reasons that I haven't understood yet.

These results were produced on wikibigall without inter-segment concurrency.
```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                 p-value
HighTermTitleSort           152.82   (1.3%)                   105.67   (0.9%)    -30.9%  (-32% - -29%)   0.000
Phrase                       11.67   (5.2%)                    10.13   (4.2%)    -13.2%  (-21% -  -4%)   0.000
CountOrHighHigh              56.79  (33.3%)                    49.41  (21.1%)    -13.0%  (-50% -  62%)   0.141
HighTermMonthSort          3598.70   (3.2%)                  3372.04   (2.7%)     -6.3%  (-11% -   0%)   0.000
CountOrHighMed              104.44  (21.2%)                    99.90  (18.1%)     -4.3%  (-36% -  44%)   0.486
Wildcard                     54.26   (3.0%)                    52.23   (2.6%)     -3.7%  ( -9% -   1%)   0.000
TermDTSort                  349.67   (6.0%)                   339.57   (4.3%)     -2.9%  (-12% -   7%)   0.081
IntNRQ                      113.09  (21.2%)                   110.12  (21.6%)     -2.6%  (-37% -  51%)   0.699
CountTerm                  9104.21   (4.1%)                  8870.31   (6.0%)     -2.6%  (-12% -   7%)   0.115
Prefix3                     296.80   (1.9%)                   290.04   (2.0%)     -2.3%  ( -6% -   1%)   0.000
HighTerm                    383.13   (5.2%)                   377.50   (7.5%)     -1.5%  (-13% -  11%)   0.472
PKLookup                    286.07   (1.5%)                   281.91   (2.1%)     -1.5%  ( -4% -   2%)   0.012
HighTermDayOfYearSort       758.57   (2.6%)                   748.44   (2.9%)     -1.3%  ( -6% -   4%)   0.121
HighTermTitleBDVSort         13.27   (4.9%)                    13.13   (6.2%)     -1.1%  (-11% -  10%)   0.546
Fuzzy1                       98.52   (1.7%)                    97.67   (2.1%)     -0.9%  ( -4% -   3%)   0.154
AndHighHigh                  62.93   (1.9%)                    62.46   (1.5%)     -0.7%  ( -4% -   2%)   0.164
Fuzzy2                       62.42   (1.5%)                    61.96   (1.9%)     -0.7%  ( -4% -   2%)   0.184
Respell                      49.68   (1.3%)                    49.39   (1.5%)     -0.6%  ( -3% -   2%)   0.171
Or2Terms2StopWords          157.28   (1.7%)                   157.04   (1.7%)     -0.2%  ( -3% -   3%)   0.777
OrHighHigh                   72.02   (1.7%)                    72.21   (1.8%)      0.3%  ( -3% -   3%)   0.642
AndStopWords                 29.81   (2.2%)                    29.94   (1.7%)      0.4%  ( -3% -   4%)   0.495
And2Terms2StopWords         151.81   (1.5%)                   152.86   (1.8%)      0.7%  ( -2% -   4%)   0.183
OrHighNotLow                384.08   (5.0%)                   388.68   (6.9%)      1.2%  (-10% -  13%)   0.531
OrHighNotHigh               210.18   (6.1%)                   213.18   (7.3%)      1.4%  (-11% -  15%)   0.502
OrHighNotMed                324.28   (5.3%)                   329.41   (6.8%)      1.6%  ( -9% -  14%)   0.413
MedTerm                     567.00   (5.4%)                   578.90   (8.1%)      2.1%  (-10% -  16%)   0.333
CountPhrase                   3.24  (10.3%)                     3.31  (13.2%)      2.2%  (-19% -  28%)   0.551
LowTerm                     854.03   (4.9%)                   873.32   (7.2%)      2.3%  ( -9% -  15%)   0.248
AndHighMed                  197.59   (1.5%)                   203.05   (2.2%)      2.8%  (  0% -   6%)   0.000
OrNotHighHigh               178.76   (6.5%)                   184.38   (7.5%)      3.1%  (-10% -  18%)   0.156
OrStopWords                  32.36   (2.8%)                    33.56   (1.7%)      3.7%  (  0% -   8%)   0.000
Or3Terms                    158.54   (1.6%)                   164.51   (2.1%)      3.8%  (  0% -   7%)   0.000
OrHighMed                   231.23   (1.8%)                   241.40   (2.9%)      4.4%  (  0% -   9%)   0.000
And3Terms                   157.12   (1.3%)                   164.32   (1.5%)      4.6%  (  1% -   7%)   0.000
OrHighLow                   732.71   (1.6%)                   786.67   (3.1%)      7.4%  (  2% -  12%)   0.000
OrNotHighMed                282.64   (6.5%)                   306.83   (8.5%)      8.6%  ( -6% -  25%)   0.000
OrHighRare                  237.87   (7.8%)                   259.37   (4.6%)      9.0%  ( -3% -  23%)   0.000
OrNotHighLow                833.05   (2.4%)                   946.10   (3.8%)     13.6%  (  7% -  20%)   0.000
CountAndHighHigh             41.24   (2.0%)                    46.91   (2.7%)     13.8%  (  8% -  18%)   0.000
AndHighLow                  748.77   (1.7%)                   870.25   (3.1%)     16.2%  ( 11% -  21%)   0.000
CountAndHighMed             120.32   (2.0%)                   140.26   (3.5%)     16.6%  ( 10% -  22%)   0.000
```
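
To make the two skip levels from the description concrete (one entry per 128-doc block, one entry per 32 blocks), here is a minimal in-memory sketch of how `advance` could use them. This is an assumption-laden illustration, not the code in this PR: the class `TwoLevelSkipSketch`, its fields, and its `advance` method are hypothetical, and the skip entries are kept in side arrays for readability, whereas the format described above inlines them into the postings stream.

```java
/** Hypothetical in-memory model of two-level skip data; not Lucene's actual code. */
final class TwoLevelSkipSketch {
  static final int BLOCK_SIZE = 128;        // docs per block (level 0), per the description
  static final int LEVEL1_NUM_BLOCKS = 32;  // blocks per level-1 entry (32 * 128 = 4096 docs)
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;          // sorted, distinct doc IDs
  private final int[] level0LastDoc; // last doc ID of each 128-doc block
  private final int[] level1LastDoc; // last doc ID of each group of 32 blocks
  private int block = 0;             // index of the current block
  private int pos = 0;               // index into docs of the current candidate

  TwoLevelSkipSketch(int[] sortedDocs) {
    docs = sortedDocs.clone();
    int numBlocks = (docs.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
    level0LastDoc = new int[numBlocks];
    for (int b = 0; b < numBlocks; b++) {
      int end = Math.min(docs.length, (b + 1) * BLOCK_SIZE);
      level0LastDoc[b] = docs[end - 1];
    }
    int numGroups = (numBlocks + LEVEL1_NUM_BLOCKS - 1) / LEVEL1_NUM_BLOCKS;
    level1LastDoc = new int[numGroups];
    for (int g = 0; g < numGroups; g++) {
      int lastBlock = Math.min(numBlocks, (g + 1) * LEVEL1_NUM_BLOCKS) - 1;
      level1LastDoc[g] = level0LastDoc[lastBlock];
    }
  }

  /** Returns the first doc ID >= target at or after the current position, or NO_MORE_DOCS. */
  int advance(int target) {
    int numBlocks = level0LastDoc.length;
    // Level 1: skip whole groups of 32 blocks (up to 4096 docs per step).
    int group = block / LEVEL1_NUM_BLOCKS;
    while (group < level1LastDoc.length && level1LastDoc[group] < target) {
      group++;
      block = group * LEVEL1_NUM_BLOCKS;
    }
    // Level 0: skip one block (up to 128 docs) per step.
    while (block < numBlocks && level0LastDoc[block] < target) {
      block++;
    }
    if (block >= numBlocks) {
      pos = docs.length;
      return NO_MORE_DOCS;
    }
    // The current block's last doc is >= target, so this scan stays inside the block.
    pos = Math.max(pos, block * BLOCK_SIZE);
    while (docs[pos] < target) {
      pos++;
    }
    return docs[pos];
  }

  public static void main(String[] args) {
    int[] docs = new int[10_000];
    for (int i = 0; i < docs.length; i++) {
      docs[i] = 3 * i; // doc IDs 0, 3, 6, ...
    }
    TwoLevelSkipSketch postings = new TwoLevelSkipSketch(docs);
    System.out.println(postings.advance(100));    // small step: stays in the first block
    System.out.println(postings.advance(20_000)); // large step: crosses a level-1 entry
  }
}
```

In this model a large advance costs one comparison per 4096 docs skipped plus at most 31 block-level comparisons, which matches the trade-off noted in the description: with no level above 4096 docs, very long jumps take more steps than they would with a deeper skip hierarchy.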