jpountz commented on PR #13585: URL: https://github.com/apache/lucene/pull/13585#issuecomment-2246112137
Skip data at level 0 now stores pointers into the pos/pay files instead of incrementing posPendingCount by the total term freq of the block. This seems to slow down term queries marginally and improve phrase queries a bit.

I also noticed that we would sometimes decode the same block of positions multiple times when it is shared by two doc blocks: when moving to the next doc block, we reset the position FP to the start of the pos block and decode it again, even though it was already decoded. This looks like an existing issue in Lucene99 too. Fixing it only yielded a minor speedup.

luceneutil now gives this on wikibigall:

```
Task                  QPS baseline   StdDev  QPS my_modified_version   StdDev Pct diff                 p-value
CountOrHighHigh              54.55  (28.7%)                    48.60  (18.8%)   -10.9% ( -45% -   51%)   0.155
HighTermMonthSort          3730.94   (3.2%)                  3388.64   (2.2%)    -9.2% ( -14% -   -3%)   0.000
Prefix3                      35.43   (5.2%)                    34.45   (3.3%)    -2.8% ( -10% -    6%)   0.044
CountOrHighMed              102.26  (20.1%)                    99.71  (16.3%)    -2.5% ( -32% -   42%)   0.666
Wildcard                    106.00   (4.9%)                   103.94   (3.7%)    -1.9% (  -9% -    6%)   0.154
OrHighNotLow                345.58   (4.7%)                   339.60   (6.2%)    -1.7% ( -12% -    9%)   0.321
HighTermTitleSort           132.39   (6.1%)                   130.53   (2.2%)    -1.4% (  -9% -    7%)   0.329
AndHighHigh                  62.75   (1.7%)                    62.14   (1.8%)    -1.0% (  -4% -    2%)   0.078
CountTerm                  9407.38   (4.8%)                  9317.61   (3.1%)    -1.0% (  -8% -    7%)   0.454
TermDTSort                  342.50   (3.8%)                   339.94   (4.1%)    -0.7% (  -8% -    7%)   0.548
PKLookup                    289.43   (1.8%)                   287.33   (2.8%)    -0.7% (  -5% -    3%)   0.330
Respell                      50.56   (1.8%)                    50.26   (2.1%)    -0.6% (  -4% -    3%)   0.339
Fuzzy2                       70.03   (1.5%)                    69.67   (1.5%)    -0.5% (  -3% -    2%)   0.271
HighTermDayOfYearSort       801.40   (2.3%)                   797.96   (2.5%)    -0.4% (  -5% -    4%)   0.570
HighTerm                    442.74   (5.8%)                   441.35   (6.1%)    -0.3% ( -11% -   12%)   0.867
Fuzzy1                       74.12   (1.6%)                    74.28   (1.6%)     0.2% (  -2% -    3%)   0.665
HighTermTitleBDVSort         14.58   (7.7%)                    14.61   (7.2%)     0.3% ( -13% -   16%)   0.914
AndStopWords                 29.90   (2.0%)                    29.99   (1.8%)     0.3% (  -3% -    4%)   0.626
Or2Terms2StopWords          159.18   (1.4%)                   160.08   (1.2%)     0.6% (  -1% -    3%)   0.159
OrHighHigh                   66.89   (1.7%)                    67.38   (1.6%)     0.7% (  -2% -    4%)   0.155
And2Terms2StopWords         152.71   (1.5%)                   154.79   (1.3%)     1.4% (  -1% -    4%)   0.002
OrHighNotMed                322.27   (5.3%)                   327.40   (7.3%)     1.6% ( -10% -   15%)   0.432
MedTerm                     560.59   (6.6%)                   571.19   (7.5%)     1.9% ( -11% -   17%)   0.399
OrHighNotHigh               233.42   (6.3%)                   239.76   (7.7%)     2.7% ( -10% -   17%)   0.223
IntNRQ                      140.44  (18.3%)                   144.57  (18.8%)     2.9% ( -28% -   49%)   0.616
AndHighMed                  151.63   (1.5%)                   156.15   (1.5%)     3.0% (   0% -    6%)   0.000
OrStopWords                  32.73   (1.9%)                    33.84   (1.6%)     3.4% (   0% -    7%)   0.000
LowTerm                     972.10   (5.9%)                  1005.36   (6.5%)     3.4% (  -8% -   16%)   0.081
Phrase                       11.68   (4.5%)                    12.08   (4.1%)     3.4% (  -4% -   12%)   0.012
OrHighMed                   199.56   (1.8%)                   207.12   (2.1%)     3.8% (   0% -    7%)   0.000
OrNotHighHigh               214.79   (6.5%)                   223.81   (8.2%)     4.2% (  -9% -   20%)   0.073
Or3Terms                    159.96   (1.5%)                   167.15   (1.4%)     4.5% (   1% -    7%)   0.000
And3Terms                   157.48   (1.6%)                   165.21   (1.5%)     4.9% (   1% -    8%)   0.000
CountPhrase                   3.33  (11.8%)                     3.56  (13.9%)     6.7% ( -16% -   36%)   0.100
OrHighLow                   695.89   (2.2%)                   743.44   (2.7%)     6.8% (   1% -   12%)   0.000
OrHighRare                  242.91   (4.0%)                   263.09   (4.2%)     8.3% (   0% -   17%)   0.000
OrNotHighMed                258.59   (6.9%)                   285.71   (9.6%)    10.5% (  -5% -   28%)   0.000
CountAndHighHigh             41.68   (2.3%)                    47.42   (2.9%)    13.8% (   8% -   19%)   0.000
AndHighLow                  913.56   (2.2%)                  1063.18   (2.3%)    16.4% (  11% -   21%)   0.000
OrNotHighLow                843.74   (1.7%)                   982.67   (3.5%)    16.5% (  11% -   22%)   0.000
CountAndHighMed             121.18   (2.0%)                   142.55   (3.1%)    17.6% (  12% -   23%)   0.000
```
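
For readers unfamiliar with the redundant-decode issue mentioned in the comment, here is a minimal sketch of the idea, not the code from this PR. It assumes that each level-0 skip entry carries a file pointer into the pos file, and that two consecutive doc blocks may point at the same pos block. The class name `PositionBlockReaderSketch`, the method `seekToBlock`, and the `LongFunction` decoder are all hypothetical stand-ins and do not exist in Lucene.

```java
import java.util.function.LongFunction;

/**
 * Hypothetical sketch (not Lucene code): remembers which position block is
 * currently decoded so that a second doc block pointing at the same pos file
 * pointer reuses the buffer instead of decoding it again.
 */
final class PositionBlockReaderSketch {
    // Stand-in for "seek to this file pointer and decode one block of positions".
    private final LongFunction<int[]> decoder;

    // File pointer of the block currently held in 'buffer'; -1 means nothing decoded yet.
    private long decodedBlockFP = -1L;
    private int[] buffer;

    // Counts real decodes, only to make the effect observable in this sketch.
    private int decodeCount;

    PositionBlockReaderSketch(LongFunction<int[]> decoder) {
        this.decoder = decoder;
    }

    /** Called when moving to a doc block whose skip entry points at posFP. */
    int[] seekToBlock(long posFP) {
        if (posFP != decodedBlockFP) {
            // Different pos block: decode it and remember where it came from.
            buffer = decoder.apply(posFP);
            decodedBlockFP = posFP;
            decodeCount++;
        }
        // Same pos block as before: the buffer is already valid, nothing to decode.
        return buffer;
    }

    int decodeCount() {
        return decodeCount;
    }
}
```

With this guard, calling `seekToBlock` twice with the same file pointer triggers a single decode. A real reader would also need to restore the offset within the block for the new doc block, which this sketch leaves out.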