Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857417012
Here is the even more interesting stuff. After all those allocation optimizations. I also implemented the on-paper more "efficient" algorithm to intersect FST and FSA for Terms.intersect(), which leverages the sorted nature of the FST arcs and FSA transitions from a given state (so at least we can binary search to advance with some skipping). FST in some cases have direct addressing which is exploited, too. As a side note -- it also uncovered a bug for the NFA impl which is tracked here https://github.com/apache/lucene/issues/12906. But that change is not moving the needle at all for `WildCard` and `Prefix3` search tasks. ``` Before TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value Wildcard 62.85 (1.8%) 12.66 (0.6%) -79.9% ( -80% - -78%) 0.000 Fuzzy2 55.12 (1.0%) 18.92 (0.8%) -65.7% ( -66% - -64%) 0.000 Fuzzy1 61.20 (0.8%) 22.55 (0.8%) -63.2% ( -64% - -62%) 0.000 Respell 31.11 (0.8%) 11.95 (0.6%) -61.6% ( -62% - -60%) 0.000 Prefix3 135.69 (2.0%) 65.06 (0.7%) -52.1% ( -53% - -50%) 0.000 HighTermTitleSort 119.58 (0.9%) 111.03 (1.7%) -7.2% ( -9% - -4%) 0.000 IntNRQ 22.25 (1.1%) 21.87 (1.5%) -1.7% ( -4% - 0%) 0.000 HighPhrase 25.82 (3.6%) 25.55 (3.2%) -1.1% ( -7% - 5%) 0.318 MedPhrase 7.41 (2.4%) 7.35 (2.2%) -0.8% ( -5% - 3%) 0.259 LowSpanNear 8.81 (1.9%) 8.74 (2.1%) -0.8% ( -4% - 3%) 0.202 OrHighMedDayTaxoFacets 3.86 (5.8%) 3.83 (4.8%) -0.8% ( -10% - 10%) 0.636 TermDTSort 100.75 (2.9%) 99.98 (2.0%) -0.8% ( -5% - 4%) 0.336 HighIntervalsOrdered 6.07 (2.1%) 6.03 (2.4%) -0.7% ( -5% - 3%) 0.342 MedIntervalsOrdered 45.89 (2.0%) 45.61 (2.4%) -0.6% ( -4% - 3%) 0.389 HighSpanNear 10.73 (1.0%) 10.66 (1.5%) -0.6% ( -3% - 2%) 0.165 HighTermDayOfYearSort 206.09 (1.8%) 204.93 (1.9%) -0.6% ( -4% - 3%) 0.338 LowIntervalsOrdered 8.39 (2.3%) 8.37 (2.5%) -0.3% ( -5% - 4%) 0.654 MedSpanNear 66.00 (1.3%) 65.81 (1.9%) -0.3% ( -3% - 2%) 0.574 MedTerm 322.61 (4.7%) 321.89 (4.5%) -0.2% ( -9% - 9%) 0.878 AndHighMedDayTaxoFacets 22.62 (1.0%) 22.58 (1.2%) -0.2% ( -2% - 2%) 0.617 LowPhrase 48.52 (1.3%) 48.46 (1.4%) -0.1% ( -2% - 2%) 0.745 LowTerm 403.54 (2.9%) 403.22 (2.4%) -0.1% ( -5% - 5%) 0.923 BrowseRandomLabelTaxoFacets 3.20 (0.7%) 3.20 (0.9%) -0.0% ( -1% - 1%) 0.905 AndHighHighDayTaxoFacets 8.06 (1.3%) 8.06 (1.6%) 0.0% ( -2% - 2%) 0.962 BrowseDayOfYearTaxoFacets 3.76 (0.6%) 3.76 (0.6%) 0.0% ( -1% - 1%) 0.859 BrowseMonthTaxoFacets 3.62 (1.0%) 3.62 (1.0%) 0.1% ( -1% - 2%) 0.866 OrHighNotHigh 156.16 (6.0%) 156.26 (5.9%) 0.1% ( -11% - 12%) 0.972 BrowseDateTaxoFacets 3.73 (0.6%) 3.73 (0.6%) 0.1% ( -1% - 1%) 0.722 OrNotHighHigh 144.55 (5.1%) 144.68 (4.7%) 0.1% ( -9% - 10%) 0.957 MedTermDayTaxoFacets 17.57 (2.8%) 17.59 (2.8%) 0.2% ( -5% - 5%) 0.863 BrowseDayOfYearSSDVFacets 3.22 (0.9%) 3.23 (0.7%) 0.2% ( -1% - 1%) 0.424 HighTerm 401.13 (5.4%) 402.30 (5.7%) 0.3% ( -10% - 12%) 0.868 BrowseMonthSSDVFacets 3.27 (0.7%) 3.28 (1.0%) 0.3% ( -1% - 2%) 0.202 AndHighHigh 20.87 (2.8%) 20.94 (2.5%) 0.4% ( -4% - 5%) 0.670 HighSloppyPhrase 6.58 (3.3%) 6.61 (3.4%) 0.4% ( -6% - 7%) 0.727 AndHighMed 89.34 (1.8%) 89.76 (1.4%) 0.5% ( -2% - 3%) 0.355 BrowseDateSSDVFacets 0.95 (2.8%) 0.95 (4.0%) 0.5% ( -6% - 7%) 0.656 OrNotHighLow 420.17 (2.2%) 422.34 (2.2%) 0.5% ( -3% - 4%) 0.452 LowSloppyPhrase 2.89 (2.4%) 2.91 (1.9%) 0.6% ( -3% - 4%) 0.369 OrNotHighMed 219.50 (4.5%) 221.02 (4.1%) 0.7% ( -7% - 9%) 0.611 MedSloppyPhrase 10.44 (2.2%) 10.52 (1.4%) 0.7% ( -2% - 4%) 0.222 OrHighNotLow 288.48 (5.4%) 290.70 (5.7%) 0.8% ( -9% - 12%) 0.663 OrHighMed 53.25 (3.6%) 53.66 (3.5%) 0.8% ( -6% - 8%) 0.488 HighTermTitleBDVSort 2.77 (7.2%) 2.79 (7.0%) 0.9% ( -12% - 16%) 0.699 OrHighNotMed 270.38 (5.7%) 272.88 (5.4%) 0.9% ( -9% - 12%) 0.601 OrHighHigh 20.86 (5.1%) 21.08 (5.2%) 1.1% ( -8% - 12%) 0.519 OrHighLow 220.40 (4.2%) 223.52 (5.6%) 1.4% ( -8% - 11%) 0.367 BrowseRandomLabelSSDVFacets 2.32 (4.3%) 2.38 (7.6%) 2.3% ( -9% - 14%) 0.240 AndHighLow 395.82 (2.9%) 405.36 (2.8%) 2.4% ( -3% - 8%) 0.008 HighTermMonthSort 2375.71 (3.7%) 2555.72 (5.0%) 7.6% ( -1% - 16%) 0.000 PKLookup 140.60 (1.8%) 178.02 (1.3%) 26.6% ( 23% - 30%) 0.000 After TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value Wildcard 37.77 (2.7%) 5.70 (1.3%) -84.9% ( -86% - -83%) 0.000 Prefix3 52.27 (2.7%) 22.71 (1.8%) -56.6% ( -59% - -53%) 0.000 Fuzzy1 59.96 (1.6%) 54.80 (2.3%) -8.6% ( -12% - -4%) 0.000 HighTermTitleSort 106.17 (2.1%) 101.30 (1.5%) -4.6% ( -7% - -1%) 0.000 Fuzzy2 33.40 (1.3%) 32.14 (1.6%) -3.8% ( -6% - 0%) 0.000 MedTerm 273.84 (5.4%) 265.92 (9.7%) -2.9% ( -17% - 12%) 0.245 HighTerm 349.66 (5.2%) 341.63 (8.9%) -2.3% ( -15% - 12%) 0.320 LowTerm 356.23 (3.1%) 350.27 (4.3%) -1.7% ( -8% - 5%) 0.156 LowSloppyPhrase 4.47 (2.1%) 4.44 (4.6%) -0.8% ( -7% - 6%) 0.492 HighSpanNear 8.12 (2.1%) 8.06 (2.5%) -0.7% ( -5% - 4%) 0.331 MedSloppyPhrase 31.05 (3.6%) 30.83 (4.1%) -0.7% ( -8% - 7%) 0.559 MedIntervalsOrdered 3.88 (3.3%) 3.86 (3.3%) -0.7% ( -7% - 6%) 0.519 MedSpanNear 8.94 (1.2%) 8.88 (1.6%) -0.7% ( -3% - 2%) 0.126 LowIntervalsOrdered 7.40 (3.3%) 7.35 (3.4%) -0.7% ( -7% - 6%) 0.537 LowSpanNear 29.33 (2.0%) 29.15 (2.2%) -0.6% ( -4% - 3%) 0.374 HighSloppyPhrase 6.68 (3.6%) 6.64 (3.5%) -0.5% ( -7% - 6%) 0.624 MedTermDayTaxoFacets 9.14 (3.0%) 9.11 (8.2%) -0.3% ( -11% - 11%) 0.861 HighPhrase 115.62 (3.9%) 115.24 (4.0%) -0.3% ( -7% - 7%) 0.798 [162/1927] AndHighHigh 13.95 (4.2%) 13.92 (4.5%) -0.3% ( -8% - 8%) 0.847 BrowseMonthSSDVFacets 3.30 (0.8%) 3.29 (1.0%) -0.3% ( -2% - 1%) 0.377 AndHighMed 85.40 (2.0%) 85.18 (2.1%) -0.2% ( -4% - 3%) 0.695 IntNRQ 16.65 (4.1%) 16.63 (3.8%) -0.1% ( -7% - 8%) 0.914 BrowseRandomLabelSSDVFacets 2.30 (0.9%) 2.30 (1.1%) -0.1% ( -2% - 1%) 0.754 OrHighHigh 24.99 (6.1%) 24.97 (5.3%) -0.1% ( -10% - 12%) 0.957 AndHighHighDayTaxoFacets 2.29 (2.8%) 2.29 (2.4%) -0.0% ( -5% - 5%) 0.977 AndHighMedDayTaxoFacets 40.17 (1.4%) 40.20 (1.4%) 0.1% ( -2% - 2%) 0.872 OrHighMedDayTaxoFacets 3.15 (3.9%) 3.15 (3.4%) 0.1% ( -6% - 7%) 0.946 LowPhrase 30.23 (2.5%) 30.26 (2.4%) 0.1% ( -4% - 5%) 0.911 OrNotHighLow 201.78 (2.9%) 202.03 (3.0%) 0.1% ( -5% - 6%) 0.896 BrowseRandomLabelTaxoFacets 3.20 (3.2%) 3.21 (4.1%) 0.1% ( -6% - 7%) 0.899 HighIntervalsOrdered 0.42 (1.9%) 0.42 (1.6%) 0.2% ( -3% - 3%) 0.699 OrNotHighMed 235.49 (5.5%) 236.01 (4.7%) 0.2% ( -9% - 11%) 0.892 BrowseMonthTaxoFacets 3.62 (1.0%) 3.63 (1.0%) 0.2% ( -1% - 2%) 0.477 OrNotHighHigh 329.77 (4.9%) 330.79 (5.5%) 0.3% ( -9% - 11%) 0.851 MedPhrase 35.79 (3.4%) 35.90 (3.4%) 0.3% ( -6% - 7%) 0.771 TermDTSort 112.10 (3.2%) 112.45 (3.4%) 0.3% ( -6% - 7%) 0.763 BrowseDateSSDVFacets 0.97 (7.1%) 0.98 (10.0%) 0.4% ( -15% - 18%) 0.897 BrowseDayOfYearSSDVFacets 3.21 (2.2%) 3.22 (1.6%) 0.4% ( -3% - 4%) 0.525 HighTermDayOfYearSort 235.24 (2.1%) 236.16 (1.6%) 0.4% ( -3% - 4%) 0.512 OrHighMed 70.60 (3.3%) 70.99 (2.7%) 0.5% ( -5% - 6%) 0.571 AndHighLow 370.31 (3.2%) 372.60 (3.4%) 0.6% ( -5% - 7%) 0.559 HighTermTitleBDVSort 5.53 (4.1%) 5.56 (4.5%) 0.6% ( -7% - 9%) 0.648 OrHighNotLow 263.18 (5.6%) 264.95 (6.2%) 0.7% ( -10% - 13%) 0.717 OrHighNotMed 222.41 (5.8%) 224.06 (5.8%) 0.7% ( -10% - 13%) 0.688 OrHighNotHigh 233.04 (5.5%) 234.89 (5.8%) 0.8% ( -9% - 12%) 0.657 OrHighLow 463.17 (3.0%) 466.91 (3.1%) 0.8% ( -5% - 7%) 0.403 BrowseDayOfYearTaxoFacets 3.77 (0.6%) 3.84 (8.9%) 1.9% ( -7% - 11%) 0.342 BrowseDateTaxoFacets 3.74 (0.6%) 3.81 (8.8%) 1.9% ( -7% - 11%) 0.332 HighTermMonthSort 2350.73 (4.0%) 2477.32 (4.5%) 5.4% ( -2% - 14%) 0.000 Respell 30.81 (1.5%) 34.87 (1.7%) 13.2% ( 9% - 16%) 0.000 PKLookup 141.03 (1.9%) 177.54 (2.0%) 25.9% ( 21% - 30%) 0.000 ``` I tried to modify the bench task file and only run `WildCard` to understand where the time is spent. ### My version  ### mainline  So we can see that the most time is spent in actually reading out the FST arcs and FSA transitions... My intuitive explanation for why this is slower than the blocktree is that it has worse locality in its data access pattern. (@mikemccand maybe you can shed some light) Here are some relevant factors: * The FST is larger as it contains all terms. So there are more Arcs to visit. Blocktree (main) use the FST to index prefixes. * When binary-searching or directly address Arc/Transitions the target is somewhat random. * The FST bytes are read backwards. (probably less of an issue if we read sequentially on modern HW) * Blocktree at a given node reads bytes sequentially and terms are sorted, too. Just out of curiosity I altered my code to load the FST on-heap to compare with the default off-heap option. It did not help much with `Wildcard` but PKLookup got substantially faster! The PKLookup task is a great proxy to FST performance, as both versions of the code visits the exact same number of Arcs. ``` Off heap Wildcard 47.56 (1.7%) 10.13 (0.4%) -78.7% ( -79% - -77%) 0.000 PKLookup 136.03 (2.4%) 147.93 (2.3%) 8.8% ( 3% - 13%) 0.000 on heap Wildcard 37.11 (1.5%) 8.35 (0.3%) -77.5% ( -78% - -76%) 0.000 PKLookup 136.04 (3.3%) 269.60 (9.0%) 98.2% ( 83% - 114%) 0.000 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org