original-brownbear commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2176505152
I reran the benchmarks of no concurrency vs. 4 threads, and constrained the page cache a lot by setting -Xmx to almost all of the machine's memory (the page cache shrinks to about 1.5G). As somewhat expected, in this scenario concurrency helps a lot.

```
TaskQPS baseline StdDevQPS my_modified_version StdDev Pct diff p-value
PKLookup 85.93 (34.3%) 3.99 (1.4%) -95.4% ( -97% - -90%) 0.000
Fuzzy1 101.17 (2.9%) 8.18 (8.3%) -91.9% (-100% - -83%) 0.000
AndHighLow 1044.32 (2.5%) 85.54 (0.2%) -91.8% ( -92% - -91%) 0.000
Fuzzy2 47.18 (16.5%) 4.91 (0.9%) -89.6% ( -91% - -86%) 0.000
LowSloppyPhrase 45.24 (3.3%) 4.85 (0.3%) -89.3% ( -89% - -88%) 0.000
MedPhrase 40.91 (1.6%) 5.12 (2.2%) -87.5% ( -89% - -85%) 0.000
HighPhrase 87.72 (1.4%) 11.00 (0.1%) -87.5% ( -87% - -87%) 0.000
HighTermTitleBDVSort 13.88 (1.6%) 1.78 (0.1%) -87.2% ( -87% - -86%) 0.000
OrNotHighLow 1688.98 (6.8%) 221.97 (15.1%) -86.9% (-101% - -69%) 0.000
Prefix3 557.04 (17.4%) 73.35 (2.6%) -86.8% ( -90% - -80%) 0.000
Wildcard 30.67 (7.3%) 4.55 (3.8%) -85.2% ( -89% - -79%) 0.000
Respell 32.29 (40.9%) 4.90 (14.7%) -84.8% ( -99% - -49%) 0.000
LowSpanNear 25.42 (0.8%) 3.98 (2.7%) -84.4% ( -87% - -81%) 0.000
MedSloppyPhrase 57.73 (2.6%) 9.15 (1.3%) -84.2% ( -85% - -82%) 0.000
LowPhrase 169.09 (0.7%) 28.85 (13.2%) -82.9% ( -96% - -69%) 0.000
LowIntervalsOrdered 14.27 (0.8%) 2.44 (3.6%) -82.9% ( -86% - -79%) 0.000
IntNRQ 97.33 (2.6%) 18.62 (0.9%) -80.9% ( -82% - -79%) 0.000
MedSpanNear 27.71 (0.8%) 5.96 (4.1%) -78.5% ( -82% - -74%) 0.000
MedIntervalsOrdered 5.63 (0.9%) 1.49 (5.1%) -73.6% ( -78% - -68%) 0.000
HighIntervalsOrdered 4.11 (1.0%) 1.09 (0.9%) -73.6% ( -74% - -72%) 0.000
HighSloppyPhrase 16.75 (1.8%) 4.68 (2.3%) -72.1% ( -74% - -69%) 0.000
HighSpanNear 7.44 (1.3%) 2.13 (2.8%) -71.3% ( -74% - -68%) 0.000
OrNotHighMed 278.73 (2.2%) 85.43 (2.8%) -69.4% ( -72% - -65%) 0.000
BrowseDateTaxoFacets 15.30 (2.3%) 4.80 (1.7%) -68.6% ( -71% - -66%) 0.000
BrowseDayOfYearTaxoFacets 14.91 (11.4%) 5.52 (2.4%) -62.9% ( -68% - -55%) 0.000
BrowseRandomLabelTaxoFacets 11.38 (0.5%) 4.28 (1.5%) -62.4% ( -64% - -60%) 0.000
OrHighMed 129.46 (4.2%) 48.97 (8.7%) -62.2% ( -72% - -51%) 0.000
AndHighHigh 65.35 (1.0%) 25.98 (6.2%) -60.2% ( -66% - -53%) 0.000
AndHighMed 76.79 (1.3%) 31.63 (9.5%) -58.8% ( -68% - -48%) 0.000
OrHighHigh 42.48 (14.7%) 18.32 (1.5%) -56.9% ( -63% - -47%) 0.000
OrHighNotLow 144.92 (1.7%) 64.99 (0.8%) -55.2% ( -56% - -53%) 0.000
MedTerm 284.66 (25.9%) 152.10 (3.4%) -46.6% ( -60% - -23%) 0.000
AndHighMedDayTaxoFacets 108.26 (2.1%) 63.19 (2.1%) -41.6% ( -44% - -38%) 0.000
BrowseDayOfYearSSDVFacets 7.91 (5.5%) 4.67 (1.5%) -40.9% ( -45% - -35%) 0.000
OrNotHighHigh 155.25 (9.7%) 93.97 (2.8%) -39.5% ( -47% - -29%) 0.000
OrHighLow 176.36 (55.0%) 123.29 (59.2%) -30.1% ( -93% - 186%) 0.405
BrowseDateSSDVFacets 1.53 (2.5%) 1.27 (8.8%) -17.3% ( -27% - -6%) 0.000
MedTermDayTaxoFacets 21.73 (1.6%) 18.02 (2.8%) -17.0% ( -21% - -12%) 0.000
LowTerm 279.50 (24.2%) 234.75 (14.9%) -16.0% ( -44% - 30%) 0.208
OrHighNotHigh 70.74 (25.6%) 60.08 (54.8%) -15.1% ( -76% - 87%) 0.578
HighTermTitleSort 23.23 (1.5%) 19.88 (16.3%) -14.4% ( -31% - 3%) 0.048
AndHighHighDayTaxoFacets 7.27 (1.3%) 6.80 (4.5%) -6.5% ( -12% - 0%) 0.002
BrowseMonthTaxoFacets 12.85 (24.3%) 12.34 (28.6%) -4.0% ( -45% - 64%) 0.813
BrowseMonthSSDVFacets 5.17 (1.7%) 4.99 (2.2%) -3.4% ( -7% - 0%) 0.007
BrowseRandomLabelSSDVFacets 3.72 (0.5%) 3.81 (14.4%) 2.5% ( -12% - 17%) 0.696
OrHighMedDayTaxoFacets 4.74 (1.7%) 4.89 (4.6%) 3.3% ( -2% - 9%) 0.134
HighTermMonthSort 279.05 (33.6%) 307.51 (140.4%) 10.2% (-122% - 277%) 0.874
OrHighNotMed 120.83 (1.2%) 154.43 (84.5%) 27.8% ( -57% - 114%) 0.462
TermDTSort 59.06 (4.9%) 81.90 (63.8%) 38.7% ( -28% - 113%) 0.177
HighTermDayOfYearSort 55.42 (41.8%) 77.40 (19.2%) 39.7% ( -15% - 173%) 0.054
HighTerm 124.75 (7.0%) 496.22 (33.9%) 297.8% ( 239% - 364%) 0.000
```

Seems to me CPU-bound scenarios don't parallelise well: cache misses just introduce bottlenecks that more than outweigh the benefit of speeding up the parallel sections of a search. But as soon as significant disk IO comes into play, the situation reverses, as one would expect. Who cares about stalling on RAM reads while you're frequently stalling on disk, I guess? :)
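For reference, a minimal sketch of what "no concurrency vs. 4 threads" means at the `IndexSearcher` level (the benchmark runs above go through luceneutil, not hand-rolled code; the index path, pool size, and class name here are purely illustrative):

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ConcurrentSearchSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical index location; luceneutil points this at its benchmark index.
    try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      // "4 threads": passing an executor lets IndexSearcher run search slices in parallel.
      // The "no concurrency" baseline is simply new IndexSearcher(reader) with no executor.
      ExecutorService executor = Executors.newFixedThreadPool(4);
      try {
        IndexSearcher searcher = new IndexSearcher(reader, executor);
        TopDocs top = searcher.search(new MatchAllDocsQuery(), 10);
        System.out.println("total hits: " + top.totalHits);
      } finally {
        executor.shutdown();
      }
    }
  }
}
```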