Tony-X commented on PR #12688: URL: https://github.com/apache/lucene/pull/12688#issuecomment-1857371557
Since the first working version, I have iterated through a list of profiling-guided allocation optimizations, as they stood out quite obviously in the merged JFR reports (thanks to luceneutil!). Some of them come from my own code that implements the term dictionary data lookup, and a few of them are at a more general Lucene level. I want to highlight the general issues I see from this work, and maybe we can open separate issues to improve them!

Here is the first heap profile comparison (search-only, no indexing).

```
Candidate Heap
17.50%  24440M  java.lang.Long#valueOf()
10.09%  14096M  jdk.internal.misc.Unsafe#allocateUninitializedArray()
 6.87%   9594M  org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#initializeValueCounters()
 4.40%   6140M  org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
...
```

```
main
13.65%  11898M  org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#initializeValueCounters()
 9.26%   8071M  org.apache.lucene.util.FixedBitSet#<init>()
 6.70%   5836M  org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
 6.60%   5751M  org.apache.lucene.util.ArrayUtil#growExact()
 5.21%   4541M  org.apache.lucene.facet.FacetsConfig#stringToPath()
 4.69%   4090M  org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
```

## FST doesn't play nicely with primitive types (I know, this is more or less a Java issue)

`24440M java.lang.Long#valueOf()` is a huge amount of allocation, and the cause is obvious. The `FST<T>` implementation is generic over its output type, and in my case `T` is `Long`. So for a trivial `long` add or subtract, the implementation allocates an object. Not only is this wasteful in memory; from a perf perspective the arithmetic itself would take less than 1 CPU cycle, vs. a call to the allocator, which is easily tens if not hundreds of cycles.

For this work, I forked the `FST<T>` class and manually templated it with `long`, just to see how much difference it makes. Here is a diff in heap profile and bench results before and after.
```
Before
PERCENT  HEAP SAMPLES  STACK
25.97%   32791M        java.lang.Long#valueOf()
 7.58%    9571M        org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#initializeValueCounters()
 5.13%    6482M        org.apache.lucene.util.FixedBitSet#<init>()
 4.90%   ....

After
PERCENT  HEAP SAMPLES  STACK
 8.44%    7988M        org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#initializeValueCounters()
 7.17%    6788M        org.apache.lucene.util.FixedBitSet#<init>()
 6.22%    5886M        org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegmentNHLD()
 5.89%    5577M        org.apache.lucene.util.ArrayUtil#growExact()
```

Bench diff

```
Before
Task                         QPS baseline (StdDev)  QPS my_modified_version (StdDev)  Pct diff  p-value
Wildcard                     11.61 (2.7%)     2.40 (0.6%)      -79.4% ( -80% - -78%)  0.000
Fuzzy1                       78.17 (0.7%)     27.16 (0.9%)     -65.3% ( -66% - -64%)  0.000
Respell                      29.09 (0.6%)     10.91 (0.8%)     -62.5% ( -63% - -61%)  0.000
Fuzzy2                       47.80 (0.6%)     18.50 (1.2%)     -61.3% ( -62% - -59%)  0.000
Prefix3                      765.08 (3.1%)    463.94 (0.9%)    -39.4% ( -42% - -36%)  0.000
HighTermTitleSort            98.48 (2.0%)     90.62 (2.2%)     -8.0% ( -11% - -3%)  0.000
BrowseMonthTaxoFacets        3.89 (29.2%)     3.62 (0.9%)      -6.9% ( -28% - 32%)  0.293
LowSloppyPhrase              22.73 (6.5%)     22.35 (6.9%)     -1.7% ( -14% - 12%)  0.432
LowTerm                      365.47 (3.6%)    359.62 (2.9%)    -1.6% ( -7% - 5%)  0.121
HighTerm                     398.57 (5.1%)    393.16 (4.7%)    -1.4% ( -10% - 8%)  0.380
MedSloppyPhrase              10.63 (3.6%)     10.51 (3.7%)     -1.1% ( -8% - 6%)  0.339
MedTerm                      422.73 (4.2%)    418.60 (4.0%)    -1.0% ( -8% - 7%)  0.451
MedTermDayTaxoFacets         14.84 (2.6%)     14.71 (2.5%)     -0.8% ( -5% - 4%)  0.296
HighSloppyPhrase             12.41 (3.1%)     12.33 (3.1%)     -0.7% ( -6% - 5%)  0.487
HighTermTitleBDVSort         6.88 (3.3%)      6.84 (3.5%)      -0.6% ( -7% - 6%)  0.599
LowPhrase                    58.15 (2.9%)     57.85 (2.8%)     -0.5% ( -6% - 5%)  0.567
BrowseDayOfYearSSDVFacets    3.24 (0.4%)      3.23 (0.5%)      -0.3% ( -1% - 0%)  0.042
MedPhrase                    26.19 (3.1%)     26.11 (3.2%)     -0.3% ( -6% - 6%)  0.775
OrNotHighMed                 185.23 (3.9%)    184.73 (3.3%)    -0.3% ( -7% - 7%)  0.813
OrHighMedDayTaxoFacets       3.82 (3.3%)      3.81 (3.2%)      -0.3% ( -6% - 6%)  0.796
OrHighNotLow                 194.98 (5.1%)    194.51 (4.6%)    -0.2% ( -9% - 10%)  0.875
OrHighNotMed                 337.15 (4.4%)    336.53 (3.8%)    -0.2% ( -7% - 8%)  0.888
IntNRQ                       67.60 (0.9%)     67.55 (1.0%)     -0.1% ( -1% - 1%)  0.783
MedSpanNear                  9.85 (1.4%)      9.84 (2.1%)      -0.1% ( -3% - 3%)  0.906
OrNotHighHigh                205.12 (4.1%)    205.01 (3.9%)    -0.1% ( -7% - 8%)  0.967
AndHighHighDayTaxoFacets     6.35 (1.5%)      6.34 (1.7%)      -0.0% ( -3% - 3%)  0.932
BrowseMonthSSDVFacets        3.29 (0.8%)      3.29 (0.7%)      -0.0% ( -1% - 1%)  0.887
BrowseRandomLabelSSDVFacets  2.30 (0.8%)      2.30 (1.0%)      0.0% ( -1% - 1%)  0.919
LowSpanNear                  16.41 (2.6%)     16.42 (2.7%)     0.1% ( -5% - 5%)  0.931
HighPhrase                   77.12 (3.0%)     77.20 (3.6%)     0.1% ( -6% - 6%)  0.923
AndHighMedDayTaxoFacets      39.64 (1.2%)     39.68 (1.0%)     0.1% ( -2% - 2%)  0.742
BrowseRandomLabelTaxoFacets  3.19 (1.6%)      3.19 (1.1%)      0.1% ( -2% - 2%)  0.728
BrowseDateTaxoFacets         3.73 (0.7%)      3.74 (0.5%)      0.3% ( 0% - 1%)  0.157
AndHighHigh                  27.08 (1.3%)     27.15 (3.0%)     0.3% ( -4% - 4%)  0.718
BrowseDayOfYearTaxoFacets    3.76 (0.6%)      3.77 (0.5%)      0.3% ( 0% - 1%)  0.072
HighTermDayOfYearSort        224.01 (2.1%)    224.81 (2.1%)    0.4% ( -3% - 4%)  0.592
HighSpanNear                 6.09 (2.7%)      6.11 (3.1%)      0.4% ( -5% - 6%)  0.683
HighIntervalsOrdered         8.08 (3.3%)      8.11 (3.4%)      0.4% ( -6% - 7%)  0.705
TermDTSort                   103.29 (4.4%)    103.83 (3.1%)    0.5% ( -6% - 8%)  0.669
MedIntervalsOrdered          33.12 (4.4%)     33.29 (4.6%)     0.5% ( -8% - 9%)  0.702
LowIntervalsOrdered          10.06 (3.9%)     10.12 (3.6%)     0.6% ( -6% - 8%)  0.609
AndHighMed                   73.71 (2.2%)     74.18 (2.5%)     0.6% ( -3% - 5%)  0.394
OrHighMed                    71.44 (2.7%)     71.98 (3.3%)     0.7% ( -5% - 6%)  0.429
BrowseDateSSDVFacets         0.96 (4.8%)      0.97 (5.7%)      0.9% ( -9% - 11%)  0.601
OrHighNotHigh                308.82 (4.0%)    311.53 (3.7%)    0.9% ( -6% - 8%)  0.470
OrHighLow                    404.69 (3.0%)    408.63 (3.5%)    1.0% ( -5% - 7%)  0.348
OrHighHigh                   20.44 (4.7%)     20.73 (7.2%)     1.4% ( -10% - 13%)  0.469
OrNotHighLow                 381.28 (1.8%)    388.18 (2.1%)    1.8% ( -2% - 5%)  0.004
HighTermMonthSort            2500.04 (2.2%)   2554.91 (4.3%)   2.2% ( -4% - 8%)  0.042
AndHighLow                   668.12 (3.1%)    692.04 (3.9%)    3.6% ( -3% - 10%)  0.001
PKLookup                     140.25 (2.0%)    168.53 (1.9%)    20.2% ( 15% - 24%)  0.000

After
Task                         QPS baseline (StdDev)  QPS my_modified_version (StdDev)  Pct diff  p-value
Wildcard                     54.96 (2.6%)     10.43 (0.5%)     -81.0% ( -82% - -80%)  0.000
Respell                      45.54 (1.0%)     16.74 (0.7%)     -63.2% ( -64% - -62%)  0.000
Fuzzy1                       46.41 (1.2%)     17.26 (1.0%)     -62.8% ( -64% - -61%)  0.000
Prefix3                      121.65 (2.4%)    55.57 (0.9%)     -54.3% ( -56% - -52%)  0.000
Fuzzy2                       32.33 (1.2%)     15.79 (1.1%)     -51.2% ( -52% - -49%)  0.000
HighTermTitleSort            95.24 (2.1%)     87.04 (1.9%)     -8.6% ( -12% - -4%)  0.000
BrowseRandomLabelSSDVFacets  2.37 (7.1%)      2.33 (4.8%)      -1.7% ( -12% - 10%)  0.374
BrowseMonthSSDVFacets        3.34 (7.3%)      3.29 (0.6%)      -1.5% ( -8% - 6%)  0.362
TermDTSort                   120.57 (2.3%)    119.02 (3.4%)    -1.3% ( -6% - 4%)  0.163
OrHighHigh                   19.13 (5.6%)     18.92 (2.7%)     -1.1% ( -8% - 7%)  0.430
AndHighHigh                  22.04 (5.1%)     21.87 (3.0%)     -0.8% ( -8% - 7%)  0.555
AndHighMed                   55.06 (3.0%)     54.79 (2.1%)     -0.5% ( -5% - 4%)  0.546
HighSpanNear                 3.29 (1.6%)      3.28 (1.7%)      -0.5% ( -3% - 2%)  0.346
HighIntervalsOrdered         0.65 (1.8%)      0.65 (2.0%)      -0.5% ( -4% - 3%)  0.433
HighTermDayOfYearSort        282.86 (2.0%)    281.57 (2.6%)    -0.5% ( -4% - 4%)  0.533
MedIntervalsOrdered          16.36 (1.5%)     16.29 (1.5%)     -0.4% ( -3% - 2%)  0.369
OrHighMed                    68.27 (2.9%)     67.99 (1.8%)     -0.4% ( -5% - 4%)  0.598
MedSpanNear                  3.22 (1.0%)      3.21 (1.4%)      -0.4% ( -2% - 2%)  0.317
HighSloppyPhrase             9.59 (2.5%)      9.57 (2.6%)      -0.3% ( -5% - 4%)  0.733
BrowseMonthTaxoFacets        3.64 (2.4%)      3.63 (1.8%)      -0.2% ( -4% - 4%)  0.756
LowIntervalsOrdered          14.66 (0.9%)     14.63 (1.5%)     -0.2% ( -2% - 2%)  0.633
MedTermDayTaxoFacets         15.56 (2.7%)     15.54 (3.9%)     -0.2% ( -6% - 6%)  0.879
AndHighMedDayTaxoFacets      18.70 (1.4%)     18.67 (3.7%)     -0.2% ( -5% - 5%)  0.864
LowSpanNear                  4.39 (1.1%)      4.38 (1.4%)      -0.1% ( -2% - 2%)  0.728
OrHighMedDayTaxoFacets       5.38 (3.5%)      5.38 (5.4%)      -0.1% ( -8% - 9%)  0.945
AndHighHighDayTaxoFacets     7.06 (1.6%)      7.06 (3.0%)      -0.1% ( -4% - 4%)  0.924
LowSloppyPhrase              7.16 (1.4%)      7.15 (1.6%)      -0.1% ( -2% - 2%)  0.891
MedSloppyPhrase              128.54 (1.9%)    128.56 (2.2%)    0.0% ( -4% - 4%)  0.979
LowTerm                      417.80 (3.3%)    418.01 (2.7%)    0.1% ( -5% - 6%)  0.958
LowPhrase                    125.59 (4.0%)    125.77 (3.1%)    0.1% ( -6% - 7%)  0.900
OrHighLow                    313.22 (2.1%)    313.72 (2.2%)    0.2% ( -4% - 4%)  0.817
BrowseDateTaxoFacets         3.73 (0.7%)      3.74 (0.7%)      0.2% ( -1% - 1%)  0.470
BrowseDayOfYearTaxoFacets    3.76 (0.7%)      3.76 (0.7%)      0.2% ( -1% - 1%)  0.457
MedTerm                      384.57 (4.6%)    385.44 (3.6%)    0.2% ( -7% - 8%)  0.863
OrHighNotHigh                255.07 (4.3%)    256.05 (4.3%)    0.4% ( -7% - 9%)  0.778
MedPhrase                    11.17 (3.0%)     11.21 (2.6%)     0.4% ( -5% - 6%)  0.658
HighTerm                     361.26 (5.1%)    362.86 (4.2%)    0.4% ( -8% - 10%)  0.764
BrowseRandomLabelTaxoFacets  3.19 (1.5%)      3.20 (0.6%)      0.5% ( -1% - 2%)  0.203
OrNotHighHigh                205.38 (4.0%)    206.35 (4.0%)    0.5% ( -7% - 8%)  0.712
OrNotHighLow                 317.96 (1.7%)    319.48 (2.1%)    0.5% ( -3% - 4%)  0.428
HighPhrase                   47.91 (3.8%)     48.15 (3.3%)     0.5% ( -6% - 7%)  0.661
BrowseDateSSDVFacets         0.97 (6.9%)      0.98 (6.7%)      0.5% ( -12% - 15%)  0.801
OrHighNotLow                 185.96 (4.9%)    187.04 (5.0%)    0.6% ( -8% - 11%)  0.710
BrowseDayOfYearSSDVFacets    3.21 (2.1%)      3.23 (0.9%)      0.6% ( -2% - 3%)  0.225
HighTermTitleBDVSort         5.83 (3.7%)      5.87 (4.0%)      0.7% ( -6% - 8%)  0.584
OrNotHighMed                 516.84 (2.5%)    520.76 (2.5%)    0.8% ( -4% - 5%)  0.334
IntNRQ                       29.24 (3.0%)     29.50 (4.1%)     0.9% ( -6% - 8%)  0.425
OrHighNotMed                 268.45 (4.4%)    270.92 (4.2%)    0.9% ( -7% - 9%)  0.501
HighTermMonthSort            2498.46 (4.8%)   2590.43 (3.7%)   3.7% ( -4% - 12%)  0.007
AndHighLow                   747.94 (2.1%)    775.60 (4.0%)    3.7% ( -2% - 10%)  0.000
PKLookup                     141.68 (2.0%)    177.85 (1.5%)    25.5% ( 21% - 29%)  0.000
```
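To make the boxing cost described in the FST section concrete, here is a minimal Java sketch. It is not Lucene code: `GenericOutputs`, `BOXED_LONGS`, `sumGeneric`, and `sumPrimitive` are hypothetical names that only mimic the shape of an output algebra that is generic over `T`, compared with a hand-specialized `long` version. With `T = Long`, every `add` goes through `Long.valueOf()`, which returns a cached instance only for values in -128..127 and may allocate otherwise, unless the JIT manages to eliminate the allocation.

```java
public class BoxedOutputsSketch {
    // Hypothetical stand-in for an output algebra that is generic over T.
    interface GenericOutputs<T> {
        T add(T prefix, T output);
    }

    // With T = Long, the addition unboxes both operands and re-boxes the
    // result through Long.valueOf(), which may allocate a new object.
    static final GenericOutputs<Long> BOXED_LONGS = new GenericOutputs<Long>() {
        @Override
        public Long add(Long prefix, Long output) {
            return prefix + output;
        }
    };

    // Accumulates outputs through the generic interface: each step boxes.
    static long sumGeneric(long[] arcOutputs) {
        Long acc = 0L;
        for (long output : arcOutputs) {
            acc = BOXED_LONGS.add(acc, output); // output is autoboxed here too
        }
        return acc;
    }

    // Hand-specialized equivalent: plain long arithmetic, no objects involved.
    static long sumPrimitive(long[] arcOutputs) {
        long acc = 0L;
        for (long output : arcOutputs) {
            acc += output;
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] arcOutputs = {1_000L, 20_000L, 300_000L};
        // Both paths compute the same value; only the allocation behavior differs.
        System.out.println(sumGeneric(arcOutputs) == sumPrimitive(arcOutputs));
    }
}
```

Both methods return the same sum, so the difference shows up only in an allocation profile such as the JFR heap samples above, not in the result.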