[ https://issues.apache.org/jira/browse/LUCENE-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981241#comment-16981241 ]
Ben Manes commented on LUCENE-9038:
-----------------------------------

I tried running the [luceneutil|https://github.com/mikemccand/luceneutil] benchmark against this change rebased on master. The benchmark is pretty noisy and I'm not sure how the cache interacts with it, but these were the results.

{code}
Task                      QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff
Respell                     184.34  (32.4%)    167.09  (34.9%)   -9.4%  ( -57% - 85%)
Fuzzy1                      213.08  (15.1%)    202.41  (15.4%)   -5.0%  ( -30% - 30%)
BrowseMonthSSDVFacets      1789.91  (10.4%)   1759.04  (11.6%)   -1.7%  ( -21% - 22%)
LowTerm                    3172.83  (11.3%)   3149.06  (11.1%)   -0.7%  ( -20% - 24%)
LowSloppyPhrase             510.21  (12.6%)    505.35   (5.4%)   -1.0%  ( -16% - 19%)
OrHighLow                   911.22  (11.4%)    907.11   (8.7%)   -0.5%  ( -18% - 22%)
MedSpanNear                 639.59  (14.3%)    637.37  (11.8%)   -0.3%  ( -23% - 29%)
HighTermMonthSort          1410.18  (14.8%)   1414.44  (17.8%)    0.3%  ( -28% - 38%)
OrHighHigh                  282.72  (18.9%)    283.90  (27.8%)    0.4%  ( -38% - 58%)
AndHighLow                 1811.44  (16.5%)   1826.13   (8.5%)    0.8%  ( -20% - 30%)
LowPhrase                   830.24  (12.8%)    837.28   (9.7%)    0.8%  ( -19% - 26%)
BrowseDayOfYearSSDVFacets  1538.60   (9.5%)   1552.58  (11.5%)    0.9%  ( -18% - 24%)
HighTerm                   1010.87  (11.3%)   1020.64   (9.9%)    1.0%  ( -18% - 24%)
MedPhrase                   571.41  (11.5%)    579.31   (7.3%)    1.4%  ( -15% - 22%)
MedSloppyPhrase             417.12  (21.1%)    423.51  (21.7%)    1.5%  ( -34% - 56%)
LowSpanNear                 746.19  (18.1%)    758.25  (12.9%)    1.6%  ( -24% - 39%)
Wildcard                    184.23  (29.0%)    187.63  (29.3%)    1.8%  ( -43% - 84%)
BrowseDateTaxoFacets       2747.64  (15.6%)   2804.34  (14.5%)    2.1%  ( -24% - 38%)
BrowseDayOfYearTaxoFacets  6748.62   (7.1%)   6900.47   (6.1%)    2.3%  ( -10% - 16%)
AndHighHigh                 608.66  (11.9%)    622.76  (16.0%)    2.3%  ( -22% - 34%)
AndHighMed                 1974.49  (14.2%)   2031.35  (10.3%)    2.9%  ( -18% - 31%)
Fuzzy2                       19.26  (69.9%)     19.84  (54.5%)    3.0%  ( -71% - 423%)
MedTerm                    2809.96   (9.1%)   2900.82  (10.7%)    3.2%  ( -15% - 25%)
HighIntervalsOrdered        253.46  (37.7%)    261.87  (43.4%)    3.3%  ( -56% - 135%)
BrowseMonthTaxoFacets      6838.39   (8.4%)   7109.56   (8.4%)    4.0%  ( -11% - 22%)
HighSloppyPhrase            379.10  (20.3%)    395.81  (20.4%)    4.4%  ( -30% - 56%)
HighTermDayOfYearSort       498.94  (15.7%)    527.78  (13.5%)    5.8%  ( -20% - 41%)
PKLookup                    158.51  (27.8%)    169.54  (12.9%)    7.0%  ( -26% - 65%)
Prefix3                     168.46  (38.7%)    180.95  (36.4%)    7.4%  ( -48% - 134%)
HighPhrase                  260.05  (34.0%)    279.62  (20.5%)    7.5%  ( -35% - 94%)
IntNRQ                      598.33  (33.7%)    651.97  (33.9%)    9.0%  ( -43% - 115%)
OrHighMed                   378.56  (32.7%)    427.55  (16.9%)   12.9%  ( -27% - 93%)
HighSpanNear                217.85  (37.3%)    249.79  (36.3%)   14.7%  ( -42% - 140%)
{code}

> Evaluate Caffeine for LruQueryCache
> -----------------------------------
>
> Key: LUCENE-9038
> URL: https://issues.apache.org/jira/browse/LUCENE-9038
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ben Manes
> Priority: Major
> Attachments: CaffeineQueryCache.java, cache.patch
>
>
> [LRUQueryCache|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java] appears to play a central role in Lucene's performance. There are many issues discussing its performance, such as LUCENE-7235, LUCENE-7237, LUCENE-8027, LUCENE-8213, and LUCENE-9002. It appears that the cache's overhead can be just as much of a liability as the cache is a benefit, causing various workarounds and complexity.
> When reviewing the discussions and code, the following issues are concerning:
> # The cache is guarded by a single lock for all reads and writes.
> # All computations are performed outside of any locking to avoid penalizing other callers. This doesn't handle cache stampedes, meaning that multiple threads may miss the cache, compute the value, and try to store it. That redundant work becomes expensive under load and can be mitigated with roughly per-key locks.
> # The cache queries the entry to see if it's even worth caching. At first glance one assumes that this is so that inexpensive entries don't bang on the lock or thrash the LRU. However, this is also used to indicate data dependencies for uncacheable items (per JIRA), which perhaps shouldn't be invoking the cache.
> # The cache lookup is skipped if the global lock is held; the value is then computed but not stored. This means a busy lock reduces performance across all usages and the cache's effectiveness degrades. These skipped lookups are not counted in the miss rate, giving a false impression.
> # An attempt was made to perform computations asynchronously, due to their heavy cost on tail latencies. That work was reverted due to test failures and is being worked on.
> # An [in-progress change|https://github.com/apache/lucene-solr/pull/940] tries to avoid LRU thrashing due to large, infrequently used items being cached.
> # The cache is tightly intertwined with business logic, making it hard to tease apart core algorithms and data structures from the usage scenarios.
> It seems that more and more items skip being cached because of concurrency and hit rate performance, causing special-case fixes based on knowledge of the external code flows. Since the developers are experts on search, not caching, it seems justified to evaluate if an off-the-shelf library would be more helpful in terms of developer time, code complexity, and performance. Solr has already introduced [Caffeine|https://github.com/ben-manes/caffeine] in SOLR-8241 and SOLR-13817.
> The proposal is to replace the internals of {{LRUQueryCache}} so that external usages are not affected in terms of the API. However, as with {{SolrCache}}, one difference is that Caffeine bounds the cache by either the number of entries or an accumulated size (e.g. bytes), but not by both constraints. This is likely an acceptable divergence in how the configuration is honored.
> cc [~ab], [~dsmiley]
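
For readers unfamiliar with the cache stampede problem in item 2 of the quoted description, the following is a minimal sketch of the "per-key lock" idea using only the JDK. It is not the code in {{LRUQueryCache}}, in the attached patch, or in Caffeine; the class name {{PerKeyCache}} and its methods are hypothetical and exist only for illustration. It relies on {{ConcurrentHashMap.computeIfAbsent}}, which applies the loader at most once per absent key while other callers for that key wait. The map locks per hash bin rather than strictly per key, the loader must not modify the same map, and the sketch has no size bound or eviction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration only: an unbounded cache that loads each key at most once. */
final class PerKeyCache<K, V> {
  private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
  private final AtomicInteger loads = new AtomicInteger();

  /** Returns the cached value, computing it with the loader only if the key is absent. */
  V get(K key, Function<? super K, ? extends V> loader) {
    // computeIfAbsent is atomic per key: concurrent callers asking for the same
    // absent key wait for one computation instead of each recomputing the value.
    return map.computeIfAbsent(key, k -> {
      loads.incrementAndGet(); // count how many times the loader actually ran
      return loader.apply(k);
    });
  }

  /** Number of loader invocations so far. */
  int loadCount() {
    return loads.get();
  }

  public static void main(String[] args) {
    PerKeyCache<String, Integer> cache = new PerKeyCache<>();
    cache.get("title:lucene", String::length); // miss: the loader runs
    cache.get("title:lucene", String::length); // hit: the loader is skipped
    System.out.println("loads = " + cache.loadCount()); // prints "loads = 1"
  }
}
```

The trade-off is that callers for the same key block while the value is computed, whereas the current design described above lets every caller compute independently and accepts the redundant work.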