zacharymorn commented on pull request #418: URL: https://github.com/apache/lucene/pull/418#issuecomment-984406785
> Yes. I liked this approach because it felt like it should work relatively well since the field with the highest weight should drive scores anyway, and deciding about which clause leads impacts up-front felt like it could simplify the logic a bit. But if it doesn't yield better results, let's fall back to the previous approach. Let's maybe just double check with a profiler that the reason why this approach performs worse is actually because we get worse score boundaries and not because of some avoidable slow code like iterating or looking up the various hash maps?

I see. The latest two commits https://github.com/apache/lucene/pull/418/commits/808fec28ab6d7179974665e7fa069270845811fa & https://github.com/apache/lucene/pull/418/commits/75c5b046f8af4e73341a26a1828e97e006115a6b used the max-weight field to drive scoring, and overall they have shown around a -20% impact on tasks from `combinedFieldsUnevenlyWeightedBig` so far. Sample CPU JFR results for candidate and baseline look like the following:

Candidate CPU JFR:
```
PERCENT  CPU SAMPLES  STACK
15.13%   13191        org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
7.42%    6466         org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
7.29%    6354         org.apache.lucene.search.DisjunctionDISIApproximation#advance()
6.79%    5921         org.apache.lucene.search.DisiPriorityQueue#downHeap()
4.34%    3785         org.apache.lucene.search.DisiPriorityQueue#topList()
3.97%    3465         org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
3.85%    3360         org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
3.18%    2777         java.lang.Math#round()
3.01%    2624         org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
3.01%    2624         org.apache.lucene.search.DisiPriorityQueue#top()
2.33%    2036         org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.20%    1917         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.07%    1809         org.apache.lucene.util.SmallFloat#longToInt4()
1.81%    1580         org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.78%    1555         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#advance()
1.77%    1545         org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.51%    1316         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader#findFirstGreater()
1.39%    1210         org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.29%    1124         org.apache.lucene.search.ImpactsDISI#docID()
1.22%    1063         jdk.internal.misc.Unsafe#getByte()
1.19%    1037         org.apache.lucene.store.ByteBufferGuard#getByte()
1.08%    938          org.apache.lucene.search.DisiPriorityQueue#prepend()
0.97%    849          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.86%    750          org.apache.lucene.codecs.MultiLevelSkipListReader#skipTo()
0.82%    717          org.apache.lucene.util.SmallFloat#intToByte4()
0.78%    683          java.lang.Math#toIntExact()
0.62%    540          org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.61%    529          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockImpactsDocsEnum#freq()
0.51%    441          org.apache.lucene.search.ImpactsDISI#nextDoc()
0.50%    435          org.apache.lucene.search.ImpactsDISI#advanceTarget()
```

Baseline CPU JFR:
```
PERCENT  CPU SAMPLES  STACK
16.38%   13363        org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer$MultiFieldNormValues#advanceExact()
8.67%    7073         org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
8.05%    6569         org.apache.lucene.search.DisjunctionDISIApproximation#nextDoc()
6.50%    5305         org.apache.lucene.search.DisiPriorityQueue#downHeap()
5.17%    4220         org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#freq()
4.40%    3587         org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
4.30%    3505         org.apache.lucene.search.DisiPriorityQueue#topList()
3.37%    2751         java.lang.Math#round()
2.77%    2263         org.apache.lucene.search.DisiPriorityQueue#top()
2.64%    2156         org.apache.lucene.sandbox.search.CombinedFieldQuery$WeightedDisiWrapper#freq()
2.62%    2135         org.apache.lucene.sandbox.search.CombinedFieldQuery$CombinedFieldScorer#score()
2.12%    1729         org.apache.lucene.sandbox.search.MultiNormsLeafSimScorer#getNormValue()
2.11%    1723         org.apache.lucene.util.SmallFloat#longToInt4()
2.06%    1678         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.01%    1640         org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.77%    1442         org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
1.62%    1325         org.apache.lucene.search.DisiPriorityQueue#updateTop()
1.26%    1028         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.10%    899          java.lang.Math#toIntExact()
1.10%    898          org.apache.lucene.util.SmallFloat#intToByte4()
1.09%    888          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#freq()
1.06%    864          org.apache.lucene.store.ByteBufferGuard#getByte()
0.80%    652          jdk.internal.misc.Unsafe#getByte()
0.74%    605          org.apache.lucene.codecs.lucene90.PForUtil#decode()
0.68%    552          org.apache.lucene.search.DisiPriorityQueue#prepend()
0.60%    492          org.apache.lucene.codecs.lucene90.PForUtil#decodeAndPrefixSum()
0.55%    451          java.io.RandomAccessFile#readBytes()
0.38%    314          org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.38%    307          org.apache.lucene.codecs.lucene90.ForUtil#expand8()
0.32%    260          org.apache.lucene.codecs.lucene90.PForUtil#expand32()
```

The two profiles look very similar. Based on my previous debugging, this may suggest that the max score being computed is high enough that most of the docs are not skipped, so the candidate implementation ends up examining about the same number of docs as the baseline.
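To illustrate the hypothesis in the comment above (a max-score bound that is too high prevents skipping), here is a minimal toy sketch. It is not Lucene code: `BlockMaxSkippingSketch`, `docsExamined`, the block sizes, and the score values are all hypothetical and chosen only for illustration. It assumes the usual rule that a block of postings can be skipped only when its score upper bound is below the current minimum competitive score.

```java
public class BlockMaxSkippingSketch {

    /**
     * Counts the docs that still have to be scored. A block is skipped only
     * when its upper bound is strictly below the minimum competitive score.
     * All inputs are made-up illustration data, not values from the benchmark.
     */
    static int docsExamined(int[] docCounts, float[] maxScoreBounds, float minCompetitiveScore) {
        int examined = 0;
        for (int i = 0; i < docCounts.length; i++) {
            if (maxScoreBounds[i] >= minCompetitiveScore) {
                // The bound cannot rule this block out, so every doc in it is visited.
                examined += docCounts[i];
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        int[] docCounts = {128, 128, 128, 128};
        float minCompetitiveScore = 5.0f;

        // Tight per-block bounds: only one block can contain a competitive hit.
        float[] tightBounds = {2.0f, 6.5f, 3.1f, 1.4f};
        // Loose bounds: every block looks competitive, so nothing is skipped.
        float[] looseBounds = {9.0f, 9.0f, 9.0f, 9.0f};

        System.out.println("tight bounds: " + docsExamined(docCounts, tightBounds, minCompetitiveScore)); // 128
        System.out.println("loose bounds: " + docsExamined(docCounts, looseBounds, minCompetitiveScore)); // 512
    }
}
```

With loose bounds the scorer visits all 512 docs, exactly as an exhaustive scorer would. That would be consistent with candidate and baseline profiles that look nearly identical, although the sketch alone does not confirm that this is what happens in the benchmark.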
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org