[ https://issues.apache.org/jira/browse/LUCENE-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499566#comment-17499566 ]
Adrien Grand commented on LUCENE-10428:
---------------------------------------

Sorry, the PR should have been linked automatically in JIRA given the naming convention; I don't know why it didn't work this time.

Here it is: https://github.com/apache/lucene/pull/711. It does capture debug information as you suggested.

> getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge leading to busy threads in infinite loop
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-10428
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10428
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/query/scoring, core/search
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: Flame_graph.png
>
>
> Customers complained about high CPU usage on an Elasticsearch cluster in production. We noticed that a few search requests had been stuck for a long time:
> {code:java}
> % curl -s localhost:9200/_cat/tasks?v
> indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205 AmMLzDQ4RrOJievRDeGFZw:569204 direct 1645195007282 14:36:47 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075 emjWc5bUTG6lgnCGLulq-Q:502074 direct 1645195037259 14:37:17 6.2h
> indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270 emjWc5bUTG6lgnCGLulq-Q:583269 direct 1645201316981 16:21:56 4.5h
> {code}
> Flame graphs indicated that CPU time is mostly going into the *getMinCompetitiveScore method in MaxScoreSumPropagator*.
> After some live JVM debugging, we found that the org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound method had around 4 million invocations every second.
> We figured out the values of some parameters from live debugging:
> {code:java}
> minScoreSum = 3.5541441
> minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162
> returnObj scoreSumUpperBound = 3.5541444
> Math.ulp(minScoreSum) = 2.3841858E-7
> {code}
> Example code snippet:
> {code:java}
> double sumOfOtherMaxScores = 3.554144322872162;
> double minScoreSum = 3.5541441;
> float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
> while (scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
>     minScore -= Math.ulp(minScoreSum);
>     System.out.printf("%.20f, %.100f\n", minScore, Math.ulp(minScoreSum));
> }
> {code}
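
A note for anyone trying to run the quoted snippet: it declares minScoreSum as a double, so Math.ulp(minScoreSum) there is the double ulp (roughly 4.4E-16), not the float ulp of 2.3841858E-7 that was reported from the live JVM. A decrement that small is lost when the result is narrowed back to the float minScore, so in the snippet as posted minScore never changes. The sketch below illustrates this and adds an iteration cap that reports its inputs instead of spinning. It is only an illustration under stated assumptions: scoreSumUpperBound is a hypothetical stand-in that merely rounds the sum to float, the class and method names are made up for this example, and this is neither the code from PR 711 nor a diagnosis of the production hang.

```java
final class StalledLoopSketch {

    // Hypothetical stand-in for scoreSumUpperBound (assumption): it only
    // rounds the sum to float and adds no error margin.
    static float scoreSumUpperBound(double sum) {
        return (float) sum;
    }

    // Runs the loop from the quoted snippet with the given decrement and
    // returns how many iterations it took. Once maxIters decrements have been
    // tried without success, it throws with the inputs in the message so the
    // case can be reproduced.
    static int iterationsToConverge(
            double minScoreSum, double sumOfOtherMaxScores, double step, int maxIters) {
        float minScore = (float) (minScoreSum - sumOfOtherMaxScores);
        int iters = 0;
        while (scoreSumUpperBound(minScore + sumOfOtherMaxScores) > minScoreSum) {
            if (iters == maxIters) {
                throw new IllegalStateException(
                        "no convergence after " + iters + " iterations:"
                                + " minScoreSum=" + minScoreSum
                                + ", sumOfOtherMaxScores=" + sumOfOtherMaxScores
                                + ", minScore=" + minScore
                                + ", step=" + step);
            }
            // Compound assignment narrows the double result back to float.
            minScore -= step;
            iters++;
        }
        return iters;
    }

    public static void main(String[] args) {
        double sumOfOtherMaxScores = 3.554144322872162;
        double minScoreSum = 3.5541441;

        double floatUlp = Math.ulp((float) minScoreSum); // float overload
        double doubleUlp = Math.ulp(minScoreSum);        // double overload, as in the snippet

        System.out.println("float ulp:  " + floatUlp);
        System.out.println("double ulp: " + doubleUlp);
        System.out.println("iterations with float ulp: "
                + iterationsToConverge(minScoreSum, sumOfOtherMaxScores, floatUlp, 1000));
        try {
            iterationsToConverge(minScoreSum, sumOfOtherMaxScores, doubleUlp, 1000);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this stand-in, the float ulp lets the loop exit after a single decrement, while the double ulp hits the cap. The cap turns a silent busy loop into an exception that carries the values needed to reproduce it.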