[ https://issues.apache.org/jira/browse/LUCENE-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ankit Jain updated LUCENE-10428: -------------------------------- Description: Customers complained about high CPU for Elasticsearch cluster in production. We noticed that few search requests were stuck for long time {{{quote}% curl -s localhost:9200/_cat/tasks?v indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205 AmMLzDQ4RrOJievRDeGFZw:569204 direct 1645195007282 14:36:47 6.2h indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075 emjWc5bUTG6lgnCGLulq-Q:502074 direct 1645195037259 14:37:17 6.2h indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270 emjWc5bUTG6lgnCGLulq-Q:583269 direct 1645201316981 16:21:56 4.5h{quote}}} Flame graphs indicated that CPU time is mostly going into *getMinCompetitiveScore method in MaxScoreSumPropagator*. After doing some live JVM debugging found that org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound method had around 4 million invocations every second Figured out the values of some parameters from live debugging: {{minScoreSum = 3.5541441 minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162 returnObj scoreSumUpperBound = 3.5541444 Math.ulp(minScoreSum) = 2.3841858E-7}} was: Customers complained about high CPU for Elasticsearch cluster in production. We noticed that few search requests were stuck for long time ``` % curl -s localhost:9200/_cat/tasks?v indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205 AmMLzDQ4RrOJievRDeGFZw:569204 direct 1645195007282 14:36:47 6.2h indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075 emjWc5bUTG6lgnCGLulq-Q:502074 direct 1645195037259 14:37:17 6.2h indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270 emjWc5bUTG6lgnCGLulq-Q:583269 direct 1645201316981 16:21:56 4.5h ``` Flame graphs indicated that CPU time is mostly going into *getMinCompetitiveScore method in MaxScoreSumPropagator*. After doing some live JVM debugging found that org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound method had around 4 million invocations every second Figured out the values of some parameters from live debugging: ``` minScoreSum = 3.5541441 minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = 3.554144322872162 returnObj scoreSumUpperBound = 3.5541444 Math.ulp(minScoreSum) = 2.3841858E-7 ``` > getMinCompetitiveScore method in MaxScoreSumPropagator fails to converge > leading to busy threads in infinite loop > ----------------------------------------------------------------------------------------------------------------- > > Key: LUCENE-10428 > URL: https://issues.apache.org/jira/browse/LUCENE-10428 > Project: Lucene - Core > Issue Type: Bug > Components: core/query/scoring, core/search > Reporter: Ankit Jain > Priority: Major > > Customers complained about high CPU for Elasticsearch cluster in production. > We noticed that few search requests were stuck for long time > {{{quote}% curl -s localhost:9200/_cat/tasks?v > indices:data/read/search[phase/query] AmMLzDQ4RrOJievRDeGFZw:569205 > AmMLzDQ4RrOJievRDeGFZw:569204 direct 1645195007282 14:36:47 6.2h > indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:502075 > emjWc5bUTG6lgnCGLulq-Q:502074 direct 1645195037259 14:37:17 6.2h > indices:data/read/search[phase/query] emjWc5bUTG6lgnCGLulq-Q:583270 > emjWc5bUTG6lgnCGLulq-Q:583269 direct 1645201316981 16:21:56 4.5h{quote}}} > Flame graphs indicated that CPU time is mostly going into > *getMinCompetitiveScore method in MaxScoreSumPropagator*. After doing some > live JVM debugging found that > org.apache.lucene.search.MaxScoreSumPropagator.scoreSumUpperBound method had > around 4 million invocations every second > Figured out the values of some parameters from live debugging: > {{minScoreSum = 3.5541441 > minScore + sumOfOtherMaxScores (params[0] scoreSumUpperBound) = > 3.554144322872162 > returnObj scoreSumUpperBound = 3.5541444 > Math.ulp(minScoreSum) = 2.3841858E-7}} -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org