[ https://issues.apache.org/jira/browse/SOLR-14524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128270#comment-17128270 ]
Erick Erickson commented on SOLR-14524:
---------------------------------------

Oh my. This could account for multiple test issues, so I linked it for tracking. Good sleuthing!

> Harden MultiThreadedOCPTest
> ---------------------------
>
> Key: SOLR-14524
> URL: https://issues.apache.org/jira/browse/SOLR-14524
> Project: Solr
> Issue Type: Test
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: master (9.0)
> Reporter: Ilan Ginzburg
> Assignee: Mike Drob
> Priority: Minor
> Labels: test
> Fix For: master (9.0)
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> {{MultiThreadedOCPTest.test()}} fails occasionally in Jenkins because of the
> timing of task enqueues to the Collection API queue.
> In {{testFillWorkQueue()}}, this test enqueues a large number of tasks (115,
> more than the 100 Collection API parallel executors) to the Collection API
> queue for a collection COLL_A, then waits a short delay and enqueues a
> task for another collection COLL_B.
> It verifies that the COLL_B task (which does not require the same lock as the
> COLL_A tasks) completes before the third COLL_A task.
> Test failures happen because, when enqueues are slowed down enough, the first
> 3 tasks on COLL_A complete even before the COLL_B task gets enqueued!
> In one sample failed Jenkins test execution, the COLL_B task enqueue happened
> 1275ms after the enqueue of the first COLL_A task, leaving plenty of time for a
> few (and possibly all) COLL_A tasks to complete.
> The fix will be along the lines of:
> * Make the “blocking” COLL_A task take longer to execute (currently 1 second)
> to compensate for slow enqueues.
> * Verify that the COLL_B task (a 1ms task) finishes before the long-running
> COLL_A task does. This would be a good indication that, even though the
> collection queue was filled with tasks waiting for a busy lock, a
> non-competing task was picked and executed right away.
> * Delay the enqueue of the COLL_B task until the end of processing of the
> first COLL_A task. This would guarantee that COLL_B is enqueued only once at
> least some COLL_A tasks have started processing at the Overseer. Possibly also
> verify that the long-running COLL_A task hasn't finished execution yet when
> the COLL_B task is enqueued.
> * It might be possible to set a (very) long duration for the slow COLL_A task
> (to be less vulnerable to execution delays) without requiring the test to
> wait for that task to complete, waiting only for the COLL_B task to complete
> (so the test doesn't run for too long).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
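The proposed test shape can be illustrated with a small, self-contained Java simulation. Everything in it is hypothetical: `OcpOrderingSketch`, its dispatcher, and the 5-second duration are illustration only, not Solr's Overseer or `MultiThreadedOCPTest` code, and the sketch ignores most real Overseer details. It assumes a dispatcher that skips queued tasks whose collection lock is busy. It enqueues one slow COLL_A task plus 114 short ones, enqueues COLL_B only once the slow COLL_A task has started (one reading of the third bullet), and then waits for COLL_B alone.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical simulation of per-collection locking in a task dispatcher (not Solr code). */
public class OcpOrderingSketch {
    static final class Task {
        final String collection;
        final long millis;
        final CompletableFuture<Void> started = new CompletableFuture<>();
        final CompletableFuture<Void> done = new CompletableFuture<>();
        Task(String collection, long millis) { this.collection = collection; this.millis = millis; }
    }

    private final ArrayDeque<Task> queue = new ArrayDeque<>();
    private final Set<String> lockedCollections = new HashSet<>(); // guarded by this
    private final ExecutorService executors;
    private boolean running = true; // guarded by this

    OcpOrderingSketch(int parallelism) {
        executors = Executors.newFixedThreadPool(parallelism);
        Thread dispatcher = new Thread(this::dispatchLoop, "dispatcher");
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    synchronized Task enqueue(String collection, long millis) {
        Task t = new Task(collection, millis);
        queue.addLast(t);
        notifyAll();
        return t;
    }

    /** Removes and returns the oldest queued task whose collection lock is free, else null. */
    private Task pollRunnable() {
        for (Iterator<Task> it = queue.iterator(); it.hasNext(); ) {
            Task t = it.next();
            if (!lockedCollections.contains(t.collection)) { it.remove(); return t; }
        }
        return null;
    }

    private void dispatchLoop() {
        while (true) {
            Task next = null;
            synchronized (this) {
                while (running && (next = pollRunnable()) == null) {
                    try { wait(); } catch (InterruptedException e) { return; }
                }
                if (!running) return;
                lockedCollections.add(next.collection); // take the collection lock
            }
            final Task t = next;
            try { executors.execute(() -> run(t)); } catch (RejectedExecutionException e) { return; }
        }
    }

    private void run(Task t) {
        t.started.complete(null);
        try {
            Thread.sleep(t.millis); // stands in for the real work
            t.done.complete(null);
        } catch (InterruptedException e) {
            t.done.completeExceptionally(e);
            Thread.currentThread().interrupt();
        } finally {
            synchronized (this) { lockedCollections.remove(t.collection); notifyAll(); }
        }
    }

    void shutdown() {
        synchronized (this) { running = false; notifyAll(); }
        executors.shutdownNow(); // interrupts the slow task instead of waiting for it
    }

    /** True if COLL_B finished while the slow COLL_A task was still running. */
    static boolean collBOvertakesSlowCollA() throws Exception {
        OcpOrderingSketch ocp = new OcpOrderingSketch(100);
        try {
            Task slowA = ocp.enqueue("COLL_A", 5_000); // long "blocking" COLL_A task
            for (int i = 0; i < 114; i++) ocp.enqueue("COLL_A", 1);
            slowA.started.get(10, TimeUnit.SECONDS);  // COLL_A work is under way
            Task b = ocp.enqueue("COLL_B", 1);
            b.done.get(10, TimeUnit.SECONDS);         // wait for COLL_B only
            return !slowA.done.isDone();
        } finally {
            ocp.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("COLL_B overtook slow COLL_A task: " + collBOvertakesSlowCollA());
    }
}
```

Because the assertion only requires COLL_B to finish before a deliberately long COLL_A task, slow enqueues widen the margin rather than break the check. Interrupting the slow task on shutdown keeps the run short, in the spirit of the last bullet.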