[GitHub] [lucene] vigyasharma commented on a diff in pull request #633: LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.

GitBox Thu, 05 May 2022 23:40:35 -0700


vigyasharma commented on code in PR #633:
URL: https://github.com/apache/lucene/pull/633#discussion_r866530117



##########
lucene/test-framework/src/java/org/apache/lucene/tests/index/MockRandomMergePolicy.java:
##########
@@ -86,6 +86,20 @@ public MergeSpecification findMerges(
     return mergeSpec;
   }
 
+  @Override
+  public MergeSpecification findMerges(CodecReader... readers) throws 
IOException {
+    if (random.nextBoolean() == true) {
+      return super.findMerges(readers);
+    } else {
+      // create a oneMerge for each reader to let them get concurrently 
processed by addIndexes()

Review Comment:
   > Well, we need to think about it. I actually hate that 
ConcurrentMergeScheduler gets used the way it does in the tests, for this same 
reason. And that's the actual problem here... I think, not your logic in the 
mergepolicy?
   
   Yes, the concurrency here comes from concurrent merge scheduler. If this 
policy was to run with the `SerialMergeScheduler`, there would be no 
concurrency. 
   
   
   Is it that randomized tests (without concurrency) cannot always be 
reproduced with their random seed (like 
[LUCENE-10529](https://issues.apache.org/jira/browse/LUCENE-10529)? And so, 
with mixing randomized tests and concurrency, we blend two unknowns. We are not 
sure if we're missing the params with which the test failed, or if the random 
seed worked but it's a race condition issue..
   
   
   > Maybe we can improve our tests to use `SerialMergeScheduler` and 
`ConcurrentMergeScheduler(n=1)` more often, so that reproducibility is improved 
across the board. We can have separate specific tests using 
`ConcurrentMergeScheduler(n>1)`?
   
   We should probably do that for tests that are validating some non-concurrent 
logic, and not asserting any CMS specific behavior. On similar lines, what do 
you think about moving concurrency tests for an area, into a separate 
class/test suite - like `TestAddIndexesConcurrency` for `addIndexes`?. Then, 
for changes that touch concurrency, contributors can separately run those test 
suites on repeat to get better confidence. It would also be easier to setup 
such classes for concurrency (like using n threads in CMS), and avoid adding 
any randomness to it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] vigyasharma commented on a diff in pull request #633: LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges.

Reply via email to