javanna commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1749979372
########## lucene/core/src/test/org/apache/lucene/index/TestForTooMuchCloning.java: ########## @@ -80,7 +80,7 @@ public void test() throws Exception { // System.out.println("query clone count=" + queryCloneCount); assertTrue( "too many calls to IndexInput.clone during TermRangeQuery: " + queryCloneCount, - queryCloneCount < 50); + queryCloneCount <= Math.max(s.getLeafContexts().size(), s.getSlices().length) * 4); Review Comment: I did more digging on this, things were much more predictable before because we only have 20 docs and a couple of segments. `50` was then high enough for all cases. With intra-segment slicing, we end up with more slices, and the number of clone calls is indeed a factor of the number of slices, but there is more to it which depends on the collected terms, and I am not sure how to calculate that exactly or make it predictable. The test does not seem to do that either so far, it just has a high enough value. I see two clone calls per partition done when `Weight#scorerSupplier` is called. They are from two different places in the `IntersectTermsEnum` constructor (line 89 and 98). I see other two clone calls per partition done when `ScorerSupplier#bulkScorer` is called. That is where my current factor of 4 came from. It seems to work in the vast majority of the cases. I have a seed (`F02FB6F046EE8C80`) where I have 9 partitions, but a total of 40 clone calls as part of search. Those additional 4 calls are rather unpredictable, I think that it depends on whether the query is rewritten as a boolean query or not. After all this analysis though, I realize that I came up with a formula that is more restrictive than the previous fixed value. Here we are searching 9 partitions, and the total becomes `36` (against `40` real value), while previously the fixed limit was `50`. I think I can increase the factor further without worrying too much. Key is that we are currently limiting the max number of partitions per segments to 5. We will need to adapt things in tests if we remove that limit in the future, which is likely. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org