Re: [PR] Simplify TaskExecutor API [lucene]
javanna merged PR #12603: URL: https://github.com/apache/lucene/pull/12603

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]
gf2121 commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742668578

Hi @jpountz! Would you please take a look at this PR when you have time? Looking forward to getting your suggestions on this topic ~
Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]
jpountz commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742678478

Oh, interesting find. It makes sense to me, but I'm not the most familiar with this piece of code. @mikemccand or @s1monw what do you think?
Re: [PR] Improve fallback sorter for BKD [lucene]
jpountz commented on code in PR #12610: URL: https://github.com/apache/lucene/pull/12610#discussion_r1342470366

## lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointTreeReaderUtils.java:

```java
  @Override
  protected Sorter getFallbackSorter(int k) {
    return new InPlaceMergeSorter() {
      @Override
      protected int compare(int i, int j) {
        if (k < config.packedBytesLength) {
          reader.getValue(i, scratch1);
          reader.getValue(j, scratch2);
          int v =
              Arrays.compareUnsigned(
                  scratch1.bytes,
                  scratch1.offset + k,
                  scratch1.offset + scratch1.length,
                  scratch2.bytes,
                  scratch2.offset + k,
                  scratch2.offset + scratch2.length);
          if (v != 0) {
            return v;
          }
        }
        if (bitsPerDocId == 0) {
```

Review Comment (on `if (bitsPerDocId == 0)`): can this ever be 0?

Review Comment (on `getFallbackSorter`): I wonder if it would help to take advantage of `ArrayUtil#getUnsignedComparator`, e.g. something like below:

```java
int cmpStartOffset;
ByteArrayComparator packedBytesComparator;
if (k >= config.packedBytesLength) {
  cmpStartOffset = config.packedBytesLength;
  packedBytesComparator = (a, ai, b, bi) -> 0;
} else if (config.packedBytesLength >= 4 && config.packedBytesLength - k <= 4) {
  cmpStartOffset = config.packedBytesLength - 4;
  packedBytesComparator = ArrayUtil.getUnsignedComparator(4);
} else if (config.packedBytesLength >= 8 && config.packedBytesLength - k <= 8) {
  cmpStartOffset = config.packedBytesLength - 8;
  packedBytesComparator = ArrayUtil.getUnsignedComparator(8);
} else {
  cmpStartOffset = k;
  packedBytesComparator = ArrayUtil.getUnsignedComparator(config.packedBytesLength - k);
}
```
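For background on why the unsigned comparison matters here, a self-contained sketch (plain JDK, not Lucene code): Java's `byte` is signed, so a signed lexicographic comparison would order the byte `0x80` (128 unsigned, -128 signed) before `0x01`, while packed index bytes rely on unsigned order.

```java
import java.util.Arrays;

public class UnsignedCompareDemo {
    public static void main(String[] args) {
        byte[] a = {(byte) 0x80}; // 128 unsigned, -128 signed
        byte[] b = {(byte) 0x01}; // 1 either way

        // Signed lexicographic order: -128 < 1, so a sorts before b
        System.out.println(Arrays.compare(a, b) < 0); // true

        // Unsigned lexicographic order: 128 > 1, the order packed bytes need
        System.out.println(Arrays.compareUnsigned(a, b) > 0); // true
    }
}
```

This is why the fallback sorter above uses `Arrays.compareUnsigned` rather than a plain byte-wise comparison.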
[PR] Override `normalize` method in the `PatternReplaceFilterFactory` [lucene]
dainiusjocas opened a new pull request, #12613: URL: https://github.com/apache/lucene/pull/12613

### Description

I've been debugging a problem with the classic query parser producing a `TermRangeQuery` with wrong tokens. I'm using a `CustomAnalyzer` that uses the `PatternReplaceFilterFactory`. It turns out that the query parser calls the `normalize` method, and the token filter is not applied. The funny thing is that the `PatternReplaceCharFilterFactory` does override the `normalize` method. Is there any reason why `PatternReplaceFilterFactory` doesn't override the `normalize` method?
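For illustration only, a minimal sketch of the kind of per-term replacement the filter performs, using plain `java.util.regex` (not the Lucene factory API) and a hypothetical pattern. If the query parser's `normalize` path skips this step, range-query endpoints keep their raw form while indexed terms were rewritten, which is exactly the mismatch described above.

```java
import java.util.regex.Pattern;

public class PatternReplaceSketch {
    // Hypothetical pattern for illustration: strip all hyphens from a term
    private static final Pattern HYPHENS = Pattern.compile("-");

    static String normalizeTerm(String term) {
        // The same replacement applied at index time; a normalize() override
        // would have to apply it to query-side terms as well.
        return HYPHENS.matcher(term).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(normalizeTerm("foo-bar-baz")); // foobarbaz
    }
}
```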
Re: [I] Reproducible TestDrillSideways failure [lucene]
jpountz commented on issue #12418: URL: https://github.com/apache/lucene/issues/12418#issuecomment-1742740792

Sorry @Yuti-G, I had missed your reply! I think that reverting the commit you linked only made the test pass because this commit adds a call to `random().nextInt()` to `LuceneTestCase#newIndexWriterConfig()`. So by reverting this commit, `TestDrillSideways.testRandom()` got a different sequence of random numbers for everything after the `newIndexWriterConfig()` call, and the particular configuration that made this test fail was altered.

By the way, if I revert the merge-on-refresh commit and then add back the call to `r.nextInt(3)`, the failure still reproduces:

```diff
diff --cc lucene/core/src/test/org/apache/lucene/index/TestLogMergePolicy.java
index 0b00a20a17f,517542699c1..000
--- a/lucene/core/src/test/org/apache/lucene/index/TestLogMergePolicy.java
+++ b/lucene/core/src/test/org/apache/lucene/index/TestLogMergePolicy.java
diff --git a/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java
index 1a5a876a7b7..d6649a26e9a 100644
--- a/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java
+++ b/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java
@@ -1037,6 +1037,8 @@ public abstract class LuceneTestCase extends Assert {
       c.setIndexWriterEventListener(new MockIndexWriterEventListener());
     }
 
+    r.nextInt(3);
+
     c.setMaxFullFlushMergeWaitMillis(rarely() ? atLeast(r, 1000) : atLeast(r, 200));
     return c;
   }
```

This looks unrelated to merge-on-refresh.
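The seed-sequence shift described above can be illustrated with a self-contained sketch (plain `java.util.Random`, not the Lucene test framework): once one of two identically seeded generators consumes an extra value, every later draw is offset, so any randomized configuration derived from those draws changes.

```java
import java.util.Random;

public class SeedShiftDemo {
    public static void main(String[] args) {
        Random a = new Random(42);
        Random b = new Random(42);

        // Identical seeds: the streams agree draw for draw.
        System.out.println(a.nextInt(100) == b.nextInt(100)); // true

        // One stream consumes an extra value, analogous to the added
        // r.nextInt(3) call in newIndexWriterConfig().
        a.nextInt(3);

        // From here on the two streams are offset by one draw, so the
        // "same" randomized test configuration is no longer reproduced.
        boolean stillInSync = true;
        for (int i = 0; i < 10; i++) {
            stillInSync &= a.nextInt(100) == b.nextInt(100);
        }
        System.out.println(stillInSync); // false
    }
}
```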
Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]
s1monw commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742998087

This change makes sense to me. @mikemccand WDYT?
Re: [PR] Concurrent hnsw graph and builder, take two [lucene]
jbellis closed pull request #12421: Concurrent hnsw graph and builder, take two URL: https://github.com/apache/lucene/pull/12421
Re: [PR] Concurrent hnsw graph and builder, take two [lucene]
jbellis commented on PR #12421: URL: https://github.com/apache/lucene/pull/12421#issuecomment-1743159095

Thanks for the feedback. I've switched my efforts to a DiskANN implementation in JVector, so closing this out.
Re: [PR] Add missing create github release step to release wizard [lucene]
javanna merged PR #12607: URL: https://github.com/apache/lucene/pull/12607
Re: [PR] Create a task executor when executor is not provided [lucene]
reta commented on code in PR #12606: URL: https://github.com/apache/lucene/pull/12606#discussion_r1342999505

## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:

```diff
@@ -420,13 +418,12 @@ public int count(Query query) throws IOException {
   }
 
   /**
-   * Returns the leaf slices used for concurrent searching, or null if no {@code Executor} was
-   * passed to the constructor.
+   * Returns the leaf slices used for concurrent searching
    *
    * @lucene.experimental
    */
   public LeafSlice[] getSlices() {
-    return (executor == null) ? null : leafSlicesSupplier.get();
+    return leafSlicesSupplier.get();
```

Review Comment (on `getSlices`): @sohami I think you could provide the context here
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343009944

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

```diff
-    searcher.search(new MatchAllDocsQuery(), 10);
-    assertEquals(leaves.size(), numExecutions.get());
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
+    if (leaves.size() <= 1) {
+      assertEquals(0, numExecutions.get());
+    } else {
+      assertEquals(leaves.size(), numExecutions.get());
+    }
   }
+
+  /**
+   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception
+   * is thrown by one of the slice tasks, it will not return until all tasks have completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+    ExecutorService fixedThreadPoolExecutor =
+        Executors.newFixedThreadPool(fixedThreads, new NamedThreadFactory("concurrent-slices"));
+
+    IndexSearcher searcher =
+        new IndexSearcher(reader, fixedThreadPoolExecutor) {
+          @Override
+          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+            ArrayList<LeafSlice> slices = new ArrayList<>();
+            for (LeafReaderContext ctx : leaves) {
+              slices.add(new LeafSlice(Arrays.asList(ctx)));
+            }
+            return slices.toArray(new LeafSlice[0]);
+          }
+        };
```

Review Comment (on the `slices` override): it would also be fine to call the existing slices method providing 1,1 as the threshold arguments. That should achieve the same end-goal.

Review Comment (on the `numExecutions` assertions): I think that this is incorrect, maybe due to a bad merge? That's because we now unconditionally offload to the executor.

Review Comment (on `int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;`): how many leaves max may we have? I am wondering if we may end up creating too many threads. Would it hurt the test to lower the number of threads?
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343038417

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

```java
  /**
   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception
   * is thrown by one of the slice tasks, it will not return until all tasks have completed.
   *
   * <p>Without a larger refactoring of the Lucene IndexSearcher and/or TaskExecutor there isn't a
   * clean deterministic way to test this. This test is probabilistic, using short timeouts in the
   * tasks that do not throw an Exception.
   */
  public void testMultipleSegmentsOnTheExecutorWithException() {
    List<LeafReaderContext> leaves = reader.leaves();
    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;

    ExecutorService fixedThreadPoolExecutor =
        Executors.newFixedThreadPool(fixedThreads, new NamedThreadFactory("concurrent-slices"));

    IndexSearcher searcher =
        new IndexSearcher(reader, fixedThreadPoolExecutor) {
          @Override
          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
            ArrayList<LeafSlice> slices = new ArrayList<>();
            for (LeafReaderContext ctx : leaves) {
              slices.add(new LeafSlice(Arrays.asList(ctx)));
            }
            return slices.toArray(new LeafSlice[0]);
          }
        };

    try {
      AtomicInteger callsToScorer = new AtomicInteger(0);
      int numExceptions = leaves.size() == 1 ? 1 : RandomizedTest.randomIntBetween(1, 2);
      MatchAllOrThrowExceptionQuery query =
          new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
      RuntimeException exc =
          expectThrows(RuntimeException.class, () -> searcher.search(query, 10));
      // if the TaskExecutor didn't wait for all tasks to finish, this assert would frequently fail
      assertEquals(leaves.size(), callsToScorer.get());
      assertThat(
          exc.getMessage(), Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
    } finally {
      TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
    }
  }

  private static class MatchAllOrThrowExceptionQuery extends Query {

    private final AtomicInteger numExceptionsToThrow;
    private final Query delegate;
    private final AtomicInteger callsToScorer;

    /**
     * Throws an Exception out of the {@code scorer} method the first {@code numExceptions} times
     * it is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
     *
     * @param numExceptions number of exceptions to throw from scorer method
     * @param callsToScorer where to record the number of times the {@code scorer} method has been
     *     called
     */
    public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger callsToScorer) {
      this.numExceptionsToThrow = new AtomicInteger(numExceptions);
      this.callsToScorer = callsToScorer;
      this.delegate = new MatchAllDocsQuery();
    }

    @Override
    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
        throws IOException {
      Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, boost);

      return new Weight(delegate) {
        @Override
        public boolean isCacheable(LeafReaderContext ctx) {
          return matchAllWeight.isCacheable(ctx);
        }

        @Override
        public Explanation explain(LeafReaderContext context, int doc) throws IOException {
          return matchAllWeight.explain(context, doc);
        }

        @Override
        public Scorer scorer(LeafReaderContext context) throws IOException {
          if (numExceptionsToThrow.getAndDecrement() > 0) {
            callsToScorer.getAndIncrement();
            throw new RuntimeException("MatchAllOrThrowExceptionQuery Exception");
          } else {
            // A small sleep before incrementing the callsToScorer counter allows
            // the task with the Exception to be thrown and if TaskExecutor.invokeAll
            // does not wait until all tasks have finished, then the callsToScorer
            // counter will not match the total number of tasks (or rather usually will
            // not match, since there is a race condition that makes it probabilistic).
            RandomizedTest.sleep(25);
```

Review Comment (on `RandomizedTest.sleep(25)`): > You could put in a latch that the test waits upon that gates the non-Exception-
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343042425

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java (on the same `RandomizedTest.sleep(25)` call):

Review Comment: Thinking more of this, perhaps using a single threaded executor, or even one that
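The latch-based gating floated in this review thread can be sketched with plain `java.util.concurrent` (a hypothetical stand-alone illustration, not the actual Lucene test code): instead of sleeping and hoping the failing task runs first, the well-behaved tasks block on a latch that the failing task releases, making the ordering deterministic.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchGateDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        CountDownLatch failureHappened = new CountDownLatch(1);
        CountDownLatch othersFinished = new CountDownLatch(2);

        // The "failing slice": counts down the latch, then throws.
        Runnable failing = () -> {
            failureHappened.countDown();
            throw new RuntimeException("simulated scorer failure");
        };
        pool.submit(failing);

        // The "well-behaved slices": gated on the latch instead of a sleep,
        // so "the failure happens first" is guaranteed, not probabilistic.
        Callable<Void> gated = () -> {
            failureHappened.await();
            othersFinished.countDown();
            return null;
        };
        pool.submit(gated);
        pool.submit(gated);

        // A caller that waits for all tasks to complete would observe that
        // the remaining tasks finished despite the earlier failure.
        boolean allDone = othersFinished.await(5, TimeUnit.SECONDS);
        System.out.println(allDone); // true
        pool.shutdown();
    }
}
```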
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343048682

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java (on the same `numExecutions` assertions):

Review Comment: oh well, I guess I am wrong because tests succeed :)
Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343092520

## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java:

```diff
@@ -139,7 +139,7 @@ public int nextDoc() throws IOException {
   }
 
   /** View over multiple vector values supporting iterator-style access via DocIdMerger. */
-  protected static final class MergedVectorValues {
+  public static final class MergedVectorValues {
```

Review Comment: This is so I can get a merged view over vectors. I don't strictly need the doc updates. I just need to be able to iterate all the vectors in a row and randomly accept some.

## lucene/core/src/java/org/apache/lucene/codecs/HnswGraphProvider.java:

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.lucene.codecs;

import java.io.IOException;
import org.apache.lucene.util.hnsw.HnswGraph;

/**
 * Provides a hnsw graph
 *
 * @lucene.experimental
 */
public interface HnswGraphProvider {
```

Review Comment: I am thinking about adding this to Lucene directly. Right now we do an `instance of Lucene95...`. For future flexibility, we should have a thing that indicates it can provide a graph for a field.

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99Codec.java:

```java
/* (standard Apache License 2.0 header, as above) */
package org.apache.lucene.codecs.lucene99;

import java.util.Objects;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.CompoundFormat;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.FieldInfosFormat;
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.LiveDocsFormat;
import org.apache.lucene.codecs.NormsFormat;
import org.apache.lucene.codecs.PointsFormat;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.SegmentInfoFormat;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.TermVectorsFormat;
import org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat;
import org.apache.lucene.codecs.lucene90.Lucene90DocValuesFormat;
import org.apache.lucene.codecs.lucene90.Lucene90LiveDocsFormat;
import org.apache.lucene.codecs.lucene90.Lucene90NormsFormat;
import org.apache.lucene.codecs.lucene90.Lucene90PointsFormat;
import org.apache.lucene.codecs.lucene90.Lucene90PostingsFormat;
import org.apache.lucene.codecs.lucene90.Lucene90SegmentInfoFormat;
import org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat;
import org.apache.lucene.codecs.lucene90.Lucene90TermVectorsFormat;
import org.apache.lucene.codecs.lucene94.Lucene94FieldInfosFormat;
import org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat;
import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
import org.apache.lucene.codecs.perfield.PerFieldPostingsFormat;

/**
 * Implements the Lucene 9.8 index format
```

Review Comment:
```suggestion
 * Implements the Lucene 9.9 index format
```

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java:

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional inform
```
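The capability-interface idea behind `HnswGraphProvider` (replacing checks against a concrete codec class like `instance of Lucene95...`) can be sketched generically. All names below are hypothetical, chosen only to illustrate the pattern, not Lucene's actual API:

```java
// Hypothetical sketch of the provider-interface pattern: instead of testing
// for each concrete codec version, callers ask whether the reader exposes
// the capability of providing a graph for a field.
interface GraphProvider {
    Object getGraph(String field);
}

class ConcreteReader implements GraphProvider {
    public Object getGraph(String field) {
        return "graph:" + field;
    }
}

public class ProviderDemo {
    static Object graphFor(Object reader, String field) {
        // One instanceof against the capability interface survives future
        // codec versions, unlike a check against one concrete class.
        if (reader instanceof GraphProvider) {
            return ((GraphProvider) reader).getGraph(field);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(graphFor(new ConcreteReader(), "vec")); // graph:vec
        System.out.println(graphFor(new Object(), "vec"));         // null
    }
}
```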
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343142475

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

@@ -267,11 +266,130 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
             return slices.toArray(new LeafSlice[0]);
           }
         };
-    searcher.search(new MatchAllDocsQuery(), 10);
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
     if (leaves.size() <= 1) {
       assertEquals(0, numExecutions.get());
     } else {
       assertEquals(leaves.size(), numExecutions.get());
     }
   }
+
+  /**
+   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have completed.
+   *
+   * Without a larger refactoring of the Lucene IndexSearcher and/or TaskExecutor there isn't a
+   * clean deterministic way to test this. This test is probabilistic, using short timeouts in the
+   * tasks that do not throw an Exception.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+    ExecutorService fixedThreadPoolExecutor =
+        Executors.newFixedThreadPool(fixedThreads, new NamedThreadFactory("concurrent-slices"));
+
+    IndexSearcher searcher =
+        new IndexSearcher(reader, fixedThreadPoolExecutor) {
+          @Override
+          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+            ArrayList<LeafSlice> slices = new ArrayList<>();
+            for (LeafReaderContext ctx : leaves) {
+              slices.add(new LeafSlice(Arrays.asList(ctx)));
+            }
+            return slices.toArray(new LeafSlice[0]);
+          }
+        };
+
+    try {
+      AtomicInteger callsToScorer = new AtomicInteger(0);
+      int numExceptions = leaves.size() == 1 ? 1 : RandomizedTest.randomIntBetween(1, 2);
+      MatchAllOrThrowExceptionQuery query =
+          new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
+      RuntimeException exc = expectThrows(RuntimeException.class, () -> searcher.search(query, 10));
+      // if the TaskExecutor didn't wait for all tasks to finish, this assert would frequently fail
+      assertEquals(leaves.size(), callsToScorer.get());
+      assertThat(
+          exc.getMessage(), Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+    } finally {
+      TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+    }
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+    private final AtomicInteger numExceptionsToThrow;
+    private final Query delegate;
+    private final AtomicInteger callsToScorer;
+
+    /**
+     * Throws an Exception out of the {@code scorer} method the first {@code numExceptions} times it
+     * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+     *
+     * @param numExceptions number of exceptions to throw from scorer method
+     * @param callsToScorer where to record the number of times the {@code scorer} method has been
+     *     called
+     */
+    public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger callsToScorer) {
+      this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+      this.callsToScorer = callsToScorer;
+      this.delegate = new MatchAllDocsQuery();
+    }
+
+    @Override
+    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
+        throws IOException {
+      Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, boost);
+
+      return new Weight(delegate) {
+        @Override
+        public boolean isCacheable(LeafReaderContext ctx) {
+          return matchAllWeight.isCacheable(ctx);
+        }
+
+        @Override
+        public Explanation explain(LeafReaderContext context, int doc) throws IOException {
+          return matchAllWeight.explain(context, doc);
+        }
+
+        @Override
+        public Scorer scorer(LeafReaderContext context) throws IOException {
+          if (numExceptionsToThrow.getAndDecrement() > 0) {
+            callsToScorer.getAndIncrement();
+            throw new RuntimeException("MatchAllOrThrowExceptionQuery Exception");
+          } else {
+            // A small sleep before incrementing the callsToScorer counter allows
+            // the task with the Exception to be thrown and if TaskExecutor.invokeAll
+            // does not wait until all tasks have finished, then the callsToScorer
+            // counter will not match the total number of tasks (or rather usually will
+            // not match, since there is a race condition that makes it probabilistic).
+            RandomizedTest.sleep(25);

Review Comment: Using a single threaded executor would make the test more repeatable, but it also
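The semantics being tested above — that a failing slice task must not cause an early return — can be sketched outside Lucene with plain `java.util.concurrent` primitives. This is a simplified stand-in with hypothetical names, not the actual `TaskExecutor` code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified stand-in for "wait for all tasks, then rethrow the first failure". */
public class WaitForAllSketch {

  /** Submits every task, blocks until each one finishes, then rethrows the first failure. */
  static int runAll(ExecutorService executor, List<Callable<Integer>> tasks) throws Exception {
    List<Future<Integer>> futures = new ArrayList<>();
    for (Callable<Integer> task : tasks) {
      futures.add(executor.submit(task));
    }
    Exception firstFailure = null;
    int sum = 0;
    for (Future<Integer> future : futures) {
      try {
        // Blocks until this task completes, even if an earlier future already failed:
        // draining every future is what guarantees no task is still running on return.
        sum += future.get();
      } catch (ExecutionException e) {
        if (firstFailure == null) {
          firstFailure = (Exception) e.getCause();
        }
      }
    }
    if (firstFailure != null) {
      throw firstFailure;
    }
    return sum;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    AtomicInteger started = new AtomicInteger();
    List<Callable<Integer>> tasks = new ArrayList<>();
    tasks.add(
        () -> {
          started.incrementAndGet();
          throw new RuntimeException("boom");
        });
    for (int i = 0; i < 3; i++) {
      tasks.add(
          () -> {
            Thread.sleep(25); // give the failing task a head start, as in the test above
            started.incrementAndGet();
            return 1;
          });
    }
    try {
      runAll(pool, tasks);
    } catch (RuntimeException expected) {
      // All four tasks ran to completion before the exception surfaced.
      System.out.println(started.get() + " tasks ran, caught: " + expected.getMessage());
    } finally {
      pool.shutdown();
    }
  }
}
```

If `runAll` instead returned as soon as the first `ExecutionException` arrived, the sleeping tasks could still be mid-flight when the caller observes the failure, which is exactly the race the test is probing.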
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343180733

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
             return slices.toArray(new LeafSlice[0]);
           }
         };
-    searcher.search(new MatchAllDocsQuery(), 10);
-    assertEquals(leaves.size(), numExecutions.get());
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
+    if (leaves.size() <= 1) {
+      assertEquals(0, numExecutions.get());
+    } else {
+      assertEquals(leaves.size(), numExecutions.get());
+    }
+  }
+
+  /**
+   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+    ExecutorService fixedThreadPoolExecutor =
+        Executors.newFixedThreadPool(fixedThreads, new NamedThreadFactory("concurrent-slices"));
+
+    IndexSearcher searcher =
+        new IndexSearcher(reader, fixedThreadPoolExecutor) {
+          @Override
+          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+            ArrayList<LeafSlice> slices = new ArrayList<>();
+            for (LeafReaderContext ctx : leaves) {
+              slices.add(new LeafSlice(Arrays.asList(ctx)));
+            }
+            return slices.toArray(new LeafSlice[0]);

Review Comment: Got it. Fixed.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343180948

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
             return slices.toArray(new LeafSlice[0]);
           }
         };
-    searcher.search(new MatchAllDocsQuery(), 10);
-    assertEquals(leaves.size(), numExecutions.get());
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
+    if (leaves.size() <= 1) {
+      assertEquals(0, numExecutions.get());
+    } else {
+      assertEquals(leaves.size(), numExecutions.get());
+    }
+  }
+
+  /**
+   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;

Review Comment: Fixed. I'll put a max of 8 threads.

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
             return slices.toArray(new LeafSlice[0]);
           }
         };
-    searcher.search(new MatchAllDocsQuery(), 10);
-    assertEquals(leaves.size(), numExecutions.get());
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
+    if (leaves.size() <= 1) {
+      assertEquals(0, numExecutions.get());
+    } else {
+      assertEquals(leaves.size(), numExecutions.get());
+    }
+  }
+
+  /**
+   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+    ExecutorService fixedThreadPoolExecutor =
+        Executors.newFixedThreadPool(fixedThreads, new NamedThreadFactory("concurrent-slices"));
+
+    IndexSearcher searcher =
+        new IndexSearcher(reader, fixedThreadPoolExecutor) {
+          @Override
+          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+            ArrayList<LeafSlice> slices = new ArrayList<>();
+            for (LeafReaderContext ctx : leaves) {
+              slices.add(new LeafSlice(Arrays.asList(ctx)));
+            }
+            return slices.toArray(new LeafSlice[0]);
+          }
+        };
+
+    try {
+      AtomicInteger callsToScorer = new AtomicInteger(0);
+      int numExceptions = leaves.size() == 1 ? 1 : RandomizedTest.randomIntBetween(1, 2);
+      CountDownLatch latch = new CountDownLatch(numExceptions);
+      MatchAllOrThrowExceptionQuery query =
+          new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer, latch);
+      RuntimeException exc = expectThrows(RuntimeException.class, () -> searcher.search(query, 10));
+      // if the TaskExecutor didn't wait for all tasks to finish, this assert would frequently fail
+      assertEquals(leaves.size(), callsToScorer.get());
+      assertThat(
+          exc.getMessage(), Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));

Review Comment: Good idea.
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343185358

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
             return slices.toArray(new LeafSlice[0]);
           }
         };
-    searcher.search(new MatchAllDocsQuery(), 10);
-    assertEquals(leaves.size(), numExecutions.get());
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
+    if (leaves.size() <= 1) {
+      assertEquals(0, numExecutions.get());
+    } else {
+      assertEquals(leaves.size(), numExecutions.get());
+    }
+  }
+
+  /**
+   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+    ExecutorService fixedThreadPoolExecutor =
+        Executors.newFixedThreadPool(fixedThreads, new NamedThreadFactory("concurrent-slices"));
+
+    IndexSearcher searcher =
+        new IndexSearcher(reader, fixedThreadPoolExecutor) {
+          @Override
+          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+            ArrayList<LeafSlice> slices = new ArrayList<>();
+            for (LeafReaderContext ctx : leaves) {
+              slices.add(new LeafSlice(Arrays.asList(ctx)));
+            }
+            return slices.toArray(new LeafSlice[0]);
+          }
+        };
+
+    try {
+      AtomicInteger callsToScorer = new AtomicInteger(0);
+      int numExceptions = leaves.size() == 1 ? 1 : RandomizedTest.randomIntBetween(1, 2);
+      CountDownLatch latch = new CountDownLatch(numExceptions);
+      MatchAllOrThrowExceptionQuery query =
+          new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer, latch);
+      RuntimeException exc = expectThrows(RuntimeException.class, () -> searcher.search(query, 10));
+      // if the TaskExecutor didn't wait for all tasks to finish, this assert would frequently fail
+      assertEquals(leaves.size(), callsToScorer.get());
+      assertThat(
+          exc.getMessage(), Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+    } finally {
+      TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+    }
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+    private final AtomicInteger numExceptionsToThrow;
+    private final Query delegate;
+    private final AtomicInteger callsToScorer;
+    private final CountDownLatch latch;
+
+    /**
+     * Throws an Exception out of the {@code scorer} method the first {@code numExceptions} times it
+     * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+     *
+     * @param numExceptions number of exceptions to throw from scorer method
+     * @param callsToScorer where to record the number of times the {@code scorer} method has been
+     *     called
+     */
+    public MatchAllOrThrowExceptionQuery(
+        int numExceptions, AtomicInteger callsToScorer, CountDownLatch latch) {
+      this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+      this.callsToScorer = callsToScorer;
+      this.delegate = new MatchAllDocsQuery();
+      this.latch = latch;
+    }
+
+    @Override
+    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
+        throws IOException {
+      Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, boost);
+
+      return new Weight(delegate) {
+        @Override
+        public boolean isCacheable(LeafReaderContext ctx) {
+          return matchAllWeight.isCacheable(ctx);
+        }
+
+        @Override
+        public Explanation explain(LeafReaderContext context, int doc) throws IOException {
+          return matchAllWeight.explain(context, doc);
+        }
+
+        @Override
+        public Scorer scorer(LeafReaderContext context) throws IOException {
+          if (numExceptionsToThrow.getAndDecrement() > 0) {
+            callsToScorer.getAndIncrement();
+            try {
+              throw new RuntimeException("MatchAllOrThrowExceptionQuery Exception");
+            } finally {
+              latch.countDown();
+            }
+          } else {
+            try {
+              latch.await(5000, TimeUnit.MILLISECONDS);

Review Comment: Added.
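The latch handshake adopted in the review above can be reduced to a small self-contained sketch (hypothetical names, outside Lucene): the throwing tasks count the latch down before failing, and the remaining tasks wait on it, so every task deterministically finishes after the failures have been raised instead of relying on a probabilistic sleep.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reduction of the latch handshake used in the test above. */
public class LatchHandshakeSketch {

  /** Returns how many tasks completed; failing tasks release the latch before throwing. */
  static int run(int numTasks, int numExceptions) throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(numExceptions);
    AtomicInteger completed = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(numTasks);
    for (int i = 0; i < numTasks; i++) {
      final boolean throwing = i < numExceptions;
      pool.submit(
          () -> {
            if (throwing) {
              completed.incrementAndGet();
              latch.countDown(); // release the waiting tasks, then fail
              throw new RuntimeException("boom");
            }
            // Non-throwing tasks wait until every failing task has thrown, so they
            // deterministically finish *after* the failures occurred.
            latch.await(5, TimeUnit.SECONDS);
            return completed.incrementAndGet();
          });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return completed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    // Every task completes despite the failure, which is what the PR's
    // "wait for all tasks" guarantee requires the caller to observe.
    System.out.println(run(4, 1));
  }
}
```

The pool is sized to the task count so a waiting task can never starve a throwing one of a thread, which is the deadlock risk this pattern has to avoid.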
Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343266881

## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:

@@ -267,11 +266,130 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
             return slices.toArray(new LeafSlice[0]);
           }
         };
-    searcher.search(new MatchAllDocsQuery(), 10);
+    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+    assertTrue(topDocs.totalHits.value > 0);
     if (leaves.size() <= 1) {
       assertEquals(0, numExecutions.get());
     } else {
       assertEquals(leaves.size(), numExecutions.get());
     }
   }
+
+  /**
+   * Tests that when IndexSearcher runs concurrent searches on multiple slices, if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have completed.
+   *
+   * Without a larger refactoring of the Lucene IndexSearcher and/or TaskExecutor there isn't a
+   * clean deterministic way to test this. This test is probabilistic, using short timeouts in the
+   * tasks that do not throw an Exception.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+    List<LeafReaderContext> leaves = reader.leaves();
+    int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+    ExecutorService fixedThreadPoolExecutor =
+        Executors.newFixedThreadPool(fixedThreads, new NamedThreadFactory("concurrent-slices"));
+
+    IndexSearcher searcher =
+        new IndexSearcher(reader, fixedThreadPoolExecutor) {
+          @Override
+          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+            ArrayList<LeafSlice> slices = new ArrayList<>();
+            for (LeafReaderContext ctx : leaves) {
+              slices.add(new LeafSlice(Arrays.asList(ctx)));
+            }
+            return slices.toArray(new LeafSlice[0]);
+          }
+        };
+
+    try {
+      AtomicInteger callsToScorer = new AtomicInteger(0);
+      int numExceptions = leaves.size() == 1 ? 1 : RandomizedTest.randomIntBetween(1, 2);
+      MatchAllOrThrowExceptionQuery query =
+          new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
+      RuntimeException exc = expectThrows(RuntimeException.class, () -> searcher.search(query, 10));
+      // if the TaskExecutor didn't wait for all tasks to finish, this assert would frequently fail
+      assertEquals(leaves.size(), callsToScorer.get());
+      assertThat(
+          exc.getMessage(), Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+    } finally {
+      TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+    }
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+    private final AtomicInteger numExceptionsToThrow;
+    private final Query delegate;
+    private final AtomicInteger callsToScorer;
+
+    /**
+     * Throws an Exception out of the {@code scorer} method the first {@code numExceptions} times it
+     * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+     *
+     * @param numExceptions number of exceptions to throw from scorer method
+     * @param callsToScorer where to record the number of times the {@code scorer} method has been
+     *     called
+     */
+    public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger callsToScorer) {
+      this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+      this.callsToScorer = callsToScorer;
+      this.delegate = new MatchAllDocsQuery();
+    }
+
+    @Override
+    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
+        throws IOException {
+      Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, boost);
+
+      return new Weight(delegate) {
+        @Override
+        public boolean isCacheable(LeafReaderContext ctx) {
+          return matchAllWeight.isCacheable(ctx);
+        }
+
+        @Override
+        public Explanation explain(LeafReaderContext context, int doc) throws IOException {
+          return matchAllWeight.explain(context, doc);
+        }
+
+        @Override
+        public Scorer scorer(LeafReaderContext context) throws IOException {
+          if (numExceptionsToThrow.getAndDecrement() > 0) {
+            callsToScorer.getAndIncrement();
+            throw new RuntimeException("MatchAllOrThrowExceptionQuery Exception");
+          } else {
+            // A small sleep before incrementing the callsToScorer counter allows
+            // the task with the Exception to be thrown and if TaskExecutor.invokeAll
+            // does not wait until all tasks have finished, then the callsToScorer
+            // counter will not match the total number of tasks (or rather usually will
+            // not match, since there is a race condition that makes it probabilistic).
+            RandomizedTest.sleep(25);

Review Comment: I pushed two new commits - one that addresses all feedback except this one. The fina
[PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]
gtroitskiy opened a new pull request, #12614: URL: https://github.com/apache/lucene/pull/12614

…cy check

**Root cause**

`onQueryCache` increases `ramBytesUsed` by an amount that is calculated with respect to whether the query is `Accountable` or not. Unfortunately, `onQueryEviction` does not do the same. If some heavy accountable query has `ramBytesUsed()` greater than `QUERY_DEFAULT_RAM_BYTES_USED`, the delta `ghostBytes = ramBytesUsed() - QUERY_DEFAULT_RAM_BYTES_USED` remains in the total `LRUQueryCache#ramBytesUsed` forever.

**Hit rate drops to 0** since the total sum of ghost bytes monotonically increases with each cached accountable query (> `QUERY_DEFAULT_RAM_BYTES_USED`); eventually it becomes greater than `maxRamBytesUsed`, and each newly cached query is immediately evicted from the cache.

**Current behavior of LRUQueryCache for some real service in production [though not fully optimized]**

1. Service restarted.
2. Reached `maxRamBytesUsed`. Started eviction. From now on the size of cached queries decreases to compensate for the increasing ghost bytes.
3. The total amount of ghost bytes is greater than `maxRamBytesUsed`. If any new query is cached, it is evicted at the same time. Cache size is 0. Hit rate is 0.

**After fix**
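The fix amounts to making eviction mirror insertion: both paths must charge the same number of bytes for a given query. A minimal sketch of that symmetric accounting (a simplification for illustration, not the actual `LRUQueryCache` code):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of symmetric ram accounting on insert and evict (not the real LRUQueryCache). */
public class RamAccountingSketch {
  static final long QUERY_DEFAULT_RAM_BYTES_USED = 1024;

  /** Stand-in for Lucene's Accountable interface. */
  interface Accountable {
    long ramBytesUsed();
  }

  long ramBytesUsed;
  final Map<Object, Long> cache = new HashMap<>();

  /** Same rule on both paths: use the query's own measurement when it is Accountable. */
  static long bytesFor(Object query) {
    return query instanceof Accountable
        ? ((Accountable) query).ramBytesUsed()
        : QUERY_DEFAULT_RAM_BYTES_USED;
  }

  void onQueryCache(Object query) {
    long bytes = bytesFor(query);
    cache.put(query, bytes);
    ramBytesUsed += bytes;
  }

  void onQueryEviction(Object query) {
    // The bug described above is the asymmetric version of this method: subtracting a
    // flat QUERY_DEFAULT_RAM_BYTES_USED here, instead of what was charged on insert,
    // leaves ramBytesUsed() - QUERY_DEFAULT_RAM_BYTES_USED "ghost" bytes behind forever.
    Long bytes = cache.remove(query);
    if (bytes != null) {
      ramBytesUsed -= bytes;
    }
  }

  public static void main(String[] args) {
    RamAccountingSketch c = new RamAccountingSketch();
    Accountable heavy = () -> 1_000_000L; // much heavier than the flat default
    c.onQueryCache(heavy);
    c.onQueryCache("light query");
    c.onQueryEviction(heavy);
    c.onQueryEviction("light query");
    System.out.println(c.ramBytesUsed); // back to 0: no ghost bytes accumulate
  }
}
```

With the asymmetric version, every cached-then-evicted heavy query would leave 1,000,000 - 1,024 bytes of phantom accounting behind, which is exactly the monotonic growth the production graphs show.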
Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]
hossman commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1744028622 git bisect has identified `f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d` as the cause of a recent jenkins failure in `TestBlockMaxConjunction.testRandom` that reproduces reliably for me locally on `main` Example of failure... ``` org.apache.lucene.search.TestBlockMaxConjunction > testRandom FAILED java.lang.AssertionError: Hit 0 docnumbers don't match Hits length1=1 length2=1 hit=0: doc4=9.615059 shardIndex=-1, doc1215=9.615059 shardIndex=-1 for query:+foo:9 +foo:10 +foo:11 +foo:12 +foo:13 at __randomizedtesting.SeedInfo.seed([C738B747E3320853:B57492485252BE20]:0) at org.junit.Assert.fail(Assert.java:89) at org.apache.lucene.tests.search.CheckHits.checkEqual(CheckHits.java:229) at org.apache.lucene.tests.search.CheckHits.doCheckTopScores(CheckHits.java:709) at org.apache.lucene.tests.search.CheckHits.checkTopScores(CheckHits.java:694) at org.apache.lucene.search.TestBlockMaxConjunction.testRandom(TestBlockMaxConjunction.java:81) ... 2> NOTE: reproduce with: gradlew test --tests TestBlockMaxConjunction.testRandom -Dtests.seed=C738B747E3320853 -Dtests.multiplier=2 -Dtests.locale=or -Dtests.timezone=Australia/Tasmania -Dtests.asserts=true -Dtests.file.encoding=UTF-8 ``` git bisect log... ``` $ git bisect log # bad: [1dd05c89b0836531d367d2692ea5eae7d54b78fd] Add missing create github release step to release wizard (#12607) # good: [d62ca4a01f3693e0c6043a48080b33d03fdee8b4] add missing changelog entry for #12498 git bisect start '1dd05c89b0836531d367d2692ea5eae7d54b78fd' 'd62ca4a01f3693e0c6043a48080b33d03fdee8b4' # good: [51ade888f3274ea42ed4beb2bf000d7f922de4c7] Update wrong PR number in CHANGES.txt git bisect good 51ade888f3274ea42ed4beb2bf000d7f922de4c7 # bad: [ce464c7d6d20f49c2fe29126fcf400ee1cfeb112] Fix test failure. 
git bisect bad ce464c7d6d20f49c2fe29126fcf400ee1cfeb112
# good: [3deead0ed32494d7159c0023dcc86c218c43f4eb] Remove deprecated IndexSearcher#getExecutor method (#12580)
git bisect good 3deead0ed32494d7159c0023dcc86c218c43f4eb
# good: [d48913a957392e2746b489fe5aef77a21250e4b4] Allow reading / writing binary stored fields as DataInput (#12581)
git bisect good d48913a957392e2746b489fe5aef77a21250e4b4
# bad: [483d28853a03aa57e727f0639032918e725a7032] Move CHANGES entry to correct version.
git bisect bad 483d28853a03aa57e727f0639032918e725a7032
# bad: [f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d] Run top-level conjunctions of term queries with a specialized BulkScorer. (#12382)
git bisect bad f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d
# first bad commit: [f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d] Run top-level conjunctions of term queries with a specialized BulkScorer. (#12382)
```

For the past 2 weeks, `TestBlockMaxConjunction.testRandom` has seen a roughly ~2% jenkins failure rate per week (with 17 of 746 runs in the past 7 days failing). I have not looked into the failures of the other jenkins builds to confirm whether the test failure messages are identical.
Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-1744317702

Thanks Hoss! I had missed this failure, it looks like a real one. I'm looking.