Re: [PR] Simplify TaskExecutor API [lucene]

2023-10-02 Thread via GitHub


javanna merged PR #12603:
URL: https://github.com/apache/lucene/pull/12603





Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-02 Thread via GitHub


gf2121 commented on PR #12604:
URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742668578

   Hi @jpountz! Would you please take a look at this PR when you have time? Looking forward to your suggestions on this topic ~





Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-02 Thread via GitHub


jpountz commented on PR #12604:
URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742678478

   Oh, interesting find, it makes sense to me but I'm not the most familiar one 
with this piece of code. @mikemccand or @s1monw what do you think?





Re: [PR] Improve fallback sorter for BKD [lucene]

2023-10-02 Thread via GitHub


jpountz commented on code in PR #12610:
URL: https://github.com/apache/lucene/pull/12610#discussion_r1342470366


##
lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointTreeReaderUtils.java:
##
@@ -81,6 +86,40 @@ protected int byteAt(int i, int k) {
   return (reader.getDocID(i) >>> Math.max(0, shift)) & 0xff;
 }
   }
+
+  @Override
+  protected Sorter getFallbackSorter(int k) {
+return new InPlaceMergeSorter() {
+  @Override
+  protected int compare(int i, int j) {
+if (k < config.packedBytesLength) {
+  reader.getValue(i, scratch1);
+  reader.getValue(j, scratch2);
+  int v =
+  Arrays.compareUnsigned(
+  scratch1.bytes,
+  scratch1.offset + k,
+  scratch1.offset + scratch1.length,
+  scratch2.bytes,
+  scratch2.offset + k,
+  scratch2.offset + scratch2.length);
+  if (v != 0) {
+return v;
+  }
+}
+if (bitsPerDocId == 0) {

Review Comment:
   can this ever be 0?



##
lucene/core/src/java/org/apache/lucene/util/bkd/MutablePointTreeReaderUtils.java:
##
@@ -81,6 +86,40 @@ protected int byteAt(int i, int k) {
   return (reader.getDocID(i) >>> Math.max(0, shift)) & 0xff;
 }
   }
+
+  @Override
+  protected Sorter getFallbackSorter(int k) {

Review Comment:
   I wonder if it would help to take advantage of 
`ArrayUtil#getUnsignedComparator`, e.g. something like below:
   
   ```java
   int cmpStartOffset;
   ByteArrayComparator packedBytesComparator;
   if (k >= config.packedBytesLength) {
     cmpStartOffset = config.packedBytesLength;
     packedBytesComparator = (a, ai, b, bi) -> 0;
   } else if (config.packedBytesLength >= 4 && config.packedBytesLength - k <= 4) {
     cmpStartOffset = config.packedBytesLength - 4;
     packedBytesComparator = ArrayUtil.getUnsignedComparator(4);
   } else if (config.packedBytesLength >= 8 && config.packedBytesLength - k <= 8) {
     cmpStartOffset = config.packedBytesLength - 8;
     packedBytesComparator = ArrayUtil.getUnsignedComparator(8);
   } else {
     cmpStartOffset = k;
     packedBytesComparator = ArrayUtil.getUnsignedComparator(config.packedBytesLength - k);
   }
   ```
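   
   To make the intent concrete, here is a hedged sketch (not part of the suggestion itself) of how the precomputed comparator could then be used inside `compare`, assuming the `reader`, `scratch1` and `scratch2` fields from the hunk above:
   
   ```java
   // Sketch only: cmpStartOffset and packedBytesComparator are computed once
   // (as in the suggestion above) instead of calling Arrays.compareUnsigned
   // with per-call offset arithmetic on every compare.
   @Override
   protected int compare(int i, int j) {
     reader.getValue(i, scratch1);
     reader.getValue(j, scratch2);
     int v =
         packedBytesComparator.compare(
             scratch1.bytes, scratch1.offset + cmpStartOffset,
             scratch2.bytes, scratch2.offset + cmpStartOffset);
     if (v != 0) {
       return v;
     }
     // then fall back to comparing doc IDs, roughly as the original hunk does
     return Integer.compare(reader.getDocID(i), reader.getDocID(j));
   }
   ```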






[PR] Override `normalize` method in the `PatternReplaceFilterFactory` [lucene]

2023-10-02 Thread via GitHub


dainiusjocas opened a new pull request, #12613:
URL: https://github.com/apache/lucene/pull/12613

   ### Description
   
   I've been debugging a problem where the classic query parser produces a `TermRangeQuery` with wrong tokens. I'm using a `CustomAnalyzer` that uses the `PatternReplaceFilterFactory`. It turns out that the query parser calls the `normalize` method, and the token filter is not applied.
   
   Funny thing is that the `PatternReplaceCharFilterFactory` overrides the `normalize` method.
   
   Is there any reason why `PatternReplaceFilterFactory` doesn't override the `normalize` method?
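   
   For reference, a minimal hedged sketch of the kind of override in question (an illustration only, not necessarily the exact patch in this PR). It routes normalization through the factory's existing `create` method so the same pattern-replace filter is applied when the query parser normalizes terms:
   
   ```java
   // Hedged sketch: TokenFilterFactory#normalize(TokenStream) is the hook used
   // when normalizing term/range endpoints; delegating to create(input) applies
   // the same PatternReplaceFilter at normalization time.
   @Override
   public TokenStream normalize(TokenStream input) {
     return create(input);
   }
   ```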
   
   
   





Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-10-02 Thread via GitHub


jpountz commented on issue #12418:
URL: https://github.com/apache/lucene/issues/12418#issuecomment-1742740792

   Sorry @Yuti-G I had missed your reply!
   
   I think that reverting the commit that you linked only made the test pass because this commit adds a call to `random().nextInt()` to `LuceneTestCase#newIndexWriterConfig()`. So by reverting this commit, `TestDrillSideways.testRandom()` got a different sequence of random numbers for everything after the `newIndexWriterConfig()` call, and the particular configuration that made this test fail was altered.
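   
   A tiny self-contained illustration of that effect (plain `java.util.Random`, not Lucene code): one extra draw changes every value produced afterwards for the same seed:
   
   ```java
   import java.util.Random;
   
   public class SeedShiftDemo {
     public static void main(String[] args) {
       Random withExtraDraw = new Random(42);
       Random withoutExtraDraw = new Random(42);
       // The extra call, analogous to the one added to newIndexWriterConfig().
       withExtraDraw.nextInt(3);
       // From here on the two sequences diverge even though the seed is identical.
       System.out.println(withExtraDraw.nextInt() == withoutExtraDraw.nextInt()); // almost surely false
     }
   }
   ```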
   
   By the way if I revert the merge-on-refresh commit and then add back the 
call to `r.nextInt(3)`, the failure still reproduces:
   
   ```diff
   diff --cc lucene/core/src/test/org/apache/lucene/index/TestLogMergePolicy.java
   index 0b00a20a17f,517542699c1..000
   --- a/lucene/core/src/test/org/apache/lucene/index/TestLogMergePolicy.java
   +++ b/lucene/core/src/test/org/apache/lucene/index/TestLogMergePolicy.java
   diff --git a/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java
   index 1a5a876a7b7..d6649a26e9a 100644
   --- a/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java
   +++ b/lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java
   @@ -1037,6 +1037,8 @@ public abstract class LuceneTestCase extends Assert {
          c.setIndexWriterEventListener(new MockIndexWriterEventListener());
        }
   
   +    r.nextInt(3);
   +
        c.setMaxFullFlushMergeWaitMillis(rarely() ? atLeast(r, 1000) : atLeast(r, 200));
        return c;
      }
   ```
   
   This looks unrelated to merge-on-refresh.
   
   
   
   
   
   





Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-02 Thread via GitHub


s1monw commented on PR #12604:
URL: https://github.com/apache/lucene/pull/12604#issuecomment-1742998087

   This change makes sense to me. @mikemccand WDYT





Re: [PR] Concurrent hnsw graph and builder, take two [lucene]

2023-10-02 Thread via GitHub


jbellis closed pull request #12421: Concurrent hnsw graph and builder, take two
URL: https://github.com/apache/lucene/pull/12421





Re: [PR] Concurrent hnsw graph and builder, take two [lucene]

2023-10-02 Thread via GitHub


jbellis commented on PR #12421:
URL: https://github.com/apache/lucene/pull/12421#issuecomment-1743159095

   Thanks for the feedback.  I've switched my efforts to a DiskANN 
implementation in JVector, so closing this out.





Re: [PR] Add missing create github release step to release wizard [lucene]

2023-10-02 Thread via GitHub


javanna merged PR #12607:
URL: https://github.com/apache/lucene/pull/12607





Re: [PR] Create a task executor when executor is not provided [lucene]

2023-10-02 Thread via GitHub


reta commented on code in PR #12606:
URL: https://github.com/apache/lucene/pull/12606#discussion_r1342999505


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -420,13 +418,12 @@ public int count(Query query) throws IOException {
   }
 
   /**
-   * Returns the leaf slices used for concurrent searching, or null if no {@code Executor} was
-   * passed to the constructor.
+   * Returns the leaf slices used for concurrent searching
    *
    * @lucene.experimental
    */
   public LeafSlice[] getSlices() {
-    return (executor == null) ? null : leafSlicesSupplier.get();
+    return leafSlicesSupplier.get();

Review Comment:
   @sohami I think you could provide the context here






Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


javanna commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343009944


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}
+  }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);

Review Comment:
   it would also be fine to call the existing slices method providing 1,1 as 
the threshold arguments. That should achieve the same end-goal.



##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}

Review Comment:
   I think that this is incorrect, maybe due to a bad merge? That's because we 
now unconditionally offload to the executor.



##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}
+  }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;

Review Comment:
   how many leaves max may we have ? I am wondering if we may end up creating 
too many threads. Would it hurt the test to lower the number of threads?



##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}
+  }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


javanna commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343038417


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,11 +266,130 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
 if (leaves.size() <= 1) {
   assertEquals(0, numExecutions.get());
 } else {
   assertEquals(leaves.size(), numExecutions.get());
 }
   }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   *
+   * Without a larger refactoring of the Lucene IndexSearcher and/or 
TaskExecutor there isn't a
+   * clean deterministic way to test this. This test is probabilistic using 
short timeouts in the
+   * tasks that do not throw an Exception.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);
+  }
+};
+
+try {
+  AtomicInteger callsToScorer = new AtomicInteger(0);
+  int numExceptions = leaves.size() == 1 ? 1 : 
RandomizedTest.randomIntBetween(1, 2);
+  MatchAllOrThrowExceptionQuery query =
+  new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
+  RuntimeException exc = expectThrows(RuntimeException.class, () -> 
searcher.search(query, 10));
+  // if the TaskExecutor didn't wait for all tasks to finish, this assert 
would frequently fail
+  assertEquals(leaves.size(), callsToScorer.get());
+  assertThat(
+  exc.getMessage(), 
Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+} finally {
+  TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+}
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+private final AtomicInteger numExceptionsToThrow;
+private final Query delegate;
+private final AtomicInteger callsToScorer;
+
+/**
+ * Throws an Exception out of the {@code scorer} method the first {@code 
numExceptions} times it
+ * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+ *
+ * @param numExceptions number of exceptions to throw from scorer method
+ * @param callsToScorer where to record the number of times the {@code 
scorer} method has been
+ * called
+ */
+public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger 
callsToScorer) {
+  this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+  this.callsToScorer = callsToScorer;
+  this.delegate = new MatchAllDocsQuery();
+}
+
+@Override
+public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, 
float boost)
+throws IOException {
+  Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, 
boost);
+
+  return new Weight(delegate) {
+@Override
+public boolean isCacheable(LeafReaderContext ctx) {
+  return matchAllWeight.isCacheable(ctx);
+}
+
+@Override
+public Explanation explain(LeafReaderContext context, int doc) throws 
IOException {
+  return matchAllWeight.explain(context, doc);
+}
+
+@Override
+public Scorer scorer(LeafReaderContext context) throws IOException {
+  if (numExceptionsToThrow.getAndDecrement() > 0) {
+callsToScorer.getAndIncrement();
+throw new RuntimeException("MatchAllOrThrowExceptionQuery 
Exception");
+  } else {
+// A small sleep before incrementing the callsToScorer counter 
allows
+// the task with the Exception to be thrown and if 
TaskExecutor.invokeAll
+// does not wait until all tasks have finished, then the 
callsToScorer
+// counter will not match the total number of tasks (or rather 
usually will
+// not match, since there is a race condition that makes it 
probabilistic).
+RandomizedTest.sleep(25);

Review Comment:
   > You could put in a latch that the test waits upon that gates the 
non-Exception-

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


javanna commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343042425


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,11 +266,130 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
 if (leaves.size() <= 1) {
   assertEquals(0, numExecutions.get());
 } else {
   assertEquals(leaves.size(), numExecutions.get());
 }
   }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   *
+   * Without a larger refactoring of the Lucene IndexSearcher and/or 
TaskExecutor there isn't a
+   * clean deterministic way to test this. This test is probabilistic using 
short timeouts in the
+   * tasks that do not throw an Exception.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);
+  }
+};
+
+try {
+  AtomicInteger callsToScorer = new AtomicInteger(0);
+  int numExceptions = leaves.size() == 1 ? 1 : 
RandomizedTest.randomIntBetween(1, 2);
+  MatchAllOrThrowExceptionQuery query =
+  new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
+  RuntimeException exc = expectThrows(RuntimeException.class, () -> 
searcher.search(query, 10));
+  // if the TaskExecutor didn't wait for all tasks to finish, this assert 
would frequently fail
+  assertEquals(leaves.size(), callsToScorer.get());
+  assertThat(
+  exc.getMessage(), 
Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+} finally {
+  TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+}
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+private final AtomicInteger numExceptionsToThrow;
+private final Query delegate;
+private final AtomicInteger callsToScorer;
+
+/**
+ * Throws an Exception out of the {@code scorer} method the first {@code 
numExceptions} times it
+ * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+ *
+ * @param numExceptions number of exceptions to throw from scorer method
+ * @param callsToScorer where to record the number of times the {@code 
scorer} method has been
+ * called
+ */
+public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger 
callsToScorer) {
+  this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+  this.callsToScorer = callsToScorer;
+  this.delegate = new MatchAllDocsQuery();
+}
+
+@Override
+public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, 
float boost)
+throws IOException {
+  Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, 
boost);
+
+  return new Weight(delegate) {
+@Override
+public boolean isCacheable(LeafReaderContext ctx) {
+  return matchAllWeight.isCacheable(ctx);
+}
+
+@Override
+public Explanation explain(LeafReaderContext context, int doc) throws 
IOException {
+  return matchAllWeight.explain(context, doc);
+}
+
+@Override
+public Scorer scorer(LeafReaderContext context) throws IOException {
+  if (numExceptionsToThrow.getAndDecrement() > 0) {
+callsToScorer.getAndIncrement();
+throw new RuntimeException("MatchAllOrThrowExceptionQuery 
Exception");
+  } else {
+// A small sleep before incrementing the callsToScorer counter 
allows
+// the task with the Exception to be thrown and if 
TaskExecutor.invokeAll
+// does not wait until all tasks have finished, then the 
callsToScorer
+// counter will not match the total number of tasks (or rather 
usually will
+// not match, since there is a race condition that makes it 
probabilistic).
+RandomizedTest.sleep(25);

Review Comment:
   Thinking more of this, perhaps using a single threaded executor, or even one 
that

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


javanna commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343048682


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}

Review Comment:
   oh well, I guess I am wrong because tests succeed :)






Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-02 Thread via GitHub


benwtrent commented on code in PR #12582:
URL: https://github.com/apache/lucene/pull/12582#discussion_r1343092520


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java:
##
@@ -139,7 +139,7 @@ public int nextDoc() throws IOException {
   }
 
   /** View over multiple vector values supporting iterator-style access via 
DocIdMerger. */
-  protected static final class MergedVectorValues {
+  public static final class MergedVectorValues {

Review Comment:
   This is so I can get a merged view over vectors. I don't strictly need the doc updates. I just need to be able to iterate all the vectors in a row and randomly accept some.
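   
   Roughly the kind of usage this enables (a hedged sketch; it assumes the merge-time `fieldInfo`/`mergeState` are at hand and uses the existing `KnnVectorsWriter.MergedVectorValues#mergeFloatVectorValues` helper):
   
   ```java
   // Hedged sketch: iterate a merged view over all segments' float vectors and
   // randomly accept a subset, e.g. to sample values for quantile estimation.
   FloatVectorValues merged =
       KnnVectorsWriter.MergedVectorValues.mergeFloatVectorValues(fieldInfo, mergeState);
   Random sampler = new Random(42);
   for (int doc = merged.nextDoc();
       doc != DocIdSetIterator.NO_MORE_DOCS;
       doc = merged.nextDoc()) {
     if (sampler.nextInt(10) == 0) { // accept roughly 10% of the vectors
       float[] vector = merged.vectorValue();
       // ... use the sampled vector
     }
   }
   ```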



##
lucene/core/src/java/org/apache/lucene/codecs/HnswGraphProvider.java:
##
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs;
+
+import java.io.IOException;
+import org.apache.lucene.util.hnsw.HnswGraph;
+
+/**
+ * Provides a hnsw graph
+ *
+ * @lucene.experimental
+ */
+public interface HnswGraphProvider {

Review Comment:
   I am thinking about adding this to Lucene directly. Right now we do an `instanceof Lucene95...` check. For future flexibility, we should have an interface that indicates a reader can provide a graph for a field.
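   
   A hedged sketch of how a caller might use such an interface instead of casting to a concrete codec reader (the single `getGraph(String)` method is assumed here):
   
   ```java
   // Sketch only: a capability check against the proposed interface replaces the
   // current instanceof check against a concrete codec reader class.
   static HnswGraph graphFor(KnnVectorsReader vectorsReader, String field) throws IOException {
     if (vectorsReader instanceof HnswGraphProvider) {
       // The reader advertises that it can hand back its HNSW graph for this field,
       // so the graph can be reused (e.g. during merge) instead of being rebuilt.
       return ((HnswGraphProvider) vectorsReader).getGraph(field);
     }
     return null; // caller falls back to building the graph from scratch
   }
   ```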



##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99Codec.java:
##
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.lucene99;
+
+import java.util.Objects;
+import org.apache.lucene.codecs.Codec;
+import org.apache.lucene.codecs.CompoundFormat;
+import org.apache.lucene.codecs.DocValuesFormat;
+import org.apache.lucene.codecs.FieldInfosFormat;
+import org.apache.lucene.codecs.FilterCodec;
+import org.apache.lucene.codecs.KnnVectorsFormat;
+import org.apache.lucene.codecs.LiveDocsFormat;
+import org.apache.lucene.codecs.NormsFormat;
+import org.apache.lucene.codecs.PointsFormat;
+import org.apache.lucene.codecs.PostingsFormat;
+import org.apache.lucene.codecs.SegmentInfoFormat;
+import org.apache.lucene.codecs.StoredFieldsFormat;
+import org.apache.lucene.codecs.TermVectorsFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90DocValuesFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90LiveDocsFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90NormsFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90PointsFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90PostingsFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90SegmentInfoFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat;
+import org.apache.lucene.codecs.lucene90.Lucene90TermVectorsFormat;
+import org.apache.lucene.codecs.lucene94.Lucene94FieldInfosFormat;
+import org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat;
+import org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat;
+import org.apache.lucene.codecs.perfield.PerFieldPostingsFormat;
+
+/**
+ * Implements the Lucene 9.8 index format

Review Comment:
   ```suggestion
* Implements the Lucene 9.9 index format
   ```



##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsFormat.java:
##
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional inform

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


quux00 commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343142475


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,11 +266,130 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
 if (leaves.size() <= 1) {
   assertEquals(0, numExecutions.get());
 } else {
   assertEquals(leaves.size(), numExecutions.get());
 }
   }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   *
+   * Without a larger refactoring of the Lucene IndexSearcher and/or 
TaskExecutor there isn't a
+   * clean deterministic way to test this. This test is probabilistic using 
short timeouts in the
+   * tasks that do not throw an Exception.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);
+  }
+};
+
+try {
+  AtomicInteger callsToScorer = new AtomicInteger(0);
+  int numExceptions = leaves.size() == 1 ? 1 : 
RandomizedTest.randomIntBetween(1, 2);
+  MatchAllOrThrowExceptionQuery query =
+  new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
+  RuntimeException exc = expectThrows(RuntimeException.class, () -> 
searcher.search(query, 10));
+  // if the TaskExecutor didn't wait for all tasks to finish, this assert 
would frequently fail
+  assertEquals(leaves.size(), callsToScorer.get());
+  assertThat(
+  exc.getMessage(), 
Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+} finally {
+  TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+}
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+private final AtomicInteger numExceptionsToThrow;
+private final Query delegate;
+private final AtomicInteger callsToScorer;
+
+/**
+ * Throws an Exception out of the {@code scorer} method the first {@code 
numExceptions} times it
+ * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+ *
+ * @param numExceptions number of exceptions to throw from scorer method
+ * @param callsToScorer where to record the number of times the {@code 
scorer} method has been
+ * called
+ */
+public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger 
callsToScorer) {
+  this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+  this.callsToScorer = callsToScorer;
+  this.delegate = new MatchAllDocsQuery();
+}
+
+@Override
+public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, 
float boost)
+throws IOException {
+  Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, 
boost);
+
+  return new Weight(delegate) {
+@Override
+public boolean isCacheable(LeafReaderContext ctx) {
+  return matchAllWeight.isCacheable(ctx);
+}
+
+@Override
+public Explanation explain(LeafReaderContext context, int doc) throws 
IOException {
+  return matchAllWeight.explain(context, doc);
+}
+
+@Override
+public Scorer scorer(LeafReaderContext context) throws IOException {
+  if (numExceptionsToThrow.getAndDecrement() > 0) {
+callsToScorer.getAndIncrement();
+throw new RuntimeException("MatchAllOrThrowExceptionQuery 
Exception");
+  } else {
+// A small sleep before incrementing the callsToScorer counter 
allows
+// the task with the Exception to be thrown and if 
TaskExecutor.invokeAll
+// does not wait until all tasks have finished, then the 
callsToScorer
+// counter will not match the total number of tasks (or rather 
usually will
+// not match, since there is a race condition that makes it 
probabilistic).
+RandomizedTest.sleep(25);

Review Comment:
   Using a single threaded executor would make the test more repeatable, but it 
also 

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


quux00 commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343180733


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}
+  }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);

Review Comment:
   Got it. Fixed.






Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


quux00 commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343180948


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}
+  }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;

Review Comment:
   Fixed. I'll put a max of 8 threads.



##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}
+  }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);
+  }
+};
+
+try {
+  AtomicInteger callsToScorer = new AtomicInteger(0);
+  int numExceptions = leaves.size() == 1 ? 1 : 
RandomizedTest.randomIntBetween(1, 2);
+  CountDownLatch latch = new CountDownLatch(numExceptions);
+  MatchAllOrThrowExceptionQuery query =
+  new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer, 
latch);
+  RuntimeException exc = expectThrows(RuntimeException.class, () -> 
searcher.search(query, 10));
+  // if the TaskExecutor didn't wait for all tasks to finish, this assert 
would frequently fail
+  assertEquals(leaves.size(), callsToScorer.get());
+  assertThat(
+  exc.getMessage(), 
Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));

Review Comment:
   Good idea.






Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


quux00 commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343185358


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,7 +266,133 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
-assertEquals(leaves.size(), numExecutions.get());
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
+if (leaves.size() <= 1) {
+  assertEquals(0, numExecutions.get());
+} else {
+  assertEquals(leaves.size(), numExecutions.get());
+}
+  }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);
+  }
+};
+
+try {
+  AtomicInteger callsToScorer = new AtomicInteger(0);
+  int numExceptions = leaves.size() == 1 ? 1 : 
RandomizedTest.randomIntBetween(1, 2);
+  CountDownLatch latch = new CountDownLatch(numExceptions);
+  MatchAllOrThrowExceptionQuery query =
+  new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer, 
latch);
+  RuntimeException exc = expectThrows(RuntimeException.class, () -> 
searcher.search(query, 10));
+  // if the TaskExecutor didn't wait for all tasks to finish, this assert 
would frequently fail
+  assertEquals(leaves.size(), callsToScorer.get());
+  assertThat(
+  exc.getMessage(), 
Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+} finally {
+  TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+}
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+private final AtomicInteger numExceptionsToThrow;
+private final Query delegate;
+private final AtomicInteger callsToScorer;
+private final CountDownLatch latch;
+
+/**
+ * Throws an Exception out of the {@code scorer} method the first {@code 
numExceptions} times it
+ * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+ *
+ * @param numExceptions number of exceptions to throw from scorer method
+ * @param callsToScorer where to record the number of times the {@code 
scorer} method has been
+ * called
+ */
+public MatchAllOrThrowExceptionQuery(
+int numExceptions, AtomicInteger callsToScorer, CountDownLatch latch) {
+  this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+  this.callsToScorer = callsToScorer;
+  this.delegate = new MatchAllDocsQuery();
+  this.latch = latch;
+}
+
+@Override
+public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, 
float boost)
+throws IOException {
+  Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, 
boost);
+
+  return new Weight(delegate) {
+@Override
+public boolean isCacheable(LeafReaderContext ctx) {
+  return matchAllWeight.isCacheable(ctx);
+}
+
+@Override
+public Explanation explain(LeafReaderContext context, int doc) throws 
IOException {
+  return matchAllWeight.explain(context, doc);
+}
+
+@Override
+public Scorer scorer(LeafReaderContext context) throws IOException {
+  if (numExceptionsToThrow.getAndDecrement() > 0) {
+callsToScorer.getAndIncrement();
+try {
+  throw new RuntimeException("MatchAllOrThrowExceptionQuery 
Exception");
+} finally {
+  latch.countDown();
+}
+  } else {
+try {
+  latch.await(5000, TimeUnit.MILLISECONDS);

Review Comment:
   Added.




Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-02 Thread via GitHub


quux00 commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1343266881


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,11 +266,130 @@ protected LeafSlice[] slices(List 
leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
 if (leaves.size() <= 1) {
   assertEquals(0, numExecutions.get());
 } else {
   assertEquals(leaves.size(), numExecutions.get());
 }
   }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   *
+   * Without a larger refactoring of the Lucene IndexSearcher and/or 
TaskExecutor there isn't a
+   * clean deterministic way to test this. This test is probabilistic using 
short timeouts in the
+   * tasks that do not throw an Exception.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List leaves) {
+ArrayList slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);
+  }
+};
+
+try {
+  AtomicInteger callsToScorer = new AtomicInteger(0);
+  int numExceptions = leaves.size() == 1 ? 1 : 
RandomizedTest.randomIntBetween(1, 2);
+  MatchAllOrThrowExceptionQuery query =
+  new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
+  RuntimeException exc = expectThrows(RuntimeException.class, () -> 
searcher.search(query, 10));
+  // if the TaskExecutor didn't wait for all tasks to finish, this assert 
would frequently fail
+  assertEquals(leaves.size(), callsToScorer.get());
+  assertThat(
+  exc.getMessage(), 
Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+} finally {
+  TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+}
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+private final AtomicInteger numExceptionsToThrow;
+private final Query delegate;
+private final AtomicInteger callsToScorer;
+
+/**
+ * Throws an Exception out of the {@code scorer} method the first {@code 
numExceptions} times it
+ * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+ *
+ * @param numExceptions number of exceptions to throw from scorer method
+ * @param callsToScorer where to record the number of times the {@code 
scorer} method has been
+ * called
+ */
+public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger 
callsToScorer) {
+  this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+  this.callsToScorer = callsToScorer;
+  this.delegate = new MatchAllDocsQuery();
+}
+
+@Override
+public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, 
float boost)
+throws IOException {
+  Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, 
boost);
+
+  return new Weight(delegate) {
+@Override
+public boolean isCacheable(LeafReaderContext ctx) {
+  return matchAllWeight.isCacheable(ctx);
+}
+
+@Override
+public Explanation explain(LeafReaderContext context, int doc) throws 
IOException {
+  return matchAllWeight.explain(context, doc);
+}
+
+@Override
+public Scorer scorer(LeafReaderContext context) throws IOException {
+  if (numExceptionsToThrow.getAndDecrement() > 0) {
+callsToScorer.getAndIncrement();
+throw new RuntimeException("MatchAllOrThrowExceptionQuery 
Exception");
+  } else {
+// A small sleep before incrementing the callsToScorer counter 
allows
+// the task with the Exception to be thrown and if 
TaskExecutor.invokeAll
+// does not wait until all tasks have finished, then the 
callsToScorer
+// counter will not match the total number of tasks (or rather 
usually will
+// not match, since there is a race condition that makes it 
probabilistic).
+RandomizedTest.sleep(25);

Review Comment:
   I pushed two new commits - one that addresses all feedback except this one. The fina

[PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-02 Thread via GitHub


gtroitskiy opened a new pull request, #12614:
URL: https://github.com/apache/lucene/pull/12614

   …cy check
   
   **Root cause**
   `onQueryCache` increases `ramBytesUsed` by a specified amount, which is calculated with respect to whether the query is `Accountable` or not.
   Unfortunately, `onQueryEviction` does not do the same. If some heavy accountable query has `ramBytesUsed()` greater than `QUERY_DEFAULT_RAM_BYTES_USED`, the delta `ghostBytes = ramBytesUsed() - QUERY_DEFAULT_RAM_BYTES_USED` remains in the total `LRUQueryCache#ramBytesUsed` forever.
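   
   A concrete illustration of the accounting mismatch (the numbers are illustrative only, not the actual constants):
   
   ```java
   // Illustrative sketch of the leak: an Accountable query that reports 10 KB.
   long queryDefaultRamBytesUsed = 1_024;   // what onQueryEviction subtracts today
   long accountableRamBytesUsed = 10_240;   // what onQueryCache added for this query
   
   // Left behind in LRUQueryCache#ramBytesUsed after the query is evicted:
   long ghostBytes = accountableRamBytesUsed - queryDefaultRamBytesUsed; // 9_216 bytes
   ```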
   
   **Hit rate drops to 0**
   Since the total sum of ghost bytes monotonically increases with each cached accountable query (>QUERY_DEFAULT_RAM_BYTES_USED), it eventually becomes greater than `maxRamBytesUsed`, and each newly cached query is immediately evicted from the cache.
   
   **Current behavior of LRUQueryCache for some real service in production [though not fully optimized]**
   1. Service restarted.
   2. Reached `maxRamBytesUsed`. Started eviction. From now on, the size of cached queries decreases to compensate for the increasing ghost bytes.
   3. The total amount of ghost bytes is greater than `maxRamBytesUsed`. If any new query is cached, it is evicted at the same time. Cache size is 0. Hit rate is 0.
   
![ram](https://github.com/apache/lucene/assets/1190066/804b0514-8fa8-4749-bf64-853d07225785)
   
![size2](https://github.com/apache/lucene/assets/1190066/6d1ee142-b780-4c73-a787-9460a5c9ab9c)
   
![hitrate](https://github.com/apache/lucene/assets/1190066/bf328c17-4fe7-4b25-a94a-4cd473d4edff)
   
   **After fix**
   
![Selection_693](https://github.com/apache/lucene/assets/1190066/1b330825-c62f-4b4e-80ce-107675303379)
   
   
   





Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]

2023-10-02 Thread via GitHub


hossman commented on PR #12382:
URL: https://github.com/apache/lucene/pull/12382#issuecomment-1744028622

   git bisect has identified `f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d` as the cause of a recent jenkins failure in `TestBlockMaxConjunction.testRandom` that reproduces reliably for me locally on `main`.
   
   Example of failure...
   ```
   org.apache.lucene.search.TestBlockMaxConjunction > testRandom FAILED
   java.lang.AssertionError: Hit 0 docnumbers don't match
   Hits length1=1  length2=1
   hit=0: doc4=9.615059 shardIndex=-1,  doc1215=9.615059 shardIndex=-1
   for query:+foo:9 +foo:10 +foo:11 +foo:12 +foo:13
   at 
__randomizedtesting.SeedInfo.seed([C738B747E3320853:B57492485252BE20]:0)
   at org.junit.Assert.fail(Assert.java:89)
   at 
org.apache.lucene.tests.search.CheckHits.checkEqual(CheckHits.java:229)
   at 
org.apache.lucene.tests.search.CheckHits.doCheckTopScores(CheckHits.java:709)
   at 
org.apache.lucene.tests.search.CheckHits.checkTopScores(CheckHits.java:694)
   at 
org.apache.lucene.search.TestBlockMaxConjunction.testRandom(TestBlockMaxConjunction.java:81)
   ...
 2> NOTE: reproduce with: gradlew test --tests 
TestBlockMaxConjunction.testRandom -Dtests.seed=C738B747E3320853 
-Dtests.multiplier=2 -Dtests.locale=or -Dtests.timezone=Australia/Tasmania 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```
   
   git bisect log...
   ```
   $ git bisect log
   # bad: [1dd05c89b0836531d367d2692ea5eae7d54b78fd] Add missing create github 
release step to release wizard (#12607)
   # good: [d62ca4a01f3693e0c6043a48080b33d03fdee8b4] add missing changelog 
entry for #12498
   git bisect start '1dd05c89b0836531d367d2692ea5eae7d54b78fd' 
'd62ca4a01f3693e0c6043a48080b33d03fdee8b4'
   # good: [51ade888f3274ea42ed4beb2bf000d7f922de4c7] Update wrong PR number in 
CHANGES.txt
   git bisect good 51ade888f3274ea42ed4beb2bf000d7f922de4c7
   # bad: [ce464c7d6d20f49c2fe29126fcf400ee1cfeb112] Fix test failure.
   git bisect bad ce464c7d6d20f49c2fe29126fcf400ee1cfeb112
   # good: [3deead0ed32494d7159c0023dcc86c218c43f4eb] Remove deprecated 
IndexSearcher#getExecutor method (#12580)
   git bisect good 3deead0ed32494d7159c0023dcc86c218c43f4eb
   # good: [d48913a957392e2746b489fe5aef77a21250e4b4] Allow reading / writing 
binary stored fields as DataInput (#12581)
   git bisect good d48913a957392e2746b489fe5aef77a21250e4b4
   # bad: [483d28853a03aa57e727f0639032918e725a7032] Move CHANGES entry to 
correct version.
   git bisect bad 483d28853a03aa57e727f0639032918e725a7032
   # bad: [f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d] Run top-level conjunctions 
of term queries with a specialized BulkScorer. (#12382)
   git bisect bad f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d
   # first bad commit: [f2bd0bbcdd38cd3c681a9d302bdb856f1a62208d] Run top-level 
conjunctions of term queries with a specialized BulkScorer. (#12382)
   ```
   
   For the past 2 weeks, `TestBlockMaxConjunction.testRandom` has seen a roughly ~2% jenkins failure rate per week (with 17 of 746 runs in the past 7 days failing). I have not looked into the failures of the other jenkins builds to confirm whether the test failure messages are identical.
   





Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]

2023-10-02 Thread via GitHub


jpountz commented on PR #12382:
URL: https://github.com/apache/lucene/pull/12382#issuecomment-1744317702

   Thanks Hoss! I had missed this failure, it looks like a real one. I'm 
looking.

