date:20230928

[GitHub] [lucene] sgup432 commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub



sgup432 commented on issue #12597:
URL: https://github.com/apache/lucene/issues/12597#issuecomment-1738594461

   @jpountz What do you think about this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gf2121 commented on issue #12598: FST#Compiler allocates too much memory

2023-09-28 Thread via GitHub



gf2121 commented on issue #12598:
URL: https://github.com/apache/lucene/issues/12598#issuecomment-1738687168

   I ran an experiment: index random BytesRefs and record the byte usage each 
time `BytesStore#finish` is called. The statistical results are:
   
   ```
   total sample: 23130
   
   avg: 78.12
   min: 7
   mid: 36
   pct75: 40
   pct90: 40
   pct99: 44
   max: 10790
   ```
   
   Because the BytesRefs are random, they may share few common prefixes and 
suffixes, so I tried to mock some common prefixes/suffixes for them, like:
   
   ```
   if (R.nextBoolean()) {
     int prefixLen = R.nextInt(b.length / 2);
     System.arraycopy(commonPrefix, 0, b, 0, prefixLen);
   }
   
   if (R.nextBoolean()) {
     int suffixLen = R.nextInt(b.length / 2);
     System.arraycopy(
         commonSuffix, commonSuffix.length - suffixLen, b, b.length - suffixLen, suffixLen);
   }
   ```
   
   And here is the result:
   ```
   total sample: 27235
   
   avg: 820.540738020929
   min: 8
   mid: 24
   pct75: 629
   pct90: 3347
   pct99: 5374
   max: 29049
   ```
   
   We allocate 32 KB while 99% of cases need only about 5 KB. These results 
roughly match the allocation profile, which shows that we rarely need a second 
block in `BytesStore`.
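
For readers who want to reproduce a summary like the ones above, here is a minimal sketch of how such statistics could be computed from recorded sizes. The class `UsageStats` and its methods are hypothetical names, and the nearest-rank percentile definition is an assumption; the comment does not say how its percentiles were computed.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: summarizes recorded byte counts. */
final class UsageStats {
  /** Nearest-rank percentile of an ascending-sorted array; pct is in (0, 100]. */
  static long percentile(long[] sorted, double pct) {
    int rank = (int) Math.ceil(pct / 100.0 * sorted.length); // 1-based rank
    return sorted[Math.max(rank, 1) - 1];
  }

  /** Returns {min, median, pct99, max} for the given (non-empty) samples. */
  static long[] summarize(long[] samples) {
    long[] sorted = samples.clone();
    Arrays.sort(sorted);
    return new long[] {
      sorted[0], percentile(sorted, 50), percentile(sorted, 99), sorted[sorted.length - 1]
    };
  }
}
```

With a long-tailed distribution like the one reported, the average is pulled far above the median, which is why the percentiles are the more useful numbers when choosing a block size.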
   
   



[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-28 Thread via GitHub



kaivalnp commented on PR #12590:
URL: https://github.com/apache/lucene/pull/12590#issuecomment-1738933241

   > My main concern is that we are adding yet another extension point for 
"partial control" when we already have that with the rewrite or something even 
more complex with the collector
   
   While we technically can do so, copy-pasting 
[`#rewrite`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L64-L93)
 seems very repetitive to me because we would also have to copy-paste 
[`#sequentialSearch`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L95-L102),
 
[`#parallelSearch`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L104-L112),
 
[`#searchLeaf`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L114-L122),
 
[`#getLeafResults`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L124-L154),
 
[`#createBitSet`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L156-L172),
 [`#createRewrittenQuery`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219-L234)
 as well as 
[`DocAndScoreQuery`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L297-L452)
 (almost the entire file, all of which are `private`) to do so.
   
   Doing this via the collector has an overhead of finding out the index-level 
`topK` from multiple segment-level results *twice* (which is done 
[here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L88)
 anyways)
   
   > It would be very easy to do the wrong thing by allowing sub-classes to 
override this method for yet another avenue of customization
   
   We also seem to have 
[`#exactSearch`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L180-L181)
 as `protected` only for checking its execution from tests (that could have the 
same issue).. We could maybe add some javadocs to 
[`#createRewrittenQuery`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219-L234),
 so that users know when to override it / do so more carefully?
   
   > It seems to me we have enough extension points no?
   
   I feel that giving access to the final `topK` results to implementers is a 
good extension point to have, allowing them to post-process the final results 
or even rewrite into some custom `Query` *if needed* (in most cases, just 
delegate to the parent function)
   
   Also open to other suggestions (like maybe a specific `postProcess` function 
that receives the `topK` and returns the results after reading / modifying, so 
that the API is more specific)



[GitHub] [lucene] javanna opened a new pull request, #12603: Simplify TaskExecutor API

2023-09-28 Thread via GitHub



javanna opened a new pull request, #12603:
URL: https://github.com/apache/lucene/pull/12603

   We recently made TaskExecutor public. It currently exposes two methods: one 
to create tasks from a collection of callables, and one to execute all the 
tasks created in the first step. We can instead expose a single public method 
that takes a collection of callables and internally creates the appropriate 
tasks. This simplifies the API and stops us from leaking the internal Task 
abstraction, which can be kept private.
   
   Note that this is backwards compatible, as we have not yet released a 
version in which TaskExecutor is public. It is marked experimental anyway.
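
To illustrate the shape of the proposed API, here is a minimal sketch with a single method that accepts callables and keeps the task type internal. `SketchTaskExecutor` and its `invokeAll` signature are assumptions made for illustration, built on `java.util.concurrent.FutureTask`; this is not Lucene's actual `TaskExecutor` code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

/** Hypothetical illustration only; this is not Lucene's TaskExecutor. */
final class SketchTaskExecutor {
  private final Executor executor;

  SketchTaskExecutor(Executor executor) {
    this.executor = executor;
  }

  /** Single entry point: builds the internal tasks, runs them, returns results in input order. */
  <T> List<T> invokeAll(Collection<Callable<T>> callables) {
    List<FutureTask<T>> tasks = new ArrayList<>();
    for (Callable<T> callable : callables) {
      tasks.add(new FutureTask<>(callable)); // the task type stays an internal detail
    }
    for (FutureTask<T> task : tasks) {
      executor.execute(task);
    }
    List<T> results = new ArrayList<>();
    for (FutureTask<T> task : tasks) {
      try {
        results.add(task.get()); // blocks until this task has completed
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      } catch (ExecutionException e) {
        throw new RuntimeException(e.getCause());
      }
    }
    return results;
  }
}
```

Callers only ever see callables going in and results coming out, so the wrapper type can change later without breaking them.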
   



[GitHub] [lucene] gf2121 commented on issue #12598: FST#Compiler allocates too much memory

2023-09-28 Thread via GitHub



gf2121 commented on issue #12598:
URL: https://github.com/apache/lucene/issues/12598#issuecomment-1739489111

   I get similar statistics for wikimediumall. Here are the results after 
`BytesStore#finish` was called 1,000,000 times.
   
   ```
   BytesStore#finish called: 1000000 times
   
   min: 1
   mid: 16
   avg: 64.555987
   pct75: 28
   pct90: 57
   pct99: 525
   pct999: 4957
   pct: 29124
   max: 631700
   ```
   
   It seems 1 KB per block is enough here: 99% of cases fit in a single block, 
and we need at most 600+ blocks for a single FST.
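
The block counts can be checked with a small ceiling-division sketch. It assumes block sizes are powers of two (the 32 KB block discussed in this issue would be 2^15 bytes, and 1 KB would be 2^10); `BlockMath` and `blocksNeeded` are hypothetical names, not Lucene APIs.

```java
/** Hypothetical helper for the arithmetic only; not a Lucene class. */
final class BlockMath {
  /** Number of blocks of size 2^pageBits needed to hold the given number of bytes. */
  static long blocksNeeded(long bytes, int pageBits) {
    long blockSize = 1L << pageBits;
    return (bytes + blockSize - 1) / blockSize; // ceiling division
  }
}
```

For the reported maximum of 631700 bytes, `blocksNeeded(631700, 10)` gives 617 blocks of 1 KB, while the pct99 value of 525 bytes fits in a single 1 KB block.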



[GitHub] [lucene] gf2121 opened a new pull request, #12604: Reduce FST block size for BlockTreeTermsWriter

2023-09-28 Thread via GitHub



gf2121 opened a new pull request, #12604:
URL: https://github.com/apache/lucene/pull/12604

   ### Description
   
   
https://blunders.io/jfr-demo/indexing-4kb-2023.09.25.18.03.36/allocations-drill-down
   
   The nightly benchmark shows that `FSTCompiler#init` allocates most of the 
memory during indexing. This is because `FSTCompiler#init` always allocates 32k 
bytes, as the `bytesPageBits` parameter defaults to 15. I recorded the usage of 
BytesStore (`getPosition()` when `BytesStore#finish` is called) during 
wikimediumall indexing, and the result shows that 99% of FSTs use no more 
than 1k bytes.
   
   ```
   BytesStore#finish called: 1000000 times
   
   min: 1
   mid: 16
   avg: 64.555987
   pct75: 28
   pct90: 57
   pct99: 525
   pct999: 4957
   pct: 29124
   max: 631700
   ```
   
   This PR proposes to reduce the block size of `FST` in 
`Lucene90BlockTreeTermsWriter`.
   
   closes https://github.com/apache/lucene/issues/12598
   
   
   



[GitHub] [lucene] cpoerschke commented on a diff in pull request #12380: Add a post-collection hook to LeafCollector.

2023-09-28 Thread via GitHub



cpoerschke commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1340380526


##
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/SuggestIndexSearcher.java:
##
@@ -67,14 +68,16 @@ public void suggest(CompletionQuery query, 
TopSuggestDocsCollector collector) th
 for (LeafReaderContext context : getIndexReader().leaves()) {
   BulkScorer scorer = weight.bulkScorer(context);
   if (scorer != null) {
+LeafCollector leafCollector = collector.getLeafCollector(context);
 try {

Review Comment:
   Comparing this to the 
https://github.com/apache/lucene/blob/6d764c3397d00f93bd4273bd8d1c9e51d6e104e6/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L710-L740
 code I wonder if the `getLeafCollector` call should move inside the `try` 
block here too?
   
   ```
   final LeafCollector leafCollector;
   try {
     leafCollector = collector.getLeafCollector(context);
     ...
   } catch (CollectionTerminatedException e) {
     ...
   }
   if (leafCollector != null) leafCollector.finish();
   ```
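
One way to structure this so that it compiles with a `final` local is sketched below: the collector lookup gets its own `try` whose `catch` skips the leaf, so the variable is definitely assigned before `finish()` is reached. All types here (`CollectorLike`, `LeafCollectorLike`, `TerminatedException`) are hypothetical stand-ins for the Lucene classes, and only the control flow is meant to carry over.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for the Lucene types; only the control flow matters here. */
final class FinishHookSketch {
  static final class TerminatedException extends RuntimeException {}

  interface LeafCollectorLike {
    void collect(int doc);

    void finish();
  }

  interface CollectorLike {
    LeafCollectorLike getLeafCollector(int leaf); // may throw TerminatedException
  }

  /** Visits each leaf and returns the leaves on which finish() was called. */
  static List<Integer> collectAll(CollectorLike collector, int numLeaves) {
    List<Integer> finished = new ArrayList<>();
    for (int leaf = 0; leaf < numLeaves; leaf++) {
      final LeafCollectorLike leafCollector;
      try {
        leafCollector = collector.getLeafCollector(leaf);
      } catch (TerminatedException e) {
        continue; // no leaf collector was created, so there is nothing to finish
      }
      try {
        leafCollector.collect(0);
      } catch (TerminatedException e) {
        // collection stopped early; fall through so that finish() still runs
      }
      leafCollector.finish();
      finished.add(leaf);
    }
    return finished;
  }
}
```

Because the first `catch` leaves the loop iteration, no null check is needed before calling `finish()`.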




[GitHub] [lucene] cpoerschke opened a new pull request, #12605: IndexingChain.validateMaxVectorDimension: add missing space wording

2023-09-28 Thread via GitHub



cpoerschke opened a new pull request, #12605:
URL: https://github.com/apache/lucene/pull/12605

   (Keeping as draft for now, to see if similar change might be needed nearby 
too.)



[GitHub] [lucene] benwtrent commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-28 Thread via GitHub



benwtrent commented on PR #12590:
URL: https://github.com/apache/lucene/pull/12590#issuecomment-1739838321

   Ok, if we really want the ability to get the final topK for a Lucene index, 
I think a new method should be added (merge results or something). That seems 
like a better extension than overriding the query creation just because it 
already exists.
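
As a rough illustration of what a dedicated merge step could look like, the sketch below combines per-segment hits into an index-level top K. `TopKMergeSketch`, `Hit`, and `mergeTopK` are hypothetical names that do not use Lucene's result types, and the tie-break by document id is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; the names and types here are not Lucene APIs. */
final class TopKMergeSketch {
  /** Minimal (doc, score) pair standing in for a per-segment result entry. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Merges per-segment hits and keeps the k best by descending score, ties by doc id. */
  static List<Hit> mergeTopK(List<List<Hit>> perSegment, int k) {
    List<Hit> all = new ArrayList<>();
    for (List<Hit> hits : perSegment) {
      all.addAll(hits);
    }
    Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
    all.sort(byScoreDesc.thenComparingInt(h -> h.doc));
    return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
  }
}
```

A subclass that overrides such a method can post-process the merged list without touching query creation.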



[GitHub] [lucene] jpountz commented on a diff in pull request #12380: Add a post-collection hook to LeafCollector.

2023-09-28 Thread via GitHub



jpountz commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1340582160


##
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/SuggestIndexSearcher.java:
##
@@ -67,14 +68,16 @@ public void suggest(CompletionQuery query, 
TopSuggestDocsCollector collector) th
 for (LeafReaderContext context : getIndexReader().leaves()) {
   BulkScorer scorer = weight.bulkScorer(context);
   if (scorer != null) {
+LeafCollector leafCollector = collector.getLeafCollector(context);
 try {

Review Comment:
   Trying to remember what was on my mind at the time of the change, I think I 
wanted to keep the logic simple, since unlike IndexSearcher which may run any 
Collector, here it may only be a `TopSuggestDocsCollector`, which never throws 
a `CollectionTerminatedException`. I'm ok with moving the `getLeafCollector` 
call under the `try` block though, if you open a PR I'll be happy to approve it.




[GitHub] [lucene] javanna opened a new pull request, #12606: Create a task executor when executor is not provided

2023-09-28 Thread via GitHub



javanna opened a new pull request, #12606:
URL: https://github.com/apache/lucene/pull/12606

   As we introduce more places where we add concurrency (there are currently 
three), a common pattern emerges: check whether an executor was provided, then 
either run sequentially on the caller thread or in parallel using the executor.
   
   This can be improved by internally creating a TaskExecutor backed by an 
executor that runs tasks on the caller thread. The task executor is then never 
null, so the common conditional is no longer needed, and the concurrent path 
that uses the task executor becomes the default and only choice for operations 
that can be parallelized.
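
The fallback itself can be very small, since `java.util.concurrent.Executor` is a functional interface and `Runnable::run` executes each task on the calling thread. The sketch below assumes hypothetical names (`ExecutorDefaults`, `orCallerThread`) and is not the code from the PR.

```java
import java.util.concurrent.Executor;

/** Hypothetical helper; the names are illustrative, not taken from the PR. */
final class ExecutorDefaults {
  /** Runs each task immediately on the thread that calls execute(). */
  static final Executor CALLER_THREAD = Runnable::run;

  /** Returns the given executor, or the caller-thread executor when none was provided. */
  static Executor orCallerThread(Executor executor) {
    return executor != null ? executor : CALLER_THREAD;
  }
}
```

Code that receives the result of `orCallerThread` can always submit work to it, so the null check lives in one place instead of at every call site.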
   



[GitHub] [lucene] javanna commented on a diff in pull request #12606: Create a task executor when executor is not provided

2023-09-28 Thread via GitHub



javanna commented on code in PR #12606:
URL: https://github.com/apache/lucene/pull/12606#discussion_r1340606929


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -420,13 +418,12 @@ public int count(Query query) throws IOException {
   }
 
   /**
-   * Returns the leaf slices used for concurrent searching, or null if no 
{@code Executor} was
-   * passed to the constructor.
+   * Returns the leaf slices used for concurrent searching
*
* @lucene.experimental
*/
   public LeafSlice[] getSlices() {
-return (executor == null) ? null : leafSlicesSupplier.get();
+return leafSlicesSupplier.get();

Review Comment:
   I wonder whether this method is still needed. Perhaps it's fine, but we 
could make it final as a follow-up? Otherwise it could be confusing for users 
to figure out which of the two methods, `slices` or `getSlices`, needs to be 
overridden.




[GitHub] [lucene] jpountz commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub



jpountz commented on issue #12597:
URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739925151

   Can you explain a bit more why you need to store your cache keys off-heap? 
Presumably this isn't because of memory usage since these cache keys are tiny.



[GitHub] [lucene] sgup432 commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub



sgup432 commented on issue #12597:
URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739942303

   Yeah, this isn't because of memory consumption as such. It is more about 
providing the capability to offload cache data off-heap or to disk, so that one 
can maintain a larger cache if needed.



[GitHub] [lucene] jpountz commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub



jpountz commented on issue #12597:
URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739956888

   Would it work to only store the cache values on disk and keep cache keys in 
memory? These cache keys should be a small fraction of the memory that open 
`IndexReader`s take anyway?



[GitHub] [lucene] javanna opened a new pull request, #12607: Add missing create github release step to release wizard

2023-09-28 Thread via GitHub



javanna opened a new pull request, #12607:
URL: https://github.com/apache/lucene/pull/12607

   The "create github release" step was missing from the release wizard. We 
have forgotten about it a few times recently. 
   
   While at it, I also expanded the instructions around closing the current 
milestone and moved them after removing opened issues / PRs from the current 
milestone.



[GitHub] [lucene] quux00 commented on a diff in pull request #12523: TaskExecutor waits for all tasks to complete before returning

2023-09-28 Thread via GitHub



quux00 commented on code in PR #12523:
URL: https://github.com/apache/lucene/pull/12523#discussion_r1340643119


##
lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java:
##
@@ -267,11 +266,130 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
 return slices.toArray(new LeafSlice[0]);
   }
 };
-searcher.search(new MatchAllDocsQuery(), 10);
+TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10);
+assertTrue(topDocs.totalHits.value > 0);
 if (leaves.size() <= 1) {
   assertEquals(0, numExecutions.get());
 } else {
   assertEquals(leaves.size(), numExecutions.get());
 }
   }
+
+  /**
+   * Tests that when IndexerSearcher runs concurrent searches on multiple 
slices if any Exception is
+   * thrown by one of the slice tasks, it will not return until all tasks have 
completed.
+   *
+   * Without a larger refactoring of the Lucene IndexSearcher and/or 
TaskExecutor there isn't a
+   * clean deterministic way to test this. This test is probabilistic using 
short timeouts in the
+   * tasks that do not throw an Exception.
+   */
+  public void testMultipleSegmentsOnTheExecutorWithException() {
+List<LeafReaderContext> leaves = reader.leaves();
+int fixedThreads = leaves.size() == 1 ? 1 : leaves.size() / 2;
+
+ExecutorService fixedThreadPoolExecutor =
+Executors.newFixedThreadPool(fixedThreads, new 
NamedThreadFactory("concurrent-slices"));
+
+IndexSearcher searcher =
+new IndexSearcher(reader, fixedThreadPoolExecutor) {
+  @Override
+  protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+ArrayList<LeafSlice> slices = new ArrayList<>();
+for (LeafReaderContext ctx : leaves) {
+  slices.add(new LeafSlice(Arrays.asList(ctx)));
+}
+return slices.toArray(new LeafSlice[0]);
+  }
+};
+
+try {
+  AtomicInteger callsToScorer = new AtomicInteger(0);
+  int numExceptions = leaves.size() == 1 ? 1 : 
RandomizedTest.randomIntBetween(1, 2);
+  MatchAllOrThrowExceptionQuery query =
+  new MatchAllOrThrowExceptionQuery(numExceptions, callsToScorer);
+  RuntimeException exc = expectThrows(RuntimeException.class, () -> 
searcher.search(query, 10));
+  // if the TaskExecutor didn't wait for all tasks to finish, this assert 
would frequently fail
+  assertEquals(leaves.size(), callsToScorer.get());
+  assertThat(
+  exc.getMessage(), 
Matchers.containsString("MatchAllOrThrowExceptionQuery Exception"));
+} finally {
+  TestUtil.shutdownExecutorService(fixedThreadPoolExecutor);
+}
+  }
+
+  private static class MatchAllOrThrowExceptionQuery extends Query {
+
+private final AtomicInteger numExceptionsToThrow;
+private final Query delegate;
+private final AtomicInteger callsToScorer;
+
+/**
+ * Throws an Exception out of the {@code scorer} method the first {@code 
numExceptions} times it
+ * is called. Otherwise, it delegates all calls to the MatchAllDocsQuery.
+ *
+ * @param numExceptions number of exceptions to throw from scorer method
+ * @param callsToScorer where to record the number of times the {@code 
scorer} method has been
+ * called
+ */
+public MatchAllOrThrowExceptionQuery(int numExceptions, AtomicInteger 
callsToScorer) {
+  this.numExceptionsToThrow = new AtomicInteger(numExceptions);
+  this.callsToScorer = callsToScorer;
+  this.delegate = new MatchAllDocsQuery();
+}
+
+@Override
+public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, 
float boost)
+throws IOException {
+  Weight matchAllWeight = delegate.createWeight(searcher, scoreMode, 
boost);
+
+  return new Weight(delegate) {
+@Override
+public boolean isCacheable(LeafReaderContext ctx) {
+  return matchAllWeight.isCacheable(ctx);
+}
+
+@Override
+public Explanation explain(LeafReaderContext context, int doc) throws 
IOException {
+  return matchAllWeight.explain(context, doc);
+}
+
+@Override
+public Scorer scorer(LeafReaderContext context) throws IOException {
+  if (numExceptionsToThrow.getAndDecrement() > 0) {
+callsToScorer.getAndIncrement();
+throw new RuntimeException("MatchAllOrThrowExceptionQuery 
Exception");
+  } else {
+// A small sleep before incrementing the callsToScorer counter 
allows
+// the task with the Exception to be thrown and if 
TaskExecutor.invokeAll
+// does not wait until all tasks have finished, then the 
callsToScorer
+// counter will not match the total number of tasks (or rather 
usually will
+// not match, since there is a race condition that makes it 
probabilistic).
+RandomizedTest.sleep(25);

Review Comment:
   I removed the sleep and added a CountDownLatch that acts to wait until all 
excepti

[GitHub] [lucene] kaivalnp commented on pull request #12590: Allow implementers of AbstractKnnVectorQuery to access final topK results

2023-09-28 Thread via GitHub



kaivalnp commented on PR #12590:
URL: https://github.com/apache/lucene/pull/12590#issuecomment-1739975925

   Thanks, makes sense to me! Added an explicit method to merge per-segment 
results



[GitHub] [lucene] sgup432 commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub



sgup432 commented on issue #12597:
URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739977503

   We can; that's more of an implementation choice and should be kept open. 
Apart from that, OpenSearch also uses RequestCache, which uses a composite key 
of which CacheKey is one part. It might be tricky to keep these keys in memory 
beyond a certain limit.



[GitHub] [lucene] gsmiller opened a new issue, #12608: Should DrillSidewaysQuery BulkScorer leverage scoreSuppliers of the base query and dimensions?

2023-09-28 Thread via GitHub



gsmiller opened a new issue, #12608:
URL: https://github.com/apache/lucene/issues/12608

   ### Description
   
   When DrillSidewaysQuery creates a BulkScorer, it directly calls `#scorer` on 
the base query and each drill-down dim. This means the drill-down dims are not 
able to optimize based on leadCost, even though there essentially _is_ a lead 
clause (the base query).



[GitHub] [lucene] jpountz commented on issue #12597: Make IndexReader.CacheKey serializable

2023-09-28 Thread via GitHub



jpountz commented on issue #12597:
URL: https://github.com/apache/lucene/issues/12597#issuecomment-1739991979

   Got it.
   
   I'm a bit torn on this change. On the one hand, it would be harmless, as you 
pointed out; on the other hand, I could see it becoming a bit of a rabbit hole, 
with future feature requests to also make other things that could be used as 
cache keys serializable, e.g. queries. I'd be interested in getting more 
opinions.


