[GitHub] [lucene] zhaih commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-12 Thread GitBox


zhaih commented on PR #11840:
URL: https://github.com/apache/lucene/pull/11840#issuecomment-1275696638

   Hi @jpountz thank for taking a look!
   > We already have one class that wraps an IndexReader and an Executor: 
IndexSearcher. Should this new rewrite method take an IndexSearcher instead of 
an IndexReader and an Executor?
   
   I think it'd be a little weird to know that query rewriting requires an 
IndexSearcher from a user point of view? Also for people who're not using 
IndexSearcher provided by lucene (like we're in LinkedIn) it will be a bit 
inconvenience to create an IndexSearcher just for query rewriting?
   
   > (and should it replace the existing rewrite method instead of just adding 
another one?)
   
   Yes I think replace is better, I previously wanted to make it into 9.x so 
don't want to change the API, are we able to make it in 9.x even if with the 
API change?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] donnerpeter commented on issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]

2022-10-12 Thread GitBox


donnerpeter commented on issue #11459:
URL: https://github.com/apache/lucene/issues/11459#issuecomment-1275869750

   I've found a mistake in conversion made during these changes (missing 
parentheses and wrong zero count). I've fixed it where I need it 
(https://github.com/apache/lucene/commit/ab50fe640be1499c2d9875e74be6efe7326688a7),
 but other places might also need fixing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-12 Thread GitBox


jpountz commented on PR #11840:
URL: https://github.com/apache/lucene/pull/11840#issuecomment-1276029374

   I don't think it'd be weird to require an `IndexSearcher`, `Query#rewrite` 
is essentially a way to improve caching and keep `Query#createWeight` simple. 
Given that `Query#createWeight` already creates an `IndexSearcher`, I'd assume 
that you already create an `IndexSearcher` even though you might not be using 
its `search` methods?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir opened a new issue, #11842: TestLongBitSet.testHugeCapacity OOM

2022-10-12 Thread GitBox


rmuir opened a new issue, #11842:
URL: https://github.com/apache/lucene/issues/11842

   ### Description
   
   Failed on 9.x nightly build here: 
https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.x/338/console
   
   ```
   org.apache.lucene.util.TestLongBitSet > testHugeCapacity FAILED
   java.lang.OutOfMemoryError: Java heap space
   at 
__randomizedtesting.SeedInfo.seed([3466B8953374BF0A:4E26F26EDA84990E]:0)
   at org.apache.lucene.util.ArrayUtil.growExact(ArrayUtil.java:364)
   at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:376)
   at 
org.apache.lucene.util.LongBitSet.ensureCapacity(LongBitSet.java:53)
   at 
org.apache.lucene.util.TestLongBitSet.testHugeCapacity(TestLongBitSet.java:362)
   ```
   
   
   ### Gradle command to reproduce
   
   ```
   2> NOTE: reproduce with: gradlew test --tests 
TestLongBitSet.testHugeCapacity -Dtests.seed=3466B8953374BF0A 
-Dtests.multiplier=2 -Dtests.nightly=true 
-Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-9.x/test-data/enwiki.random.lines.txt
 -Dtests.locale=prg-001 -Dtests.timezone=America/Inuvik -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] benwtrent opened a new pull request, #11843: Remove cancellation check on every vector

2022-10-12 Thread GitBox


benwtrent opened a new pull request, #11843:
URL: https://github.com/apache/lucene/pull/11843

   PR: https://github.com/apache/lucene/pull/833 helpfully introduced query 
cancellation checks for KNN vector queries. 
   
   However, checking for cancellation on every vector read has a negative 
impact on performance. 
   
   This change proposes that we no longer check on every vector. 
   
   This performance hit was noticed first in Elasticsearch benching nightlies:
   
- related PRs: [Initial 
Addition](https://github.com/elastic/elasticsearch/pull/90612), 
[Fix](https://github.com/elastic/elasticsearch/pull/90804)
- Performance numbers: 
https://elasticsearch-benchmarks.elastic.co/#tracks/dense_vector/nightly/default/90d
 (notice `nightly-dense_vector-add-4g-1node-script-score-query-latency`)
   
   Lucene benches indicate no dramatic change in VectorSearch around May (when 
the change was merged...I may be missing where to look). 
   
   It is common to iterate many vectors (especially since there is currently no 
early exit mechanism available). Other exitable iterators don't check on every 
value read and usually sample (see `ExitableTermsEnum` as prior art).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] mikemccand commented on pull request #11780: GH#11601: Add ability to compute reader states after refresh

2022-10-12 Thread GitBox


mikemccand commented on PR #11780:
URL: https://github.com/apache/lucene/pull/11780#issuecomment-1276316505

   > 3\. Allow the user to update the ordinal maps in the reader states they 
already have without requiring them to completely recreate the reader states. 
I’m not sure how much this accomplishes. The Javadoc suggests that the bulk of 
the work in instantiating a SortedSetDocValuesReaderState is building the 
ordinal map. In that case, updating the ordinal maps instead of recreating the 
reader states doesn’t have much benefit.
   
   I think this is not a great option (in-place update) because it'd break 
Lucene's point-in-time semantics.  It would be nice to be able to create a new 
`OrdinalMap` for a refreshed reader by passing in the old `OrdinalMap` and 
being more "incremental" in how the new one is created (building on, but not 
altering, the old one)?  But that is really a separate issue and is quite 
complex I think :)
   
   > 1. Do nothing. The user can already instantiate new reader states after a 
refresh if they want. No new API is required.
   
   +1 for this option -- I don't see why this is a problem for 
users/applications.  No need to do this atomically under the refresh lock.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on issue #11842: TestLongBitSet.testHugeCapacity OOM

2022-10-12 Thread GitBox


rmuir commented on issue #11842:
URL: https://github.com/apache/lucene/issues/11842#issuecomment-1276409926

   The test allocates a bitset exceeding length of 2^31, so we can expect it to 
need hundreds of megabytes of heap space. 
   
   I think rather than mark the test `@Nightly` we should change it to 
`@Monster`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir opened a new pull request, #11844: Mark TestLongBitSet.testHugeCapacity @Monster as it requires a lot of memory

2022-10-12 Thread GitBox


rmuir opened a new pull request, #11844:
URL: https://github.com/apache/lucene/pull/11844

   This test case needs hundreds of MB of heap, and causes OOMs in nightly 
builds. Let's mark it `@Monster` appropriately.
   
   Closes #11842


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir opened a new issue, #11845: WrapperDownloader should retry on Layer3/Layer4 network errors

2022-10-12 Thread GitBox


rmuir opened a new issue, #11845:
URL: https://github.com/apache/lucene/issues/11845

   ### Description
   
   WrapperDownloader became more stable in CI after #11766 added Layer7 retries 
for HTTP 5xx.
   
   But I think we should also retry on lower-level network failures (e.g. DNS 
failure, connect timeout, read timeout, etc). I already encountered this from 
github actions:
   ```
   Run ./gradlew localSettings --max-workers 2
 ./gradlew localSettings --max-workers 2
 shell: /usr/bin/bash -e {0}
 env:
   JAVA_HOME: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/17.0.4-1/x64
   Downloading gradle-wrapper.jar from 
https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar
   ERROR: Could not download gradle-wrapper.jar (ConnectException: Connection 
timed out).
   Error: Process completed with exit code 3.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir opened a new pull request, #11846: WrapperDownloader: add retries for network blips around connect(), too

2022-10-12 Thread GitBox


rmuir opened a new pull request, #11846:
URL: https://github.com/apache/lucene/pull/11846

   Add retries for common issues such as connect timeout, etc.
   
   This won't solve the problem of read-timeouts happening around the actual 
transferTo, but it is an easy incremental improvement.
   
   Quickly tested the retries work by taking away gradle's network:
   ```
   $ unshare --user --net ./gradlew test
   Downloading gradle-wrapper.jar from 
https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar
   Error connecting to server: java.net.SocketException: Network is 
unreachable, will retry in 30 seconds.
   Error connecting to server: java.net.SocketException: Network is 
unreachable, will retry in 30 seconds.
   Error connecting to server: java.net.SocketException: Network is 
unreachable, will retry in 30 seconds.
   ERROR: Could not download gradle-wrapper.jar (SocketException: Network is 
unreachable).
   ```
   
   Closes #11845 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] dweiss commented on pull request #11846: WrapperDownloader: add retries for network blips around connect(), too

2022-10-12 Thread GitBox


dweiss commented on PR #11846:
URL: https://github.com/apache/lucene/pull/11846#issuecomment-1276505202

   LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jtibshirani commented on pull request #11843: Remove cancellation check on every vector

2022-10-12 Thread GitBox


jtibshirani commented on PR #11843:
URL: https://github.com/apache/lucene/pull/11843#issuecomment-1276520789

   To give some context, Elasticsearch exposes a query type that performs a kNN 
exact scan. It iterates through all the `VectorValues` matching a query, and 
computes the similarity. Since `ExitableDirectoryReader` checked for 
cancellation on every vector access, it added significant overhead to these 
queries.
   
   This is also relevant to Lucene (even if Elasticsearch didn't have this 
query type). In `KnnVectorQuery`, we fall back to an exact scan if kNN with 
filtering is too expensive. I'm guessing we didn't see this in the nightlies 
because we don't test kNN with filtering plus query cancellation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] zhaih commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-12 Thread GitBox


zhaih commented on PR #11840:
URL: https://github.com/apache/lucene/pull/11840#issuecomment-1277038263

   @jpountz You're right, our case is a bit complex since currently we're not 
even using Lucene's Query (but we're planning to in the future!) so I totally 
forgot the createWeight also takes an IndexSearcher.
   Yeah now it makes sense to me to change that API to use IndexSearcher as 
well, I'll try to get one out these days. Thank you for the discussion!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] jpountz commented on a diff in pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite

2022-10-12 Thread GitBox


jpountz commented on code in PR #11840:
URL: https://github.com/apache/lucene/pull/11840#discussion_r994219709


##
lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java:
##
@@ -18,16 +18,7 @@
 
 import java.io.IOException;
 import java.util.Objects;
-import org.apache.lucene.index.DocValues;
-import org.apache.lucene.index.DocValuesType;
-import org.apache.lucene.index.FieldInfo;
-import org.apache.lucene.index.FieldInfos;
-import org.apache.lucene.index.IndexOptions;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.index.LeafReader;
-import org.apache.lucene.index.LeafReaderContext;
-import org.apache.lucene.index.PointValues;
-import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.*;

Review Comment:
   Can you undo the star import?



##
lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java:
##
@@ -86,12 +86,12 @@ public KnnVectorQuery(String field, float[] target, int k, 
Query filter) {
   }
 
   @Override
-  public Query rewrite(IndexReader reader) throws IOException {
+  public Query rewrite(IndexSearcher indexSearcher) throws IOException {
+IndexReader reader = indexSearcher.getIndexReader();
 TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()];
 
 Weight filterWeight = null;
 if (filter != null) {
-  IndexSearcher indexSearcher = new IndexSearcher(reader);

Review Comment:
   Being able to undo this is good. :+1:



##
lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java:
##
@@ -28,16 +28,7 @@
 import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
 import org.apache.lucene.queries.function.FunctionScoreQuery;
-import org.apache.lucene.search.BooleanClause;
-import org.apache.lucene.search.BooleanQuery;
-import org.apache.lucene.search.BoostQuery;
-import org.apache.lucene.search.ConstantScoreQuery;
-import org.apache.lucene.search.DisjunctionMaxQuery;
-import org.apache.lucene.search.MultiTermQuery;
-import org.apache.lucene.search.PhraseQuery;
-import org.apache.lucene.search.Query;
-import org.apache.lucene.search.SynonymQuery;
-import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.search.*;

Review Comment:
   undo?



##
lucene/highlighter/src/test/org/apache/lucene/search/highlight/custom/TestHighlightCustomQuery.java:
##
@@ -21,12 +21,8 @@
 import java.util.Map;
 import java.util.Objects;
 import org.apache.lucene.analysis.TokenStream;
-import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
-import org.apache.lucene.search.BoostQuery;
-import org.apache.lucene.search.Query;
-import org.apache.lucene.search.QueryVisitor;
-import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.search.*;

Review Comment:
   undo?



##
lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/FuzzyLikeThisQuery.java:
##
@@ -32,15 +32,7 @@
 import org.apache.lucene.index.TermStates;
 import org.apache.lucene.index.Terms;
 import org.apache.lucene.index.TermsEnum;
-import org.apache.lucene.search.BooleanClause;
-import org.apache.lucene.search.BooleanQuery;
-import org.apache.lucene.search.BoostAttribute;
-import org.apache.lucene.search.BoostQuery;
-import org.apache.lucene.search.ConstantScoreQuery;
-import org.apache.lucene.search.FuzzyTermsEnum;
-import org.apache.lucene.search.Query;
-import org.apache.lucene.search.QueryVisitor;
-import org.apache.lucene.search.TermQuery;
+import org.apache.lucene.search.*;

Review Comment:
   undo?



##
lucene/facet/src/java/org/apache/lucene/facet/DrillDownQuery.java:
##
@@ -24,15 +24,9 @@
 import java.util.Map;
 import java.util.Objects;
 import java.util.Set;
-import org.apache.lucene.index.IndexReader;
 import org.apache.lucene.index.Term;
+import org.apache.lucene.search.*;

Review Comment:
   undo the start import?



##
lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThisQuery.java:
##
@@ -22,11 +22,7 @@
 import java.util.Objects;
 import java.util.Set;
 import org.apache.lucene.analysis.Analyzer;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.search.BooleanClause;
-import org.apache.lucene.search.BooleanQuery;
-import org.apache.lucene.search.Query;
-import org.apache.lucene.search.QueryVisitor;
+import org.apache.lucene.search.*;

Review Comment:
   undo?



##
lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java:
##
@@ -50,12 +49,12 @@ final class FeatureQuery extends Query {
   }
 
   @Override
-  public Query rewrite(IndexReader reader) throws IOException {
-FeatureFunction rewritten = function.rewrite(reader);
+  public Query rewrite(IndexSearcher indexSearcher) throws IOException {
+FeatureFunction rewritten = 
function.rewrite(indexSearcher.getIndexReader());

Review Comment:
   maybe change the signature of `FeatureFunction#rewrite` to take an 
`IndexSearcher` as