[GitHub] [lucene] zhaih commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite
zhaih commented on PR #11840: URL: https://github.com/apache/lucene/pull/11840#issuecomment-1275696638 Hi @jpountz thank for taking a look! > We already have one class that wraps an IndexReader and an Executor: IndexSearcher. Should this new rewrite method take an IndexSearcher instead of an IndexReader and an Executor? I think it'd be a little weird to know that query rewriting requires an IndexSearcher from a user point of view? Also for people who're not using IndexSearcher provided by lucene (like we're in LinkedIn) it will be a bit inconvenience to create an IndexSearcher just for query rewriting? > (and should it replace the existing rewrite method instead of just adding another one?) Yes I think replace is better, I previously wanted to make it into 9.x so don't want to change the API, are we able to make it in 9.x even if with the API change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] donnerpeter commented on issue #11459: Remove uses of wall-clock time in codebase [LUCENE-10423]
donnerpeter commented on issue #11459: URL: https://github.com/apache/lucene/issues/11459#issuecomment-1275869750 I've found a mistake in conversion made during these changes (missing parentheses and wrong zero count). I've fixed it where I need it (https://github.com/apache/lucene/commit/ab50fe640be1499c2d9875e74be6efe7326688a7), but other places might also need fixing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite
jpountz commented on PR #11840: URL: https://github.com/apache/lucene/pull/11840#issuecomment-1276029374 I don't think it'd be weird to require an `IndexSearcher`, `Query#rewrite` is essentially a way to improve caching and keep `Query#createWeight` simple. Given that `Query#createWeight` already creates an `IndexSearcher`, I'd assume that you already create an `IndexSearcher` even though you might not be using its `search` methods? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir opened a new issue, #11842: TestLongBitSet.testHugeCapacity OOM
rmuir opened a new issue, #11842: URL: https://github.com/apache/lucene/issues/11842 ### Description Failed on 9.x nightly build here: https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.x/338/console ``` org.apache.lucene.util.TestLongBitSet > testHugeCapacity FAILED java.lang.OutOfMemoryError: Java heap space at __randomizedtesting.SeedInfo.seed([3466B8953374BF0A:4E26F26EDA84990E]:0) at org.apache.lucene.util.ArrayUtil.growExact(ArrayUtil.java:364) at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:376) at org.apache.lucene.util.LongBitSet.ensureCapacity(LongBitSet.java:53) at org.apache.lucene.util.TestLongBitSet.testHugeCapacity(TestLongBitSet.java:362) ``` ### Gradle command to reproduce ``` 2> NOTE: reproduce with: gradlew test --tests TestLongBitSet.testHugeCapacity -Dtests.seed=3466B8953374BF0A -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-9.x/test-data/enwiki.random.lines.txt -Dtests.locale=prg-001 -Dtests.timezone=America/Inuvik -Dtests.asserts=true -Dtests.file.encoding=UTF-8 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent opened a new pull request, #11843: Remove cancellation check on every vector
benwtrent opened a new pull request, #11843: URL: https://github.com/apache/lucene/pull/11843 PR: https://github.com/apache/lucene/pull/833 helpfully introduced query cancellation checks for KNN vector queries. However, checking for cancellation on every vector read has a negative impact on performance. This change proposes that we no longer check on every vector. This performance hit was noticed first in Elasticsearch benching nightlies: - related PRs: [Initial Addition](https://github.com/elastic/elasticsearch/pull/90612), [Fix](https://github.com/elastic/elasticsearch/pull/90804) - Performance numbers: https://elasticsearch-benchmarks.elastic.co/#tracks/dense_vector/nightly/default/90d (notice `nightly-dense_vector-add-4g-1node-script-score-query-latency`) Lucene benches indicate no dramatic change in VectorSearch around May (when the change was merged...I may be missing where to look). It is common to iterate many vectors (especially since there is currently no early exit mechanism available). Other exitable iterators don't check on every value read and usually sample (see `ExitableTermsEnum` as prior art). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #11780: GH#11601: Add ability to compute reader states after refresh
mikemccand commented on PR #11780: URL: https://github.com/apache/lucene/pull/11780#issuecomment-1276316505 > 3\. Allow the user to update the ordinal maps in the reader states they already have without requiring them to completely recreate the reader states. I’m not sure how much this accomplishes. The Javadoc suggests that the bulk of the work in instantiating a SortedSetDocValuesReaderState is building the ordinal map. In that case, updating the ordinal maps instead of recreating the reader states doesn’t have much benefit. I think this is not a great option (in-place update) because it'd break Lucene's point-in-time semantics. It would be nice to be able to create a new `OrdinalMap` for a refreshed reader by passing in the old `OrdinalMap` and being more "incremental" in how the new one is created (building on, but not altering, the old one)? But that is really a separate issue and is quite complex I think :) > 1. Do nothing. The user can already instantiate new reader states after a refresh if they want. No new API is required. +1 for this option -- I don't see why this is a problem for users/applications. No need to do this atomically under the refresh lock. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir commented on issue #11842: TestLongBitSet.testHugeCapacity OOM
rmuir commented on issue #11842: URL: https://github.com/apache/lucene/issues/11842#issuecomment-1276409926 The test allocates a bitset exceeding length of 2^31, so we can expect it to need hundreds of megabytes of heap space. I think rather than mark the test `@Nightly` we should change it to `@Monster`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir opened a new pull request, #11844: Mark TestLongBitSet.testHugeCapacity @Monster as it requires a lot of memory
rmuir opened a new pull request, #11844: URL: https://github.com/apache/lucene/pull/11844 This test case needs hundreds of MB of heap, and causes OOMs in nightly builds. Let's mark it `@Monster` appropriately. Closes #11842 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir opened a new issue, #11845: WrapperDownloader should retry on Layer3/Layer4 network errors
rmuir opened a new issue, #11845: URL: https://github.com/apache/lucene/issues/11845 ### Description WrapperDownloader became more stable in CI after #11766 added Layer7 retries for HTTP 5xx. But I think we should also retry on lower-level network failures (e.g. DNS failure, connect timeout, read timeout, etc). I already encountered this from github actions: ``` Run ./gradlew localSettings --max-workers 2 ./gradlew localSettings --max-workers 2 shell: /usr/bin/bash -e {0} env: JAVA_HOME: /opt/hostedtoolcache/Java_Temurin-Hotspot_jdk/17.0.4-1/x64 Downloading gradle-wrapper.jar from https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar ERROR: Could not download gradle-wrapper.jar (ConnectException: Connection timed out). Error: Process completed with exit code 3. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] rmuir opened a new pull request, #11846: WrapperDownloader: add retries for network blips around connect(), too
rmuir opened a new pull request, #11846: URL: https://github.com/apache/lucene/pull/11846 Add retries for common issues such as connect timeout, etc. This won't solve the problem of read-timeouts happening around the actual transferTo, but it is an easy incremental improvement. Quickly tested the retries work by taking away gradle's network: ``` $ unshare --user --net ./gradlew test Downloading gradle-wrapper.jar from https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar Error connecting to server: java.net.SocketException: Network is unreachable, will retry in 30 seconds. Error connecting to server: java.net.SocketException: Network is unreachable, will retry in 30 seconds. Error connecting to server: java.net.SocketException: Network is unreachable, will retry in 30 seconds. ERROR: Could not download gradle-wrapper.jar (SocketException: Network is unreachable). ``` Closes #11845 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] dweiss commented on pull request #11846: WrapperDownloader: add retries for network blips around connect(), too
dweiss commented on PR #11846: URL: https://github.com/apache/lucene/pull/11846#issuecomment-1276505202 LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jtibshirani commented on pull request #11843: Remove cancellation check on every vector
jtibshirani commented on PR #11843: URL: https://github.com/apache/lucene/pull/11843#issuecomment-1276520789 To give some context, Elasticsearch exposes a query type that performs a kNN exact scan. It iterates through all the `VectorValues` matching a query, and computes the similarity. Since `ExitableDirectoryReader` checked for cancellation on every vector access, it added significant overhead to these queries. This is also relevant to Lucene (even if Elasticsearch didn't have this query type). In `KnnVectorQuery`, we fall back to an exact scan if kNN with filtering is too expensive. I'm guessing we didn't see this in the nightlies because we don't test kNN with filtering plus query cancellation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] zhaih commented on pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite
zhaih commented on PR #11840: URL: https://github.com/apache/lucene/pull/11840#issuecomment-1277038263 @jpountz You're right, our case is a bit complex since currently we're not even using Lucene's Query (but we're planning to in the future!) so I totally forgot the createWeight also takes an IndexSearcher. Yeah now it makes sense to me to change that API to use IndexSearcher as well, I'll try to get one out these days. Thank you for the discussion! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #11840: GITHUB-11838 Add api to allow concurrent query rewrite
jpountz commented on code in PR #11840: URL: https://github.com/apache/lucene/pull/11840#discussion_r994219709 ## lucene/core/src/java/org/apache/lucene/search/FieldExistsQuery.java: ## @@ -18,16 +18,7 @@ import java.io.IOException; import java.util.Objects; -import org.apache.lucene.index.DocValues; -import org.apache.lucene.index.DocValuesType; -import org.apache.lucene.index.FieldInfo; -import org.apache.lucene.index.FieldInfos; -import org.apache.lucene.index.IndexOptions; -import org.apache.lucene.index.IndexReader; -import org.apache.lucene.index.LeafReader; -import org.apache.lucene.index.LeafReaderContext; -import org.apache.lucene.index.PointValues; -import org.apache.lucene.index.Terms; +import org.apache.lucene.index.*; Review Comment: Can you undo the star import? ## lucene/core/src/java/org/apache/lucene/search/KnnVectorQuery.java: ## @@ -86,12 +86,12 @@ public KnnVectorQuery(String field, float[] target, int k, Query filter) { } @Override - public Query rewrite(IndexReader reader) throws IOException { + public Query rewrite(IndexSearcher indexSearcher) throws IOException { +IndexReader reader = indexSearcher.getIndexReader(); TopDocs[] perLeafResults = new TopDocs[reader.leaves().size()]; Weight filterWeight = null; if (filter != null) { - IndexSearcher indexSearcher = new IndexSearcher(reader); Review Comment: Being able to undo this is good. :+1: ## lucene/highlighter/src/java/org/apache/lucene/search/vectorhighlight/FieldQuery.java: ## @@ -28,16 +28,7 @@ import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.Term; import org.apache.lucene.queries.function.FunctionScoreQuery; -import org.apache.lucene.search.BooleanClause; -import org.apache.lucene.search.BooleanQuery; -import org.apache.lucene.search.BoostQuery; -import org.apache.lucene.search.ConstantScoreQuery; -import org.apache.lucene.search.DisjunctionMaxQuery; -import org.apache.lucene.search.MultiTermQuery; -import org.apache.lucene.search.PhraseQuery; -import org.apache.lucene.search.Query; -import org.apache.lucene.search.SynonymQuery; -import org.apache.lucene.search.TermQuery; +import org.apache.lucene.search.*; Review Comment: undo? ## lucene/highlighter/src/test/org/apache/lucene/search/highlight/custom/TestHighlightCustomQuery.java: ## @@ -21,12 +21,8 @@ import java.util.Map; import java.util.Objects; import org.apache.lucene.analysis.TokenStream; -import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.Term; -import org.apache.lucene.search.BoostQuery; -import org.apache.lucene.search.Query; -import org.apache.lucene.search.QueryVisitor; -import org.apache.lucene.search.TermQuery; +import org.apache.lucene.search.*; Review Comment: undo? ## lucene/sandbox/src/java/org/apache/lucene/sandbox/queries/FuzzyLikeThisQuery.java: ## @@ -32,15 +32,7 @@ import org.apache.lucene.index.TermStates; import org.apache.lucene.index.Terms; import org.apache.lucene.index.TermsEnum; -import org.apache.lucene.search.BooleanClause; -import org.apache.lucene.search.BooleanQuery; -import org.apache.lucene.search.BoostAttribute; -import org.apache.lucene.search.BoostQuery; -import org.apache.lucene.search.ConstantScoreQuery; -import org.apache.lucene.search.FuzzyTermsEnum; -import org.apache.lucene.search.Query; -import org.apache.lucene.search.QueryVisitor; -import org.apache.lucene.search.TermQuery; +import org.apache.lucene.search.*; Review Comment: undo? ## lucene/facet/src/java/org/apache/lucene/facet/DrillDownQuery.java: ## @@ -24,15 +24,9 @@ import java.util.Map; import java.util.Objects; import java.util.Set; -import org.apache.lucene.index.IndexReader; import org.apache.lucene.index.Term; +import org.apache.lucene.search.*; Review Comment: undo the start import? ## lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThisQuery.java: ## @@ -22,11 +22,7 @@ import java.util.Objects; import java.util.Set; import org.apache.lucene.analysis.Analyzer; -import org.apache.lucene.index.IndexReader; -import org.apache.lucene.search.BooleanClause; -import org.apache.lucene.search.BooleanQuery; -import org.apache.lucene.search.Query; -import org.apache.lucene.search.QueryVisitor; +import org.apache.lucene.search.*; Review Comment: undo? ## lucene/core/src/java/org/apache/lucene/document/FeatureQuery.java: ## @@ -50,12 +49,12 @@ final class FeatureQuery extends Query { } @Override - public Query rewrite(IndexReader reader) throws IOException { -FeatureFunction rewritten = function.rewrite(reader); + public Query rewrite(IndexSearcher indexSearcher) throws IOException { +FeatureFunction rewritten = function.rewrite(indexSearcher.getIndexReader()); Review Comment: maybe change the signature of `FeatureFunction#rewrite` to take an `IndexSearcher` as