Re: [PR] Forbidden Thread.sleep API [lucene]
shubhamvishu commented on PR #13001: URL: https://github.com/apache/lucene/pull/13001#issuecomment-1907646476 @uschindler @mikemccand I think we could merge this now unless I'm missing something? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
stefanvodita commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1907818280 Maybe we can merge this fix as-is and continue the conversation about where test polygon generation should produce valid polygons in #12596. It's not clear to me now if there is anything to fix in #12596, but I think we agree that the fix in this PR should go ahead. @heemin32 - do you want to write a [CHANGES](https://github.com/apache/lucene/blob/main/lucene/CHANGES.txt) entry for Lucene 9.10?
[I] Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation [lucene]
rquesada-tibco opened a new issue, #13029: URL: https://github.com/apache/lucene/issues/13029

### Description

We recently discovered a performance degradation in our project when going from Lucene 9.4 to 9.9. The cause seems to be a side effect of https://github.com/apache/lucene/commit/c6667e709f610669b57a6a1a6ab1cc80f4f0ebaf and https://github.com/apache/lucene/commit/3809106602a9675f4fd217b1090af4505d4ec2a7

The situation is as follows: we have a `WildcardQuery` and a `TermInSetQuery` which are and-combined (within a `BooleanQuery`). This structure gets executed repeatedly, kind of like a nested loop where the `WildcardQuery` remains the same, but the `TermInSetQuery` keeps changing its terms. In the old version, this was fast because the `WildcardQuery` was cached within the `LRUQueryCache`. In the new version this is no longer the case, so the execution time of our scenario has increased.

Why is our `WildcardQuery` no longer cached? It boils down to [this line](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L771) in `LRUQueryCache`, where the cache operation won't happen if the cost estimation is too high:

```
final ScorerSupplier supplier = in.scorerSupplier(context);
...
final long cost = supplier.cost();
...
// skip cache operation which would slow query down too much
if (cost / skipCacheFactor > leadCost) {
  ...
```

Before the upgrade to 9.9, that cost was provided by a `ConstantScoreWeight` returned by the old `MultiTermQueryConstantScoreWrapper` (which was returned by the default `RewriteMethod`), and in the end it was just based on the "default" `Weight#scorerSupplier` [implementation](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/Weight.java#L147): the cost was simply `scorer.iterator().cost()`, and in **our case the `WildcardQuery` returns just one document, so cost 1**.

After the upgrade, the default `RewriteMethod` has changed and this cost is now provided by `AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier` [here](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L263). For that purpose a private [estimateCost method](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L295) was introduced, which bases the estimation on the [MultiTermQuery#getTermsCount](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L316) value. The problem is that, for our `WildcardQuery` (in fact for any sub-class of `AutomatonQuery`), this value is unknown, i.e. `-1`, so the `estimateCost` method just [returns](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L309) `terms.getSumDocFreq()`. That is clearly an overestimation in our case, so it prevents the caching and leads to a performance degradation.

I understand that I can fix this situation by writing my own customized `RewriteMethod`.

The question is: could we improve `AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier#cost` so that, if the MTQ cannot provide a term count (`getTermsCount() == -1`), we return `scorer.iterator().cost()`?
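To make the effect of the estimate concrete, here is a minimal, self-contained sketch of the skip-caching condition quoted in the issue. It does not use Lucene classes. The class name `CacheSkipSketch`, the method names, the parameter types, and all numbers are hypothetical and chosen only for illustration. Only the shape of the comparison (`cost / skipCacheFactor > leadCost`) and the two candidate estimates for the unknown-term-count case (`sumDocFreq` versus the iterator cost) come from the issue text.

```java
public class CacheSkipSketch {

  // Same shape as the condition quoted from LRUQueryCache: caching is skipped
  // when the clause looks much more expensive than the lead clause.
  // The parameter types are an assumption made for this sketch.
  static boolean skipsCache(long cost, double skipCacheFactor, long leadCost) {
    return cost / skipCacheFactor > leadCost;
  }

  // Models only the case where getTermsCount() == -1, as described in the issue.
  // useIteratorCost == false: behaviour reported for 9.9 (fall back to sumDocFreq).
  // useIteratorCost == true: the fallback suggested in the issue.
  static long costWhenTermCountUnknown(long sumDocFreq, long iteratorCost, boolean useIteratorCost) {
    return useIteratorCost ? iteratorCost : sumDocFreq;
  }

  public static void main(String[] args) {
    long sumDocFreq = 1_000_000L;  // illustrative index-wide statistic
    long iteratorCost = 1L;        // the wildcard matches a single document
    double skipCacheFactor = 10.0; // illustrative value
    long leadCost = 100L;          // illustrative cost of the cheaper clause

    long reported = costWhenTermCountUnknown(sumDocFreq, iteratorCost, false);
    long suggested = costWhenTermCountUnknown(sumDocFreq, iteratorCost, true);

    System.out.println("sumDocFreq estimate skips cache: "
        + skipsCache(reported, skipCacheFactor, leadCost));
    System.out.println("iterator-cost estimate skips cache: "
        + skipsCache(suggested, skipCacheFactor, leadCost));
  }
}
```

With these illustrative numbers the `sumDocFreq` estimate exceeds the threshold and the clause is not cached, while the iterator-cost estimate stays below it.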
Re: [PR] Add support for similarity-based vector searches [lucene]
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1907958716 This feature will ship with Lucene 9.10. I'm not sure when that will be released, though [I see](https://lucene.apache.org/core/corenews.html) ~2-4 months between previous minor versions.
Re: [PR] Fix issues with chunked TaxonomyIndexArray [lucene]
stefanvodita commented on code in PR #13028: URL: https://github.com/apache/lucene/pull/13028#discussion_r1464835176

## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java: ##

@@ -95,7 +97,8 @@ public TaxonomyIndexArrays(IndexReader reader, TaxonomyIndexArrays copyFrom) thr
     // NRT reader was obtained, even though nothing was changed. this is not very likely
     // to happen.
     int[][] parentArray = allocateChunkedArray(reader.maxDoc(), copyFrom.parents.values.length - 1);
-    if (parentArray.length > 0) {
+    assert parentArray.length > 0;
+    if (parentArray[parentArray.length - 1].length > 0) {

Review Comment: In this constructor we can't use `parentArray[0].length > 0` because `parentArray[0]` could be `null` if we had already filled up the first chunk, but I actually think we don't need this if statement at all - what are we guarding against?

## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java: ##

@@ -80,7 +81,8 @@ public int length() {
   public TaxonomyIndexArrays(IndexReader reader) throws IOException {
     int[][] parentArray = allocateChunkedArray(reader.maxDoc(), 0);
-    if (parentArray.length > 0) {
+    assert parentArray.length > 0;
+    if (parentArray[parentArray.length - 1].length > 0) {

Review Comment: You're right that this condition is not correct. I'll revert back to `parentArray[0].length > 0`.
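The two review comments are easier to follow with a small model of a chunked array. The sketch below is an assumption, not the Lucene implementation: the allocator body, the tiny chunk size, and the class name `ChunkedArraySketch` are hypothetical and only mirror the `allocateChunkedArray(size, startFrom)` call sites quoted in the diff. It assumes that chunks before `startFrom` are left `null` (to be shared from an existing array) and that the last chunk holds the remainder.

```java
public class ChunkedArraySketch {
  // Tiny chunks (4 ints) so the edge cases are easy to see; illustrative only.
  static final int CHUNK_SIZE_BITS = 2;
  static final int CHUNK_SIZE = 1 << CHUNK_SIZE_BITS;
  static final int CHUNK_MASK = CHUNK_SIZE - 1;

  // Hypothetical allocator: full chunks from startFrom onward, plus a last
  // chunk sized to the remainder. Chunks before startFrom stay null.
  static int[][] allocateChunkedArray(int size, int startFrom) {
    int chunkCount = (size >> CHUNK_SIZE_BITS) + 1;
    int[][] array = new int[chunkCount][];
    for (int i = startFrom; i < chunkCount - 1; i++) {
      array[i] = new int[CHUNK_SIZE];
    }
    array[chunkCount - 1] = new int[size & CHUNK_MASK];
    return array;
  }

  public static void main(String[] args) {
    // Fresh array whose size is a multiple of the chunk size: the array is
    // not empty, yet its last chunk has length 0.
    int[][] fresh = allocateChunkedArray(8, 0);
    System.out.println("chunks=" + fresh.length
        + " first=" + fresh[0].length
        + " last=" + fresh[fresh.length - 1].length);

    // Array extended from an existing one: the first chunk is not allocated
    // here, so dereferencing it would throw a NullPointerException.
    int[][] extended = allocateChunkedArray(9, 1);
    System.out.println("first chunk is null: " + (extended[0] == null));
  }
}
```

Under these assumptions, checking the last chunk's length misreports a non-empty fresh array as empty, while checking `parentArray[0]` is unsafe only in the constructor that copies from an existing array.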
[PR] Fix test failure for TestTopFieldCollector.testTotalHits [lucene]
easyice opened a new pull request, #13030: URL: https://github.com/apache/lucene/pull/13030

Gradle command to reproduce:

```
./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestTopFieldCollector.testTotalHits" -Ptests.heapsize=2g -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=334CDB6B7D196089 -Ptests.nightly=true -Ptests.gui=false -Ptests.file.encoding=ISO-8859-1 -Ptests.vectorsize=256
```

Error message:

```
> java.lang.AssertionError: expected:<2> but was:<3>
>     at __randomizedtesting.SeedInfo.seed([334CDB6B7D196089:193C89DD7C8E120E]:0)
>     at org.junit.Assert.fail(Assert.java:89)
>     at org.junit.Assert.failNotEquals(Assert.java:835)
>     at org.junit.Assert.assertEquals(Assert.java:647)
>     at org.junit.Assert.assertEquals(Assert.java:633)
>     at org.apache.lucene.search.TestTopFieldCollector.testTotalHits(TestTopFieldCollector.java:195)
```

`newIndexWriterConfig` calls `setMaxBufferedDocs` with a random value in the range 2-15 or 16-1000, which may cause the `AssertionError` in this test.
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
mikemccand commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1908191816

> Maybe we can merge this fix as-is and continue the conversation about where test polygon generation should produce valid polygons in #12596.

I +1. Progress not perfection!
[PR] Propagate topLevelScoringClause from QueryProfiler [lucene]
mrkm4ntr opened a new pull request, #13031: URL: https://github.com/apache/lucene/pull/13031 ### Description The topLevelScoringClause is not propagated with QueryProfiler. This causes some optimizations to be skipped during profiling.
Re: [PR] Rollback the tmp storage of BytesRefHash to -1 after sort [lucene]
ChrisHegarty commented on PR #13014: URL: https://github.com/apache/lucene/pull/13014#issuecomment-1908264542 @gf2121 I've added the 9.9.2 milestone to this PR. Do you agree? If so, is it possible to merge and backport to the `branch_9x` and `branch_9_9` branches?
[PR] in BytesRefHash constructor avoid duplicate BytesStartArray.bytesUsed() call [lucene]
cpoerschke opened a new pull request, #13032: URL: https://github.com/apache/lucene/pull/13032 Noticed whilst code reading. No issue or CHANGES.txt entry needed I think. For `main` and `branch_9x` branches only.
Re: [PR] join: avoid repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery [lucene]
cpoerschke closed pull request #13019: join: avoid repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery URL: https://github.com/apache/lucene/pull/13019
Re: [PR] join: avoid repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery [lucene]
cpoerschke commented on PR #13019: URL: https://github.com/apache/lucene/pull/13019#issuecomment-1908384002 closing in favour of #13014
Re: [I] Re-explore the logic around when Vector search should be Exact [lucene]
benwtrent commented on issue #12505: URL: https://github.com/apache/lucene/issues/12505#issuecomment-1908426746 As pointed out in other issue conversations, Cassandra keeps track of the visited ratio over the lifetime of the index and its searches: https://github.com/apache/cassandra/blob/2f2bb70ccb2657a75abc5aa691cfa28924f98d10/src/java/org/apache/cassandra/index/sai/disk/v1/segment/VectorIndexSegmentSearcher.java#L288 I am not sure if this is a thing we want to do in Lucene directly or not. Another option is to keep what we are doing (drop out of graph search if we see we are worse than brute-force).
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
heemin32 commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1908615744 Added CHANGELOG entry
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
stefanvodita merged PR #12287: URL: https://github.com/apache/lucene/pull/12287
Re: [PR] Fix a bug in ShapeTestUtil [lucene]
stefanvodita commented on PR #12287: URL: https://github.com/apache/lucene/pull/12287#issuecomment-1908668560 Thank you for persevering @heemin32!
Re: [PR] SurroundQuery should pull TermStates during rewrite [lucene]
github-actions[bot] commented on PR #13008: URL: https://github.com/apache/lucene/pull/13008#issuecomment-1909133869 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Improve Javadoc for Lucene90StoredFieldsFormat [lucene]
github-actions[bot] commented on PR #12984: URL: https://github.com/apache/lucene/pull/12984#issuecomment-1909133928 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Rollback the tmp storage of BytesRefHash to -1 after sort [lucene]
gf2121 merged PR #13014: URL: https://github.com/apache/lucene/pull/13014
Re: [PR] Fix issues with chunked TaxonomyIndexArray [lucene]
stefanvodita merged PR #13028: URL: https://github.com/apache/lucene/pull/13028