Re: [PR] Forbidden Thread.sleep API [lucene]

2024-01-24 Thread via GitHub


shubhamvishu commented on PR #13001:
URL: https://github.com/apache/lucene/pull/13001#issuecomment-1907646476

   @uschindler @mikemccand I think we could merge this now unless I'm missing 
something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-24 Thread via GitHub


stefanvodita commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1907818280

   Maybe we can merge this fix as-is and continue the conversation about where 
test polygon generation should produce valid polygons in #12596. It's not clear 
to me now if there is anything to fix in #12596, but I think we agree that the 
fix in this PR should go ahead.
   
   @heemin32 - do you want to write a 
[CHANGES](https://github.com/apache/lucene/blob/main/lucene/CHANGES.txt) entry 
for Lucene 9.10?





[I] Improve AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight ScorerSupplier cost estimation [lucene]

2024-01-24 Thread via GitHub


rquesada-tibco opened a new issue, #13029:
URL: https://github.com/apache/lucene/issues/13029

   ### Description
   
   We recently discovered a performance degradation on our project when going 
from Lucene 9.4 to 9.9. The cause seems to be a side effect of 
https://github.com/apache/lucene/commit/c6667e709f610669b57a6a1a6ab1cc80f4f0ebaf
 and 
https://github.com/apache/lucene/commit/3809106602a9675f4fd217b1090af4505d4ec2a7
   
   The situation is as follows: we have a `WildcardQuery` and a 
`TermInSetQuery` which are and-combined (within a `BooleanQuery`). This 
structure gets executed repeatedly, kind of like a nested loop where the 
`WildcardQuery` remains the same, but the `TermInSetQuery` keeps changing its 
terms. In the old version, this was fast because the `WildcardQuery` was cached 
within the `LRUQueryCache`. However in the new version this is no longer the 
case, so the execution time of our scenario has increased.
   
   Why is our `WildcardQuery` no longer cached? It boils down to [this 
line](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L771)
 in `LRUQueryCache`, where the cache operation won't happen if the cost 
estimation is too high:
   ```
   final ScorerSupplier supplier = in.scorerSupplier(context);
   ...
   final long cost = supplier.cost();
   ...
   // skip cache operation which would slow query down too much
   if (cost / skipCacheFactor > leadCost) {
   ...
   ```
   
   Before the upgrade to 9.9, that cost was provided by a `ConstantScoreWeight` 
returned by the old `MultiTermQueryConstantScoreWrapper` (which was returned by 
the default `RewriteMethod`), which in the end was just based on the "default" 
`Weight#scorerSupplier` 
[implementation](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/Weight.java#L147):
 basically the cost was the `scorer.iterator().cost();` and in **our case the 
`WildcardQuery` returns just one document, so cost 1**.
   
   After the upgrade, the default `RewriteMethod` has changed and now this cost 
is provided by 
`AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier` 
[here](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L263),
 and for that purpose a private [estimateCost 
method](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L295)
 was introduced, which bases the estimation on the 
[MultiTermQuery#getTermsCount](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/MultiTermQuery.java#L316)
 value. The problem is that, for our `WildcardQuery` (in fact for any sub-class 
of `AutomatonQuery`), this value is unknown, i.e. `-1`, so the `estimateCost` 
method just 
[returns](https://github.com/apache/lucene/blob/f16007c3eca7cbf89bf6c2f88907707cf30c0058/lucene/core/src/java/org/apache/lucene/search/AbstractMultiTermQueryConstantScoreWrapper.java#L309)
  `terms.getSumDocFreq()`, which is clearly an overestimation in our case, so 
it prevents the caching, and leads to a performance degradation.
   
   I understand that I can fix this situation by writing my customized 
`RewriteMethod`.
   The question is: could we improve 
`AbstractMultiTermQueryConstantScoreWrapper#RewritingWeight#scorerSupplier#cost`
  so that, if the MTQ cannot provide a term count (`getTermsCount() == -1`) 
then we return `scorer.iterator().cost()` ?
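
To make the effect concrete, here is a minimal sketch of the decision described above. It is a simplified model, not Lucene code: the class `CostEstimateSketch`, its methods, and all numeric values (field size, `skipCacheFactor`, `leadCost`) are assumptions chosen for illustration. Only the shape of the check, `cost / skipCacheFactor > leadCost`, and the `-1` "unknown term count" convention come from the description.

```java
// Simplified model of the caching decision; NOT Lucene code.
// All names and numbers below are hypothetical and for illustration only.
final class CostEstimateSketch {

  /** Mirrors the quoted check: caching is skipped when cost / skipCacheFactor > leadCost. */
  static boolean skipsCache(long cost, double skipCacheFactor, long leadCost) {
    return cost / skipCacheFactor > leadCost;
  }

  /**
   * Picks a cost estimate. A term count of -1 means "unknown", in which case
   * the caller-supplied fallback is used instead of the term-count-based estimate.
   */
  static long estimateCost(long termsCount, long knownEstimate, long unknownFallback) {
    return termsCount == -1 ? unknownFallback : knownEstimate;
  }

  public static void main(String[] args) {
    long sumDocFreq = 5_000_000L;   // assumed: total postings in the field
    long iteratorCost = 1L;         // the wildcard matches a single document
    double skipCacheFactor = 250.0; // assumed value
    long leadCost = 100L;           // assumed cost of the leading clause

    // Behaviour as described: unknown term count falls back to sumDocFreq.
    long current = estimateCost(-1, 0L, sumDocFreq);
    // Behaviour as proposed: unknown term count falls back to the iterator cost.
    long proposed = estimateCost(-1, 0L, iteratorCost);

    System.out.println("current:  cost=" + current
        + " skipCache=" + skipsCache(current, skipCacheFactor, leadCost));
    System.out.println("proposed: cost=" + proposed
        + " skipCache=" + skipsCache(proposed, skipCacheFactor, leadCost));
  }
}
```

With these assumed numbers the `sumDocFreq` fallback exceeds the threshold and caching is skipped, while the iterator-cost fallback stays below it and the query remains cacheable.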





Re: [PR] Add support for similarity-based vector searches [lucene]

2024-01-24 Thread via GitHub


kaivalnp commented on PR #12679:
URL: https://github.com/apache/lucene/pull/12679#issuecomment-1907958716

   This feature will ship with Lucene 9.10
   
   I'm not sure when that will be released, though [I 
see](https://lucene.apache.org/core/corenews.html) \~2-4 months between 
previous minor versions





Re: [PR] Fix issues with chunked TaxonomyIndexArray [lucene]

2024-01-24 Thread via GitHub


stefanvodita commented on code in PR #13028:
URL: https://github.com/apache/lucene/pull/13028#discussion_r1464835176


##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java:
##
@@ -95,7 +97,8 @@ public TaxonomyIndexArrays(IndexReader reader, 
TaxonomyIndexArrays copyFrom) thr
 // NRT reader was obtained, even though nothing was changed. this is not 
very likely
 // to happen.
 int[][] parentArray = allocateChunkedArray(reader.maxDoc(), 
copyFrom.parents.values.length - 1);
-if (parentArray.length > 0) {
+assert parentArray.length > 0;
+if (parentArray[parentArray.length - 1].length > 0) {

Review Comment:
   In this constructor we can't use `parentArray[0].length > 0` because 
`parentArray[0]` could be `null` if we had already filled up the first chunk, 
but I actually think we don't need this if statement at all - what are we 
guarding against?



##
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java:
##
@@ -80,7 +81,8 @@ public int length() {
 
   public TaxonomyIndexArrays(IndexReader reader) throws IOException {
 int[][] parentArray = allocateChunkedArray(reader.maxDoc(), 0);
-if (parentArray.length > 0) {
+assert parentArray.length > 0;
+if (parentArray[parentArray.length - 1].length > 0) {

Review Comment:
   You're right that this condition is not correct. I'll revert back to 
`parentArray[0].length > 0`.
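
For readers following the two conditions discussed above, here is a small sketch of how a chunked array can end up with an empty last chunk or a `null` first chunk. It is a hypothetical model: the class `ChunkedArraySketch`, the tiny `CHUNK_SIZE`, and the `allocate` logic are assumptions for illustration and are not copied from `TaxonomyIndexArrays`.

```java
// Hypothetical model of a chunked int array; names and chunk size are
// illustrative assumptions, not the TaxonomyIndexArrays implementation.
final class ChunkedArraySketch {
  static final int CHUNK_SIZE = 4; // tiny on purpose so the cases are easy to see

  /**
   * Allocates chunks covering [0, size). Chunks that lie entirely before
   * startFrom are left null (assumed to be filled in from an older array).
   */
  static int[][] allocate(int size, int startFrom) {
    int chunkCount = size / CHUNK_SIZE + 1; // always at least one chunk
    int[][] chunks = new int[chunkCount][];
    for (int i = startFrom / CHUNK_SIZE; i < chunkCount - 1; i++) {
      chunks[i] = new int[CHUNK_SIZE]; // full chunks
    }
    chunks[chunkCount - 1] = new int[size % CHUNK_SIZE]; // last chunk, possibly empty
    return chunks;
  }

  public static void main(String[] args) {
    // Fresh allocation whose size is an exact multiple of CHUNK_SIZE:
    // the last chunk is empty even though the array holds data.
    int[][] fresh = allocate(8, 0);
    System.out.println(fresh[0].length);                // 4
    System.out.println(fresh[fresh.length - 1].length); // 0

    // Allocation that starts past the first chunk: the first chunk is null,
    // so dereferencing grown[0].length would throw a NullPointerException.
    int[][] grown = allocate(10, 5);
    System.out.println(grown[0] == null);               // true
  }
}
```

Under these assumptions, a check on the last chunk's length can be false for a non-empty array, and a check on the first chunk's length is only safe when allocation starts at index 0.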






[PR] Fix test failure for TestTopFieldCollector.testTotalHits [lucene]

2024-01-24 Thread via GitHub


easyice opened a new pull request, #13030:
URL: https://github.com/apache/lucene/pull/13030

   Gradle command to reproduce:
   
   ```
   ./gradlew :lucene:core:test --tests 
"org.apache.lucene.search.TestTopFieldCollector.testTotalHits" 
-Ptests.heapsize=2g -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 
-XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=334CDB6B7D196089 
-Ptests.nightly=true -Ptests.gui=false -Ptests.file.encoding=ISO-8859-1 
-Ptests.vectorsize=256
   ```
   
   Error message:
   
   ```
  > java.lang.AssertionError: expected:<2> but was:<3>
  > at 
__randomizedtesting.SeedInfo.seed([334CDB6B7D196089:193C89DD7C8E120E]:0)
  > at org.junit.Assert.fail(Assert.java:89)
  > at org.junit.Assert.failNotEquals(Assert.java:835)
  > at org.junit.Assert.assertEquals(Assert.java:647)
  > at org.junit.Assert.assertEquals(Assert.java:633)
  > at 
org.apache.lucene.search.TestTopFieldCollector.testTotalHits(TestTopFieldCollector.java:195)
   
   
   ```   
   The `newIndexWriterConfig` will `setMaxBufferedDocs` using a random value of 
2-15 or 16-1000, which may cause the `AssertionError` in this test.





Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-24 Thread via GitHub


mikemccand commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1908191816

   > Maybe we can merge this fix as-is and continue the conversation about 
where test polygon generation should produce valid polygons in #12596.
   
   +1.  Progress not perfection!





[PR] Propagate topLevelScoringClause from QueryProfiler [lucene]

2024-01-24 Thread via GitHub


mrkm4ntr opened a new pull request, #13031:
URL: https://github.com/apache/lucene/pull/13031

   ### Description
   The topLevelScoringClause is not propagated with QueryProfiler. This causes 
some optimizations to be skipped during profiling.
   
   





Re: [PR] Rollback the tmp storage of BytesRefHash to -1 after sort [lucene]

2024-01-24 Thread via GitHub


ChrisHegarty commented on PR #13014:
URL: https://github.com/apache/lucene/pull/13014#issuecomment-1908264542

   @gf2121  I've added the 9.9.2 milestone to this PR.  Do you agree? If so, is 
it possible to merge and backport to the `branch_9x` and `branch_9_9` branches?
 





[PR] in BytesRefHash constructor avoid duplicate BytesStartArray.bytesUsed() call [lucene]

2024-01-24 Thread via GitHub


cpoerschke opened a new pull request, #13032:
URL: https://github.com/apache/lucene/pull/13032

   Noticed whilst code reading. No issue or CHANGES.txt entry needed I think. 
For `main` and `branch_9x` branches only.





Re: [PR] join: avoid repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery [lucene]

2024-01-24 Thread via GitHub


cpoerschke closed pull request #13019: join: avoid repeat BytesRefHash.sort() 
in TermsQuery after TermsIncludingScoreQuery
URL: https://github.com/apache/lucene/pull/13019





Re: [PR] join: avoid repeat BytesRefHash.sort() in TermsQuery after TermsIncludingScoreQuery [lucene]

2024-01-24 Thread via GitHub


cpoerschke commented on PR #13019:
URL: https://github.com/apache/lucene/pull/13019#issuecomment-1908384002

   closing in favour of #13014 





Re: [I] Re-explore the logic around when Vector search should be Exact [lucene]

2024-01-24 Thread via GitHub


benwtrent commented on issue #12505:
URL: https://github.com/apache/lucene/issues/12505#issuecomment-1908426746

   As pointed out in other issue conversations, Cassandra keeps track of the 
visited ratio over the lifetime of the index and its searches: 
https://github.com/apache/cassandra/blob/2f2bb70ccb2657a75abc5aa691cfa28924f98d10/src/java/org/apache/cassandra/index/sai/disk/v1/segment/VectorIndexSegmentSearcher.java#L288
   
   I am not sure if this is a thing we want to do in Lucene directly or not.
   
   Another option is to keep what we are doing (drop out of graph search if we 
see we are worse than brute-force).
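
As a rough illustration of that second option, the sketch below simulates a graph search that gives up once it has visited more nodes than a brute-force scan would need to score. Everything here is hypothetical: `ExactFallbackSketch`, `Strategy`, and `search` are made-up names, and the loop only counts visits instead of traversing a real graph. It is not how Lucene or Cassandra implement this.

```java
// Hypothetical sketch of "drop out of graph search when brute force is cheaper".
// None of these names are Lucene or Cassandra APIs.
final class ExactFallbackSketch {

  enum Strategy { GRAPH, EXACT }

  /**
   * Simulates a graph search that needs nodesNeeded visits to converge.
   * If the visit count exceeds bruteForceCost (the number of documents an
   * exact scan would score), the search is abandoned in favour of EXACT.
   */
  static Strategy search(int nodesNeeded, int bruteForceCost) {
    int visited = 0;
    while (visited < nodesNeeded) {
      visited++; // stand-in for scoring one graph node
      if (visited > bruteForceCost) {
        return Strategy.EXACT; // graph search is now worse than brute force
      }
    }
    return Strategy.GRAPH;
  }

  public static void main(String[] args) {
    // Selective filter: only 50 documents match, graph needs 400 visits.
    System.out.println(search(400, 50));     // EXACT
    // Broad filter: 10,000 documents match, graph needs 400 visits.
    System.out.println(search(400, 10_000)); // GRAPH
  }
}
```

A lifetime visited ratio, as in the Cassandra link above, could in principle be used to predict this outcome before starting the graph search rather than discovering it partway through; that prediction step is not modelled here.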





Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-24 Thread via GitHub


heemin32 commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1908615744

   Added CHANGELOG entry





Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-24 Thread via GitHub


stefanvodita merged PR #12287:
URL: https://github.com/apache/lucene/pull/12287





Re: [PR] Fix a bug in ShapeTestUtil [lucene]

2024-01-24 Thread via GitHub


stefanvodita commented on PR #12287:
URL: https://github.com/apache/lucene/pull/12287#issuecomment-1908668560

   Thank you for persevering @heemin32!





Re: [PR] SurroundQuery should pull TermStates during rewrite [lucene]

2024-01-24 Thread via GitHub


github-actions[bot] commented on PR #13008:
URL: https://github.com/apache/lucene/pull/13008#issuecomment-1909133869

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] Improve Javadoc for Lucene90StoredFieldsFormat [lucene]

2024-01-24 Thread via GitHub


github-actions[bot] commented on PR #12984:
URL: https://github.com/apache/lucene/pull/12984#issuecomment-1909133928

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] Rollback the tmp storage of BytesRefHash to -1 after sort [lucene]

2024-01-24 Thread via GitHub


gf2121 merged PR #13014:
URL: https://github.com/apache/lucene/pull/13014





Re: [PR] Fix issues with chunked TaxonomyIndexArray [lucene]

2024-01-24 Thread via GitHub


stefanvodita merged PR #13028:
URL: https://github.com/apache/lucene/pull/13028

