Re: [I] TestSsDvMultiRangeQuery.testDuelWithStandardDisjunction fails [lucene]

2025-02-21 Thread via GitHub
mkhludnev commented on issue #14260: URL: https://github.com/apache/lucene/issues/14260#issuecomment-2673974937 main passed as well https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/1579/changes -- This is an automated message from the Apache Git Service. To respond to

[PR] fix [lucene]

2025-02-21 Thread via GitHub
javanna opened a new pull request, #14270: URL: https://github.com/apache/lucene/pull/14270 We currently have a gap in testing the completion postings format. While it allows loading FSTs off heap, and the format has a constructor that takes the FST load mode, at read time SPI goes through

Re: [PR] fix [lucene]

2025-02-21 Thread via GitHub
javanna commented on code in PR #14270: URL: https://github.com/apache/lucene/pull/14270#discussion_r1965557931 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/Completion101PostingsFormat.java: ## @@ -25,17 +25,10 @@ * @lucene.experimental */ public cla

Re: [PR] Implement #intoBitset for DocIdSetIterator#all and DocIdSetIterator#range [lucene]

2025-02-21 Thread via GitHub
gf2121 merged PR #14269: URL: https://github.com/apache/lucene/pull/14269

Re: [PR] fix [lucene]

2025-02-21 Thread via GitHub
javanna commented on code in PR #14270: URL: https://github.com/apache/lucene/pull/14270#discussion_r1965567430 ## lucene/suggest/src/test/org/apache/lucene/search/suggest/document/TestSuggestField.java: ## @@ -949,10 +952,7 @@ static IndexWriterConfig iwcWithSuggestField(Analyz

Re: [PR] fix [lucene]

2025-02-21 Thread via GitHub
javanna commented on code in PR #14270: URL: https://github.com/apache/lucene/pull/14270#discussion_r1965564752 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionPostingsFormat.java: ## @@ -104,6 +104,9 @@ public abstract class CompletionPostingsForm

Re: [PR] fix [lucene]

2025-02-21 Thread via GitHub
javanna commented on code in PR #14270: URL: https://github.com/apache/lucene/pull/14270#discussion_r1965565789 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionPostingsFormat.java: ## @@ -122,18 +125,13 @@ public enum FSTLoadMode { private fin

[PR] Use DocIdSetIterator#range for continuous-id BKD leaves [lucene]

2025-02-21 Thread via GitHub
gf2121 opened a new pull request, #14272: URL: https://github.com/apache/lucene/pull/14272 Use `DocIdSetIterator#range` instead of `DocBaseBitsetIterator`, as `DocIdSetIterator#range` implements `#intoBitset`.

[PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-02-21 Thread via GitHub
jpountz opened a new pull request, #14273: URL: https://github.com/apache/lucene/pull/14273 This attempts to generalize the `IndexSearcher#count` optimization from PR #12415 to histogram facets by introducing specialization for counting the number of matching docs in a range of doc IDs.

Re: [I] Refactor QueryCache to improve concurrency and performance [lucene]

2025-02-21 Thread via GitHub
sgup432 commented on issue #14222: URL: https://github.com/apache/lucene/issues/14222#issuecomment-2675501604 I re-ran the test with a 1 MB cache, assuming 50 Lucene segments. Numbers are even better! ## Performance Comparison: v1 vs v2 | Benchmark

Re: [PR] knn search - add tests to perform exact search when filtering does not return enough results [lucene]

2025-02-21 Thread via GitHub
benwtrent commented on code in PR #14274: URL: https://github.com/apache/lucene/pull/14274#discussion_r1966141126 ## lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java: ## @@ -31,23 +31,7 @@ import org.apache.lucene.document.IntPoint; import org.apac

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-21 Thread via GitHub
benwtrent commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2675053049 > Sorry I didn't understand this. @msokolov I mean that during search, we would by default have more neighbors to look at. Ensuring diversity eagerly means that it's possible

[PR] knn search - perform exact search when filtering does not return enough results [lucene]

2025-02-21 Thread via GitHub
carlosdelest opened a new pull request, #14274: URL: https://github.com/apache/lucene/pull/14274 When doing approximate knn search, it's possible that the approximate search returns fewer than k results. When there is a filter, we know the filter cost, so we can check if there are actually

Re: [PR] Stop cloning index input when loading NRTSuggester [lucene]

2025-02-21 Thread via GitHub
javanna commented on code in PR #14271: URL: https://github.com/apache/lucene/pull/14271#discussion_r1965909207 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionsTermsReader.java: ## @@ -72,10 +72,8 @@ public final class CompletionsTermsReader imple

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-21 Thread via GitHub
Vikasht34 commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2675065270 @benwtrent As far as I understand, your idea is to use Delaunay triangulation and skip connectComponents()?

Re: [PR] Stop closing index input when loading NRTSuggester [lucene]

2025-02-21 Thread via GitHub
javanna merged PR #14271: URL: https://github.com/apache/lucene/pull/14271

Re: [PR] Stop closing index input when loading NRTSuggester [lucene]

2025-02-21 Thread via GitHub
javanna commented on PR #14271: URL: https://github.com/apache/lucene/pull/14271#issuecomment-2675212515 Thanks all!

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-21 Thread via GitHub
msokolov commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2675147250 > I mean that during search, we would by default have more neighbors to look at. I see, yes, that's true.

[PR] Support load per-iteration replacement of NamedSPI [lucene]

2025-02-21 Thread via GitHub
ChrisHegarty opened a new pull request, #14275: URL: https://github.com/apache/lucene/pull/14275 This commit adds support load per-iteration replacement of NamedSPI. The primary motivation for this change is to support deterministic SPI loading when deploying Lucene as a module. Whe

[PR] Implement #intoBitset for DocIdSetIterator#all and DocIdSetIterator#range [lucene]

2025-02-21 Thread via GitHub
gf2121 opened a new pull request, #14269: URL: https://github.com/apache/lucene/pull/14269 Implement `#intoBitset` for `DocIdSetIterator#all` and `DocIdSetIterator#range`. For reference, `DocIdSetIterator#all` is used in queries that hit all docs. `DocIdSetIterator#range` is used in docvalue qu

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-21 Thread via GitHub
msokolov commented on code in PR #14226: URL: https://github.com/apache/lucene/pull/14226#discussion_r1965419004 ## lucene/core/src/java/org/apache/lucene/search/OptimisticKnnVectorQuery.java: ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [I] Flaky `TestKnnByteVectorQueryMMap.testRandomWithFilter` test failures [lucene]

2025-02-21 Thread via GitHub
msokolov commented on issue #14266: URL: https://github.com/apache/lucene/issues/14266#issuecomment-2674480203 These random test failures in an approximate world are hard. There are no guarantees! In similar cases in the past I have handcrafted data to exercise the case in question, i.e. get

Re: [PR] Reduce knn recall test flakiness [lucene]

2025-02-21 Thread via GitHub
benwtrent merged PR #14265: URL: https://github.com/apache/lucene/pull/14265

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-21 Thread via GitHub
msokolov commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2674487026 In the original diversity impl we had allowed the neighbors array to fill without regard to any diversity criterion, and only started imposing it once the array was full. This mea

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-21 Thread via GitHub
benwtrent commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2674533124 > This means we have to sort at that point IIRC, but it might be a better, more robust choice? I am not sure if it's "better"; I would assume it makes search slower on well d

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-21 Thread via GitHub
msokolov commented on code in PR #14226: URL: https://github.com/apache/lucene/pull/14226#discussion_r1965415548 ## lucene/core/src/java/org/apache/lucene/search/OptimisticKnnVectorQuery.java: ## @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

2025-02-21 Thread via GitHub
jpountz commented on PR #13470: URL: https://github.com/apache/lucene/pull/13470#issuecomment-2674572334 I plan on merging this PR soon if there are no objections.

Re: [I] HNSW connect components can take an inordinate amount of time [lucene]

2025-02-21 Thread via GitHub
msokolov commented on issue #14214: URL: https://github.com/apache/lucene/issues/14214#issuecomment-2674568528 > I am not sure if it's "better"; I would assume it makes search slower on well distributed and distinguished vectors :/ Sorry I didn't understand this. In the normal case, we'

Re: [I] Flaky `TestKnnByteVectorQueryMMap.testRandomWithFilter` test failures [lucene]

2025-02-21 Thread via GitHub
benwtrent commented on issue #14266: URL: https://github.com/apache/lucene/issues/14266#issuecomment-2674536319 > These random test failures in an approximate world are hard. There are no guarantees! No doubt! I was super surprised by this failure 😂 . The probabilities are ast

Re: [PR] Stop cloning index input when loading NRTSuggester [lucene]

2025-02-21 Thread via GitHub
jpountz commented on code in PR #14271: URL: https://github.com/apache/lucene/pull/14271#discussion_r1965729900 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionsTermsReader.java: ## @@ -72,10 +72,8 @@ public final class CompletionsTermsReader imple

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-02-21 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2674888106 @epotyom You may be interested in taking a look.

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-21 Thread via GitHub
jpountz merged PR #14204: URL: https://github.com/apache/lucene/pull/14204

Re: [PR] Implement #intoBitset for DocIdSetIterator#all and DocIdSetIterator#range [lucene]

2025-02-21 Thread via GitHub
gf2121 commented on code in PR #14269: URL: https://github.com/apache/lucene/pull/14269#discussion_r1965325680 ## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ## @@ -87,6 +87,18 @@ public int advance(int target) throws IOException { public long cos

Re: [PR] Implement #intoBitset for DocIdSetIterator#all and DocIdSetIterator#range [lucene]

2025-02-21 Thread via GitHub
gf2121 commented on code in PR #14269: URL: https://github.com/apache/lucene/pull/14269#discussion_r1965326705 ## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ## @@ -224,7 +248,7 @@ protected final int slowAdvance(int target) throws IOException { *

Re: [PR] Implement #intoBitset for DocIdSetIterator#all and DocIdSetIterator#range [lucene]

2025-02-21 Thread via GitHub
jpountz commented on code in PR #14269: URL: https://github.com/apache/lucene/pull/14269#discussion_r1965299369 ## lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java: ## @@ -224,7 +248,7 @@ protected final int slowAdvance(int target) throws IOException { *