Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub
shatejas commented on code in PR #13985: URL: https://github.com/apache/lucene/pull/13985#discussion_r1842587936 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -123,4 +123,11 @@ public abstract void search( public KnnVectorsReader getMergeInstan

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub
ChrisHegarty commented on code in PR #13985: URL: https://github.com/apache/lucene/pull/13985#discussion_r1842566686 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -123,4 +123,11 @@ public abstract void search( public KnnVectorsReader getMergeIn

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-14 Thread via GitHub
msfroh commented on code in PR #13987: URL: https://github.com/apache/lucene/pull/13987#discussion_r1842729383 ## lucene/core/src/test/org/apache/lucene/search/TestTopFieldCollector.java: ## @@ -359,7 +359,7 @@ public void testTotalHitsWithScore() throws Exception { leafC

Re: [PR] [DRAFT] Change vector input from IndexInput to RandomAccessInput [lucene]

2024-11-14 Thread via GitHub
dungba88 commented on code in PR #13981: URL: https://github.com/apache/lucene/pull/13981#discussion_r1842998968 ## lucene/core/src/java/org/apache/lucene/store/RandomAccessInput.java: ## @@ -77,4 +85,6 @@ default void readBytes(long pos, byte[] bytes, int offset, int length) t

Re: [PR] [KNN] Add comment and remove duplicate code [lucene]

2024-11-14 Thread via GitHub
dungba88 commented on code in PR #13594: URL: https://github.com/apache/lucene/pull/13594#discussion_r1843000812 ## lucene/core/src/java/org/apache/lucene/search/AnnQueryUtils.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-11-14 Thread via GitHub
dungba88 commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2477616738 Hi @benwtrent Thanks for the reply. It seems I wasn't clear enough in my previous question. Regarding `exactSearch`, it was based on this jmazanec15@ comment. `getFlo

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-11-14 Thread via GitHub
dungba88 commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-241816 > Apply the brute-force reranking (bypassing the "exactSearch" path as that uses quantized values). Maybe I misunderstood something, but I thought the idea is to re-rank wit

Re: [PR] [DRAFT] Change vector input from IndexInput to RandomAccessInput [lucene]

2024-11-14 Thread via GitHub
benwtrent commented on code in PR #13981: URL: https://github.com/apache/lucene/pull/13981#discussion_r1842957299 ## lucene/core/src/java/org/apache/lucene/store/RandomAccessInput.java: ## @@ -77,4 +85,6 @@ default void readBytes(long pos, byte[] bytes, int offset, int length)

Re: [PR] Misc cleanups to TopScoreDocCollector [lucene]

2024-11-14 Thread via GitHub
github-actions[bot] commented on PR #13935: URL: https://github.com/apache/lucene/pull/13935#issuecomment-2477684013 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] lucene-monitor: make abstract DocumentBatch public [lucene]

2024-11-14 Thread via GitHub
kotman12 commented on PR #13993: URL: https://github.com/apache/lucene/pull/13993#issuecomment-2477481565 Following up briefly, I just don't see a lot of use for a QueryIndex or a Monitor in a more "advanced" set-up such as Solr. The QueryIndex API is quite large and most of the operations a

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-11-14 Thread via GitHub
dungba88 commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2477832716 I also saw where my misunderstanding comes from now: `getFloatVectorValues()` returns QuantizedVectorValues, which has 2 methods: `vectorValue(int ord)` returns the raw vector whil

Re: [PR] lucene-monitor: make abstract DocumentBatch public [lucene]

2024-11-14 Thread via GitHub
kotman12 commented on PR #13993: URL: https://github.com/apache/lucene/pull/13993#issuecomment-2476125066 Author here, so the current monitor API makes it really hard to integrate into anything more "custom" like Solr. This is because it tightly seals relevant implementation details like ca

Re: [PR] lucene-monitor: make abstract DocumentBatch public [lucene]

2024-11-14 Thread via GitHub
romseygeek commented on PR #13993: URL: https://github.com/apache/lucene/pull/13993#issuecomment-2476301351 I'm definitely up for making this more extensible. We already have two separate QueryIndex implementations, so maybe that's the best place to start? Caching of parsed queries is pr

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub
ChrisHegarty commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2476285629 Thanks @shatejas For clarity, the bottleneck that is being fixed here is with the reading of all vector data from the to-be-merged segments, when copying that data to the ne

Re: [PR] lucene-monitor: make abstract DocumentBatch public [lucene]

2024-11-14 Thread via GitHub
kotman12 commented on PR #13993: URL: https://github.com/apache/lucene/pull/13993#issuecomment-2476322270 > Caching of parsed queries is pretty tightly coupled to how the query index is implemented so separating them out might be trickier The current "solr monitor" PR avoids this pro

[PR] lucene-monitor: make static DocumentBatch.of package scope [lucene]

2024-11-14 Thread via GitHub
cpoerschke opened a new pull request, #13995: URL: https://github.com/apache/lucene/pull/13995 Alternative to #13993 i.e. visibility of `of` to match visibility of the class itself. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] lucene-monitor: make abstract DocumentBatch public [lucene]

2024-11-14 Thread via GitHub
cpoerschke commented on PR #13993: URL: https://github.com/apache/lucene/pull/13993#issuecomment-2476326328 Opened #13995 as an alternative i.e. `of` method visibility to match the class. WDYT?

Re: [PR] Speed up top-k retrieval on filtered conjunctions. [lucene]

2024-11-14 Thread via GitHub
benwtrent commented on code in PR #13994: URL: https://github.com/apache/lucene/pull/13994#discussion_r1842353256 ## lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java: ## @@ -558,6 +557,41 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {

Re: [PR] lucene-monitor: make abstract DocumentBatch public [lucene]

2024-11-14 Thread via GitHub
romseygeek commented on PR #13993: URL: https://github.com/apache/lucene/pull/13993#issuecomment-2475955653 DocumentBatch is really an implementation detail of the Monitor, so I'm not sure why client code would need to refer to it? The linked Solr PR is quite big so it's difficult to see w

Re: [PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]

2024-11-14 Thread via GitHub
ChrisHegarty merged PR #13986: URL: https://github.com/apache/lucene/pull/13986

[PR] Speed up top-k retrieval on filtered conjunctions. [lucene]

2024-11-14 Thread via GitHub
jpountz opened a new pull request, #13994: URL: https://github.com/apache/lucene/pull/13994 A while back we added an optimized bulk scorer that implements block-max AND, this yielded a good speedup on nightly benchmarks, see annotation `FP` at https://benchmarks.mikemccandless.com/AndHighHi

Re: [PR] Speed up top-k retrieval on filtered conjunctions. [lucene]

2024-11-14 Thread via GitHub
jpountz commented on PR #13994: URL: https://github.com/apache/lucene/pull/13994#issuecomment-2475814197 I ran luceneutil on wikibigall and the new filtered tasks from wikinightly.tasks (https://github.com/mikemccand/luceneutil/blob/main/tasks/wikinightly.tasks#L355-L390): ```

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-11-14 Thread via GitHub
dungba88 commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2475738069 Hi all, I was looking at this idea for some experimentation ideas (not meant to be intrusive to the ongoing effort). If the full sized vectors are exposed through `getFloatVectorV

Re: [PR] Speed up retrieval of top-k filtered disjunctions a bit. [lucene]

2024-11-14 Thread via GitHub
jpountz commented on PR #13996: URL: https://github.com/apache/lucene/pull/13996#issuecomment-2476733979 I ran luceneutil on wikibigall and the new filtered tasks from wikinightly.tasks (https://github.com/mikemccand/luceneutil/blob/main/tasks/wikinightly.tasks#L355-L390): ```

Re: [PR] Speed up top-k retrieval on filtered conjunctions. [lucene]

2024-11-14 Thread via GitHub
benwtrent commented on PR #13994: URL: https://github.com/apache/lucene/pull/13994#issuecomment-2476765612 @jpountz don't forget CHANGES ;) but this lgtm

[PR] Speed up retrieval of top-k filtered disjunctions a bit. [lucene]

2024-11-14 Thread via GitHub
jpountz opened a new pull request, #13996: URL: https://github.com/apache/lucene/pull/13996 This moves work from `advance(int target)` to `TwoPhaseIterator#matches()` so that we do less work on hits that do not match the filter.

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub
ChrisHegarty commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2476880937 Generally, I think that the direction in this PR is good. I wanna help get it moved forward. I'll do some local testing and perf runs to verify the impact. I can also commit some te

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub
shatejas commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2476919288 > Generally, I think that the direction in this PR is good. I wanna help get it moved forward. I'll do some local testing and perf runs to verify the impact. I can also commit some test

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-14 Thread via GitHub
ChrisHegarty commented on code in PR #13985: URL: https://github.com/apache/lucene/pull/13985#discussion_r1842167895 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -123,4 +123,11 @@ public abstract void search( public KnnVectorsReader getMergeIn

Re: [PR] [KNN] Add comment and remove duplicate code [lucene]

2024-11-14 Thread via GitHub
benwtrent commented on code in PR #13594: URL: https://github.com/apache/lucene/pull/13594#discussion_r1842215165 ## lucene/core/src/java/org/apache/lucene/search/AnnQueryUtils.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-11-14 Thread via GitHub
benwtrent commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2476505715 > then it seems we can just call exactSearch over the results of approximateSearch, Exact search will hopefully just be using the quantized values as well. > However