Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-07 Thread via GitHub
naveentatikonda commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2099712026 @benwtrent Are you aware of this recall issue with IP using SQ int8? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-07 Thread via GitHub
naveentatikonda opened a new issue, #13350: URL: https://github.com/apache/lucene/issues/13350 ### Description While running some benchmarking tests using [opensearch-benchmark](https://github.com/opensearch-project/opensearch-benchmark) on int8 scalar quantization using some of the

Re: [I] Make intra tasks in IndexingChain.flush parallel execute. [lucene]

2024-05-07 Thread via GitHub
vsop-479 commented on issue #13349: URL: https://github.com/apache/lucene/issues/13349#issuecomment-2099663308 @jpountz Please take a look when you get a chance.

[I] Make intra tasks in IndexingChain.flush parallel execute. [lucene]

2024-05-07 Thread via GitHub
vsop-479 opened a new issue, #13349: URL: https://github.com/apache/lucene/issues/13349 ### Description Similar to https://github.com/apache/lucene/pull/13124, https://github.com/apache/lucene/pull/13190. Can we add an executor to `SegmentWriteState` to make tasks like `writeNorms`

Re: [PR] Performance improvements to use read lock to access LRUQueryCache [lucene]

2024-05-07 Thread via GitHub
boicehuang commented on code in PR #13306: URL: https://github.com/apache/lucene/pull/13306#discussion_r1593322390 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -911,15 +919,15 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOExce

Re: [PR] Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery [lucene]

2024-05-07 Thread via GitHub
benwtrent commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-2099512789 @shatejas we have no control over the index order. But, we can test accuracy and such fairly easily by indexing vectors in order of some number of centroids. But I do think this

Re: [PR] Align toString methods in geo module [lucene]

2024-05-07 Thread via GitHub
github-actions[bot] commented on PR #13302: URL: https://github.com/apache/lucene/pull/13302#issuecomment-2099508168 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery [lucene]

2024-05-07 Thread via GitHub
shatejas commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-2099504824 > @jimczi I like this idea at first glance, but I have one major concern. > > What about data that is indexed according to a specific order? Two tests to verify how this behaves w

Re: [PR] Introduce dynamic segment efSearch to Knn{Byte|Float}VectorQuery [lucene]

2024-05-07 Thread via GitHub
shatejas commented on PR #12551: URL: https://github.com/apache/lucene/pull/12551#issuecomment-2099503349 > However, the drawback of this approach is that in situations with multiple segments, we end up combining more top vectors than necessary. @jimczi any plans to revisit this? Wou

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-05-07 Thread via GitHub
benwtrent commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1593083585 ## lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java: ## @@ -75,4 +76,14 @@ public static void checkField(LeafReader in, String field) {

Re: [I] Decouple within-query concurrency from the index's segment geometry [LUCENE-8675] [lucene]

2024-05-07 Thread via GitHub
harshavamsi commented on issue #9721: URL: https://github.com/apache/lucene/issues/9721#issuecomment-2099280561 > > jpountz said: > > It depends on queries. For term queries, duplicating the overhead of looking up terms in the terms dict may be ok, but for multi-term queries and point qu

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-05-07 Thread via GitHub
msokolov commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1593061701 ## lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java: ## @@ -75,4 +76,14 @@ public static void checkField(LeafReader in, String field) {

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-05-07 Thread via GitHub
benwtrent commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1593056612 ## lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java: ## @@ -75,4 +76,14 @@ public static void checkField(LeafReader in, String field) {

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-07 Thread via GitHub
rmuir commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2099251913 > * I'm contemplating changing the signature from `void prefetch()` to `void prefetch(long offset, long length)`. The benefit is that this would allow reading from multiple places with a s

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-07 Thread via GitHub
zhongshanhao commented on code in PR #13343: URL: https://github.com/apache/lucene/pull/13343#discussion_r1592847407 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionScorer.java: ## @@ -115,6 +115,10 @@ private void moveToNextBlock(int target) throws IOExcept

Re: [PR] Performance improvements to use read lock to access LRUQueryCache [lucene]

2024-05-07 Thread via GitHub
jpountz commented on code in PR #13306: URL: https://github.com/apache/lucene/pull/13306#discussion_r1592821514 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -96,14 +98,15 @@ public class LRUQueryCache implements QueryCache, Accountable { // most

Re: [PR] Harden BaseDocValuesFormatTestCase (#13346) [lucene]

2024-05-07 Thread via GitHub
dnhatn merged PR #13348: URL: https://github.com/apache/lucene/pull/13348

Re: [PR] Harden BaseDocValuesFormatTestCase (#13346) [lucene]

2024-05-07 Thread via GitHub
dnhatn merged PR #13347: URL: https://github.com/apache/lucene/pull/13347

Re: [I] Decouple within-query concurrency from the index's segment geometry [LUCENE-8675] [lucene]

2024-05-07 Thread via GitHub
msfroh commented on issue #9721: URL: https://github.com/apache/lucene/issues/9721#issuecomment-2098929299 > jpountz said: > It depends on queries. For term queries, duplicating the overhead of looking up terms in the terms dict may be ok, but for multi-term queries and point queries tha

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-07 Thread via GitHub
jpountz commented on code in PR #13343: URL: https://github.com/apache/lucene/pull/13343#discussion_r1592820524 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionScorer.java: ## @@ -115,6 +115,10 @@ private void moveToNextBlock(int target) throws IOException {

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-05-07 Thread via GitHub
CloudMarc commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2098924012 One might hypothesize that this issue is due to adapting HNSW to Lucene's approach to segmentation. This seems undercut by these observations: https://github.com/apache/lu

Re: [PR] Harden BaseDocValuesFormatTestCase [lucene]

2024-05-07 Thread via GitHub
dnhatn merged PR #13346: URL: https://github.com/apache/lucene/pull/13346

Re: [PR] Harden BaseDocValuesFormatTestCase [lucene]

2024-05-07 Thread via GitHub
dnhatn commented on code in PR #13346: URL: https://github.com/apache/lucene/pull/13346#discussion_r1592732847 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseDocValuesFormatTestCase.java: ## @@ -1414,6 +1414,45 @@ protected void assertDVIterate(Directory dir

Re: [PR] Harden BaseDocValuesFormatTestCase [lucene]

2024-05-07 Thread via GitHub
dnhatn commented on PR #13346: URL: https://github.com/apache/lucene/pull/13346#issuecomment-2098808242 Thanks @jpountz

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-07 Thread via GitHub
zhongshanhao commented on code in PR #13343: URL: https://github.com/apache/lucene/pull/13343#discussion_r1592729159 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java: ## @@ -56,9 +56,29 @@ final class BlockMaxConjunctionBulkScorer extends BulkS

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-07 Thread via GitHub
zhongshanhao commented on code in PR #13343: URL: https://github.com/apache/lucene/pull/13343#discussion_r1592726253 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionScorer.java: ## @@ -167,8 +171,6 @@ private int doNext(int doc) throws IOException {

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-05-07 Thread via GitHub
ChrisHegarty commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1592707341 ## lucene/core/src/java/org/apache/lucene/index/ByteVectorValues.java: ## @@ -75,4 +76,14 @@ public static void checkField(LeafReader in, String field) {

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-07 Thread via GitHub
jpountz commented on code in PR #13343: URL: https://github.com/apache/lucene/pull/13343#discussion_r1592678225 ## lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java: ## @@ -68,18 +88,10 @@ public int score(LeafCollector collector, Bits acceptDocs,

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-07 Thread via GitHub
uschindler commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2098635698 > Some questions about the API, curious to get your thoughts: > > * Should we remove `ReadAdvice#WILL_NEED` and instead introduce a new API such as `NativeAccess#madviseWillNeed

Re: [PR] Harden BaseDocValuesFormatTestCase [lucene]

2024-05-07 Thread via GitHub
jpountz commented on code in PR #13346: URL: https://github.com/apache/lucene/pull/13346#discussion_r1592546383 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseDocValuesFormatTestCase.java: ## @@ -1414,6 +1414,45 @@ protected void assertDVIterate(Directory di

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2024-05-07 Thread via GitHub
zhongshanhao commented on PR #13343: URL: https://github.com/apache/lucene/pull/13343#issuecomment-2098461258 @jpountz Yeah, I was overthinking it. Implementing it your way makes it much clearer😊. I have made revisions and committed it.

Re: [I] How to run tests with the Panama Vector implementation [lucene]

2024-05-07 Thread via GitHub
ChrisHegarty commented on issue #13344: URL: https://github.com/apache/lucene/issues/13344#issuecomment-2098198314 > By the way the empty string is allowed because of this: Yes, exactly. I'm exploiting this, which is a fine way to say "default" :-) > My quickest idea would be to

Re: [I] How to run tests with the Panama Vector implementation [lucene]

2024-05-07 Thread via GitHub
uschindler commented on issue #13344: URL: https://github.com/apache/lucene/issues/13344#issuecomment-2098179725 We can change the defaults, sure. I am just afraid that Robert will be annoyed. He wants to randomly test all bit sizes. By the way the empty string is allowed because of t

Re: [I] How to run tests with the Panama Vector implementation [lucene]

2024-05-07 Thread via GitHub
ChrisHegarty commented on issue #13344: URL: https://github.com/apache/lucene/issues/13344#issuecomment-2098157117 I wonder if we can just avoid some of this complexity. For example, locally I can run all the tests with the Panama Vector implementation, by doing this: ``` $ expo

Re: [PR] Add IndexInput#prefetch. [lucene]

2024-05-07 Thread via GitHub
jpountz commented on PR #13337: URL: https://github.com/apache/lucene/pull/13337#issuecomment-2097602534 I reverted changes to `BufferedIndexInput`, agreed to focus on `MMapDirectory` for now.