[PR] Update document by docID [lucene]

2024-06-11 Thread via GitHub
vsop-479 opened a new pull request, #13481: URL: https://github.com/apache/lucene/pull/13481 ### Description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [PR] Use SPI instead of Enum for VectorSimilarityFunctions [lucene]

2024-06-11 Thread via GitHub
github-actions[bot] commented on PR #13401: URL: https://github.com/apache/lucene/pull/13401#issuecomment-2161833420 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-11 Thread via GitHub
gsmiller commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2161735059 @benwtrent ++ to understanding the performance regression before pushing. I haven't made any more progress there personally. Agreed with waiting to merge until we understand what's goin

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-11 Thread via GitHub
gsmiller commented on code in PR #13463: URL: https://github.com/apache/lucene/pull/13463#discussion_r1635558572 ## lucene/core/src/java/org/apache/lucene/util/hnsw/BlockingFloatHeap.java: ## @@ -72,12 +72,13 @@ public float offer(float value) { * Values must be sorted in as
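The hunk above touches `BlockingFloatHeap.offer`, which by its signature takes a float and returns a float. The sketch below illustrates the general idea of a shared, bounded heap whose top is the lowest retained score, which is the kind of global threshold a multi-leaf kNN collector can consult. The class name `BoundedFloatMinHeap` and its exact behavior are assumptions made for illustration. It is not Lucene's `BlockingFloatHeap` and not the fix in #13463.

```java
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Lucene's BlockingFloatHeap): a thread-safe min-heap that
 * keeps at most maxSize floats, retaining the largest values offered so far.
 */
final class BoundedFloatMinHeap {
  private final int maxSize;
  // PriorityQueue uses natural ordering, so the smallest retained value is at the head.
  private final PriorityQueue<Float> heap = new PriorityQueue<>();

  BoundedFloatMinHeap(int maxSize) {
    if (maxSize < 1) {
      throw new IllegalArgumentException("maxSize must be >= 1");
    }
    this.maxSize = maxSize;
  }

  /**
   * Offers a value and returns the smallest value currently retained. Once the heap is
   * full, a new value only gets in if it beats the current minimum, which it then evicts.
   */
  synchronized float offer(float value) {
    if (heap.size() < maxSize) {
      heap.add(value);
    } else if (value > heap.peek()) {
      heap.poll();
      heap.add(value);
    }
    return heap.peek();
  }

  /** Returns the smallest retained value, or negative infinity if nothing was offered yet. */
  synchronized float peek() {
    Float top = heap.peek();
    return top == null ? Float.NEGATIVE_INFINITY : top;
  }
}
```

Because every caller synchronizes on the same instance, concurrent searchers see a single, monotonically rising minimum once the heap is full.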

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-11 Thread via GitHub
gsmiller commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2161723327 @mayya-sharipova:

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

2024-06-11 Thread via GitHub
jpountz commented on code in PR #13430: URL: https://github.com/apache/lucene/pull/13430#discussion_r1635481301 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -522,21 +550,28 @@ private MergeSpecification doFindMerges( final List candidate

[I] Scalar quantization extreme edge case of uniform vector values [lucene]

2024-06-11 Thread via GitHub
benwtrent opened a new issue, #13480: URL: https://github.com/apache/lucene/issues/13480 ### Description When quantizing vectors that have a uniform value, the quantiles can get really weird, meaning the min and max quantiles are actually equal. Additionally, the scoring can be
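When every component has the same value, the lower and upper quantiles coincide, so a linear quantizer that divides by `(max - min)` would divide by zero. Below is a minimal sketch of a linear quantizer that guards against that case, assuming a hypothetical `UniformSafeQuantizer` class that maps a degenerate range to code 0. It is not Lucene's `ScalarQuantizer` and not the fix adopted for #13480.

```java
/**
 * Hypothetical linear quantizer (not Lucene's ScalarQuantizer) that maps values in
 * [min, max] to integer codes in [0, 2^bits - 1], guarding against min == max.
 */
final class UniformSafeQuantizer {
  private final float min;
  private final float max;
  private final int maxCode;
  private final float scale; // codes per unit of input range

  UniformSafeQuantizer(float min, float max, int bits) {
    if (bits < 1 || bits > 7) {
      throw new IllegalArgumentException("bits must be in [1, 7] so codes fit in a signed byte");
    }
    if (max < min) {
      throw new IllegalArgumentException("max < min");
    }
    this.min = min;
    this.max = max;
    this.maxCode = (1 << bits) - 1;
    float range = max - min;
    // Uniform input: the quantiles coincide and range == 0. A naive maxCode / range
    // would be Infinity, and 0 * Infinity is NaN, so use scale 0 (every value -> code 0).
    this.scale = range > 0f ? maxCode / range : 0f;
  }

  byte[] quantize(float[] vector) {
    byte[] out = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      // Clamp outliers to the quantile range before scaling.
      float clamped = Math.min(max, Math.max(min, vector[i]));
      int code = Math.round((clamped - min) * scale);
      out[i] = (byte) Math.min(maxCode, code);
    }
    return out;
  }
}
```

Mapping a degenerate range to code 0 is only one possible choice. It keeps every code finite but discards the constant value itself, so a real fix would presumably also need to account for that value when scoring.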

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-11 Thread via GitHub
benwtrent commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2161527974 @mayya-sharipova are you concerned at all that the performance gains we thought we had seem to disappear with this bug fix? Could you retest to verify? What I am seeing locally i

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-11 Thread via GitHub
mayya-sharipova commented on code in PR #13463: URL: https://github.com/apache/lucene/pull/13463#discussion_r1635365499 ## lucene/core/src/java/org/apache/lucene/util/hnsw/BlockingFloatHeap.java: ## @@ -72,12 +72,13 @@ public float offer(float value) { * Values must be sorte

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-06-11 Thread via GitHub
msokolov commented on code in PR #13469: URL: https://github.com/apache/lucene/pull/13469#discussion_r1635387325 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java: ## @@ -217,6 +220,18 @@ public ByteVectorValues getByteVectorValues(String f

Re: [PR] Mark COSINE VectorSimilarity function as deprecated [lucene]

2024-06-11 Thread via GitHub
benwtrent merged PR #13473: URL: https://github.com/apache/lucene/pull/13473

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-06-11 Thread via GitHub
msokolov commented on code in PR #13469: URL: https://github.com/apache/lucene/pull/13469#discussion_r1635215676 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java: ## @@ -217,6 +220,18 @@ public ByteVectorValues getByteVectorValues(String f

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-06-11 Thread via GitHub
navneet1v commented on code in PR #13469: URL: https://github.com/apache/lucene/pull/13469#discussion_r1635173335 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java: ## @@ -217,6 +220,18 @@ public ByteVectorValues getByteVectorValues(String

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-11 Thread via GitHub
benwtrent commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2161031174 @gsmiller I have run into those weird recall numbers in scenarios before: - My vector data was corrupted and thus created many `0` valued vectors - My dimensions were incorre
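The first failure mode in that list (corrupted data producing many `0` valued vectors) is cheap to rule out before benchmarking. All-zero vectors receive the same score for a given query under dot-product or Euclidean similarity, so a large share of them produces many ties. The sketch below shows such a sanity check over an in-memory `float[][]`. `ZeroVectorCheck.countZeroVectors` is a hypothetical helper, not part of Lucene or luceneutil.

```java
/** Hypothetical sanity check for benchmark input data. */
final class ZeroVectorCheck {
  private ZeroVectorCheck() {}

  /** Counts vectors whose components are all zero (0.0f and -0.0f both compare equal to 0f). */
  static int countZeroVectors(float[][] vectors) {
    int zeros = 0;
    for (float[] v : vectors) {
      boolean allZero = true;
      for (float x : v) {
        if (x != 0f) {
          allZero = false;
          break;
        }
      }
      if (allZero) {
        zeros++;
      }
    }
    return zeros;
  }
}
```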

Re: [PR] Fix global score update bug in MultiLeafKnnCollector [lucene]

2024-06-11 Thread via GitHub
gsmiller commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2161023588 @benwtrent ah, you're right. I only had a single segment. I played with making the write buffer really small but couldn't get more than one segment with that 100d enwiki dataset. I ran

Re: [PR] Mark COSINE VectorSimilarity function as deprecated [lucene]

2024-06-11 Thread via GitHub
Pulkitg64 commented on PR #13473: URL: https://github.com/apache/lucene/pull/13473#issuecomment-2161002119 Thanks @benwtrent for all the feedback and help. I will try to raise a followup PR to remove COSINE function if no one else beats me to it.

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-06-11 Thread via GitHub
benwtrent commented on code in PR #13469: URL: https://github.com/apache/lucene/pull/13469#discussion_r1635050674 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java: ## @@ -217,6 +220,18 @@ public ByteVectorValues getByteVectorValues(String

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-11 Thread via GitHub
msokolov commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2160948229 Would it make sense to provide a reference implementation factory method that creates a properly-configured threadpool, maybe using all available cores with whatever appropriate policie
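The suggested reference factory method could look roughly like the sketch below, which sizes a fixed pool to the available cores. The class `SearchExecutors`, the method `newSearchExecutor`, the thread names, and the daemon-thread choice are all hypothetical, and Lucene does not ship this method. The returned executor could, for example, be passed to the `IndexSearcher` constructor that accepts an `Executor`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory for a search thread pool; not a Lucene API. */
final class SearchExecutors {
  private SearchExecutors() {}

  /** A fixed pool with one thread per available core. */
  static ExecutorService newSearchExecutor() {
    int threads = Runtime.getRuntime().availableProcessors();
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = r -> {
      Thread t = new Thread(r, "search-" + counter.incrementAndGet());
      t.setDaemon(true); // idle search threads should not keep the JVM alive
      return t;
    };
    return Executors.newFixedThreadPool(threads, factory);
  }
}
```

Daemon threads are a judgment call. They let the JVM exit without an explicit `shutdown()`, at the cost of possibly abandoning in-flight work when it does.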

Re: [I] Strange ConcurrentMergeScheduler behavior with intra-merge threads [lucene]

2024-06-11 Thread via GitHub
benwtrent commented on issue #13478: URL: https://github.com/apache/lucene/issues/13478#issuecomment-2160939426 >Parallel merging breaks these assumptions and could cause issues. Well, the assumptions are that it's only accessed once. But now in parallel merging, it could be re-cached

Re: [I] Strange ConcurrentMergeScheduler behavior with intra-merge threads [lucene]

2024-06-11 Thread via GitHub
benwtrent commented on issue #13478: URL: https://github.com/apache/lucene/issues/13478#issuecomment-2160815181 I think I know the issue with the parallel merging. This only happens when we use a SortingCodecReader. The key issue is here: https://github.com/apache/lucene/commit/17c2

Re: [PR] Adjust assertion check to not throw an NPE [lucene]

2024-06-11 Thread via GitHub
benwtrent merged PR #13479: URL: https://github.com/apache/lucene/pull/13479

[PR] Adjust assertion check to not throw an NPE [lucene]

2024-06-11 Thread via GitHub
benwtrent opened a new pull request, #13479: URL: https://github.com/apache/lucene/pull/13479 When checking this assertion, we could throw an NPE, as `metaData.getSort()` can be `null`. Let's actually allow the assertions to do their checks instead of throwing a scary
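Since `metaData.getSort()` can be `null`, dereferencing it inside an assertion throws an NPE before the assertion can report anything useful. The sketch below shows a null-safe comparison using a hypothetical `SegmentMeta` record with a `String` sort standing in for Lucene's real metadata and `Sort` types. It is not the actual change in #13479.

```java
import java.util.Objects;

/** Hypothetical illustration of a null-safe assertion check. */
final class SortAssertions {
  private SortAssertions() {}

  /** Stand-in for segment metadata whose index sort may be absent (null). */
  record SegmentMeta(String sort) {}

  /**
   * meta.sort().equals(expected) would throw an NPE when the segment has no sort.
   * Objects.equals treats two nulls as equal and null vs. non-null as unequal.
   */
  static boolean sortMatches(SegmentMeta meta, String expected) {
    return Objects.equals(meta.sort(), expected);
  }
}
```

Used as `assert SortAssertions.sortMatches(meta, expected) : "sort mismatch";`, a mismatch surfaces as an `AssertionError` with a message rather than an NPE.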

Re: [I] Strange ConcurrentMergeScheduler behavior with intra-merge threads [lucene]

2024-06-11 Thread via GitHub
benwtrent commented on issue #13478: URL: https://github.com/apache/lucene/issues/13478#issuecomment-2160620126 OK, the NPE in sort, I did some manual debugging via good ole `System.out.println`. This only happens in the assertion if the cache check is greater than 1, which does seem to hap

[I] Strange ConcurrentMergeScheduler behavior with intra-merge threads [lucene]

2024-06-11 Thread via GitHub
benwtrent opened a new issue, #13478: URL: https://github.com/apache/lucene/issues/13478 ### Description It was noticed that the CMS intra-merge behavior was not fully tested. In an effort to do this, a change to override when the intra-merge scheduler is used has been drafted. https

Re: [I] Null exception occured when click on Luke desktop browser button [lucene]

2024-06-11 Thread via GitHub
slow-J commented on issue #13345: URL: https://github.com/apache/lucene/issues/13345#issuecomment-2160422363 Hi @wuth, please raise a pull request with your fix and it can get reviewed and released.

Re: [PR] Optimize Japanese UserDictionary. [lucene]

2024-06-11 Thread via GitHub
bruno-roustant merged PR #13431: URL: https://github.com/apache/lucene/pull/13431

Re: [PR] TaskExecutor should not fork unnecessarily [lucene]

2024-06-11 Thread via GitHub
javanna commented on PR #13472: URL: https://github.com/apache/lucene/pull/13472#issuecomment-2159970909 > Lucene should just make full use of the provided executor and that's that shouldn't it? Yes, I think so, but perhaps Lucene needs to provide general guidelines to users around w
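To make the PR title concrete, one common way to avoid forking unnecessarily is to run a lone task on the calling thread and only hand work to the executor when there is more than one task. The sketch below shows that pattern with plain `java.util.concurrent` types. `InlineWhenSingle.invokeAll` is a hypothetical helper, not Lucene's `TaskExecutor` API, and the actual change in #13472 may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executor;
import java.util.concurrent.FutureTask;

/** Hypothetical helper illustrating "don't fork a single task". */
final class InlineWhenSingle {
  private InlineWhenSingle() {}

  static <T> List<T> invokeAll(Executor executor, List<Callable<T>> tasks) {
    List<T> results = new ArrayList<>(tasks.size());
    try {
      if (tasks.size() == 1) {
        // No fork: a single task runs directly on the caller, avoiding a thread hop.
        results.add(tasks.get(0).call());
        return results;
      }
      List<FutureTask<T>> futures = new ArrayList<>(tasks.size());
      for (Callable<T> task : tasks) {
        FutureTask<T> future = new FutureTask<>(task);
        futures.add(future);
        executor.execute(future);
      }
      for (FutureTask<T> future : futures) {
        results.add(future.get()); // waits; task failures arrive as ExecutionException
      }
      return results;
    } catch (RuntimeException e) {
      throw e;
    } catch (Exception e) {
      if (e instanceof InterruptedException) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
      }
      throw new RuntimeException(e);
    }
  }
}
```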