Re: [PR] Make FSTPostingFormat to build FST off-heap [lucene]

2024-03-14 Thread via GitHub
github-actions[bot] commented on PR #12980: URL: https://github.com/apache/lucene/pull/12980#issuecomment-1998682382 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] TestIDVersionPostingsFormat failure [lucene]

2024-03-14 Thread via GitHub
benwtrent commented on issue #13127: URL: https://github.com/apache/lucene/issues/13127#issuecomment-1998480248 More investigation is needed. The only other method that updates `DWDQ#nextSeqNo` is `DWDQ#skipSequenceNumbers(long)`. The only place that `DWDQ#skipSequenceNumbers(long)` i

Re: [I] TestIDVersionPostingsFormat failure [lucene]

2024-03-14 Thread via GitHub
benwtrent commented on issue #13127: URL: https://github.com/apache/lucene/issues/13127#issuecomment-1998402170 Well, that race-condition wasn't the cause. I have seen another failure. ``` ./gradlew test --tests TestIDVersionPostingsFormat.testGlobalVersions -Dtests.seed=DEC45C861B1BCF

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-14 Thread via GitHub
benwtrent merged PR #13124: URL: https://github.com/apache/lucene/pull/13124 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [PR] Make the HitQueue size more appropriate for KNN exact search [lucene]

2024-03-14 Thread via GitHub
benwtrent commented on code in PR #13184: URL: https://github.com/apache/lucene/pull/13184#discussion_r1525341872 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenFloatKnnVectorQuery.java: ## @@ -98,7 +98,8 @@ protected TopDocs exactSearch(LeafReaderCont

Re: [PR] Change BP reordering logic to help support document blocks later on. [lucene]

2024-03-14 Thread via GitHub
rishabhmaurya commented on code in PR #13123: URL: https://github.com/apache/lucene/pull/13123#discussion_r1525325740 ## lucene/misc/src/java/org/apache/lucene/misc/index/BPIndexReorderer.java: ## @@ -341,116 +344,94 @@ protected void compute() { */ private boolean sh

[PR] Avoid iterations: cooling using simulated annealing [lucene]

2024-03-14 Thread via GitHub
rishabhmaurya opened a new pull request, #13186: URL: https://github.com/apache/lucene/pull/13186 ### Description As described in the paper (Tradeoff Options for Bipartite Graph Partitioning), a simulated annealing-type mechanism can be employed to reduce the number of swaps with each iteration. I
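The cooling idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions. The excerpt does not give the actual schedule, so `threshold = iteration` is a hypothetical choice, and the class and method names are illustrative, not `BPIndexReorderer`'s API. Candidate pairs are assumed to be pre-sorted so that their combined gains never increase. A pass stops at the first pair whose gain no longer clears the threshold, so later iterations swap fewer documents.

```java
/**
 * Sketch of an annealing-style swap filter for one bipartite-partitioning pass.
 * Assumes leftGains and rightGains are each sorted in descending order, so the
 * combined gain of pair i never increases with i. All names are hypothetical.
 */
public final class AnnealedSwaps {

  /** Hypothetical cooling schedule: the bar for a swap rises with the iteration. */
  static double threshold(int iteration) {
    return iteration;
  }

  /** Counts how many leading pairs would be swapped in this iteration. */
  static int countSwaps(double[] leftGains, double[] rightGains, int iteration) {
    double bar = threshold(iteration);
    int n = Math.min(leftGains.length, rightGains.length);
    int swaps = 0;
    for (int i = 0; i < n; i++) {
      if (leftGains[i] + rightGains[i] <= bar) {
        break; // later pairs have no larger combined gain
      }
      swaps++;
    }
    return swaps;
  }

  public static void main(String[] args) {
    double[] left = {5, 3, 1};
    double[] right = {4, 2, 0};
    for (int iteration = 0; iteration < 8; iteration += 3) {
      System.out.println("iteration " + iteration + " -> swaps: " + countSwaps(left, right, iteration));
    }
  }
}
```

At iteration 0 the threshold is zero, so this degenerates to swapping every pair with a positive combined gain. The rising bar is what trims the number of swaps in later iterations.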

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-14 Thread via GitHub
zhaih commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1525307991 ## lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java: ## @@ -902,12 +932,52 @@ private static String getSegmentName(MergePolicy.OneMerge merge)

Re: [PR] Support disabling IndexSearcher.maxClauseCount with a value of -1 [lucene]

2024-03-14 Thread via GitHub
dsmiley commented on PR #13178: URL: https://github.com/apache/lucene/pull/13178#issuecomment-1998018450 Problem: clarity in being able to turn it off; a -1 value would clarify the configuration intent. It has become even less useful, not that I quite wanted it in the first place in the pa
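A hedged sketch of what a "-1 disables the limit" convention could look like. `ClauseLimit`, `UNLIMITED`, and the exception types are hypothetical and are not `IndexSearcher`'s actual API.

```java
/**
 * Sketch: a clause limit in which -1 explicitly disables the check.
 * Class, constant, and exception choice are hypothetical, not IndexSearcher's API.
 */
public final class ClauseLimit {
  /** Sentinel meaning "no limit". */
  public static final int UNLIMITED = -1;

  private final int maxClauseCount;

  public ClauseLimit(int maxClauseCount) {
    if (maxClauseCount != UNLIMITED && maxClauseCount < 1) {
      throw new IllegalArgumentException(
          "maxClauseCount must be >= 1, or -1 to disable; got " + maxClauseCount);
    }
    this.maxClauseCount = maxClauseCount;
  }

  public boolean isDisabled() {
    return maxClauseCount == UNLIMITED;
  }

  /** Throws if clauseCount exceeds the limit, unless the limit is disabled. */
  public void check(int clauseCount) {
    if (!isDisabled() && clauseCount > maxClauseCount) {
      throw new IllegalStateException(
          "too many clauses: " + clauseCount + " > " + maxClauseCount);
    }
  }

  public static void main(String[] args) {
    new ClauseLimit(UNLIMITED).check(1_000_000); // passes: limit disabled
    new ClauseLimit(1024).check(1024); // passes: exactly at the limit
  }
}
```

Passing a huge value such as `Integer.MAX_VALUE` only disables the limit in effect. A dedicated `-1` makes the intent visible in the configuration itself, which is the clarity argument made above.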

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-03-14 Thread via GitHub
benwtrent commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1525282411 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -190,12 +190,13 @@ protected TopDocs exactSearch(LeafReaderContext context, DocI

Re: [I] Avoid recalculating the norm of the target vector when using cosine metric [lucene]

2024-03-14 Thread via GitHub
benwtrent commented on issue #13185: URL: https://github.com/apache/lucene/issues/13185#issuecomment-1997999092 I would rather not change anything related to this enumeration until we figure out: https://github.com/apache/lucene/issues/13182 As an aside, I think cosine as a metric is

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-14 Thread via GitHub
jpountz commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1525270702 ## lucene/core/src/java/org/apache/lucene/index/SegmentMerger.java: ## @@ -130,19 +135,31 @@ MergeState merge() throws IOException { IOContext.READ,

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-14 Thread via GitHub
benwtrent commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1525266457 ## lucene/core/src/java/org/apache/lucene/index/SegmentMerger.java: ## @@ -130,19 +135,31 @@ MergeState merge() throws IOException { IOContext.READ,

[I] Avoid recalculating the norm of the target vector when using cosine metric [lucene]

2024-03-14 Thread via GitHub
bugmakerr opened a new issue, #13185: URL: https://github.com/apache/lucene/issues/13185 ### Description Currently, in the KNN retrieval process, we use `VectorSimilarityFunction#compare` to calculate the score between the target vector and the current vector. This method require
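The idea in the issue can be sketched with plain arrays. The query norm is computed once per query, and each comparison then needs only the dot product with the query plus the candidate's own norm. This is a hedged sketch. `CachedQueryCosine` is a hypothetical name, it does not use Lucene's `VectorSimilarityFunction` API, and it returns raw cosine rather than any normalized Lucene score.

```java
/**
 * Sketch: score many candidate vectors against one query with cosine
 * similarity, computing the query norm once instead of per comparison.
 * Plain arrays only; not Lucene's VectorSimilarityFunction API.
 */
public final class CachedQueryCosine {
  private final float[] query;
  private final double queryNorm; // computed once per query

  public CachedQueryCosine(float[] query) {
    this.query = query;
    this.queryNorm = Math.sqrt(dot(query, query));
  }

  static double dot(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) a[i] * b[i];
    }
    return sum;
  }

  /** Cosine between the cached query and doc; only the doc norm is recomputed. */
  public double cosine(float[] doc) {
    double docNorm = Math.sqrt(dot(doc, doc));
    return dot(query, doc) / (queryNorm * docNorm);
  }

  public static void main(String[] args) {
    CachedQueryCosine scorer = new CachedQueryCosine(new float[] {1f, 0f});
    System.out.println(scorer.cosine(new float[] {1f, 0f})); // 1.0
    System.out.println(scorer.cosine(new float[] {0f, 2f})); // 0.0
  }
}
```

Zero-length vectors are not handled here and would produce `NaN`.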

Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]

2024-03-14 Thread via GitHub
jpountz commented on code in PR #13124: URL: https://github.com/apache/lucene/pull/13124#discussion_r1525217090 ## lucene/core/src/java/org/apache/lucene/index/SegmentMerger.java: ## @@ -130,19 +135,31 @@ MergeState merge() throws IOException { IOContext.READ,

Re: [PR] Add new VectorScorer interface to vector value iterators [lucene]

2024-03-14 Thread via GitHub
mccullocht commented on code in PR #13181: URL: https://github.com/apache/lucene/pull/13181#discussion_r1525162554 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -190,12 +190,13 @@ protected TopDocs exactSearch(LeafReaderContext context, Doc

[PR] Make the HitQueue size more appropriate for KNN exact search [lucene]

2024-03-14 Thread via GitHub
bugmakerr opened a new pull request, #13184: URL: https://github.com/apache/lucene/pull/13184 ### Description Currently, when performing KNN exact search, we consistently set the HitQueue size to `k`. However, there may be instances where the number of candidates is actually lower th
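A minimal sketch of the sizing rule, assuming the number of candidates is known before the exact search starts. It uses `java.util.PriorityQueue` as a stand-in for Lucene's `HitQueue`, and `BoundedTopK` is a hypothetical name. The heap capacity is `min(k, candidates)`, so no slots are allocated for hits that cannot exist.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Sketch: exact top-k search whose heap is sized min(k, candidateCount). */
public final class BoundedTopK {

  /** Returns the indices of the k highest scores, best first. */
  public static int[] topK(float[] scores, int k) {
    int queueSize = Math.min(k, scores.length); // never larger than the candidate count
    if (queueSize <= 0) {
      return new int[0];
    }
    // Min-heap on score: the head is the weakest of the current top hits.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(queueSize, (a, b) -> Float.compare(scores[a], scores[b]));
    for (int doc = 0; doc < scores.length; doc++) {
      if (heap.size() < queueSize) {
        heap.add(doc);
      } else if (scores[doc] > scores[heap.peek()]) {
        heap.poll();
        heap.add(doc);
      }
    }
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll(); // weakest comes out first, so fill from the back
    }
    return result;
  }

  public static void main(String[] args) {
    float[] scores = {0.2f, 0.9f, 0.5f};
    System.out.println(Arrays.toString(topK(scores, 10))); // [1, 2, 0]
  }
}
```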

Re: [PR] gh-13147: use dense bit-encoding for frequent terms [lucene]

2024-03-14 Thread via GitHub
msokolov commented on PR #13153: URL: https://github.com/apache/lucene/pull/13153#issuecomment-1997699021 It seems to especially make phrase and span queries slower? Possibly the decoding of positions is not good? I could try restricting to fields without positions (which seem likely to be
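One way to decide when a dense encoding pays off is to compare estimated sizes per block. This is a hypothetical sketch. The byte estimates, the `useBitset` rule, and all names are assumptions for illustration, not the format used in PR #13153, and positions are ignored entirely.

```java
import java.util.BitSet;

/**
 * Sketch: choose between gap (varint-style) and dense bitset encoding for one
 * block of strictly increasing doc IDs. All names and size rules are hypothetical.
 */
public final class DenseBlockChoice {

  /** Bytes for a bitset spanning [docs[0], docs[last]]. */
  static long bitsetBytes(int[] docs) {
    long span = (long) docs[docs.length - 1] - docs[0] + 1;
    return (span + 7) / 8;
  }

  /** Bytes to write every gap after the first doc as a 7-bits-per-byte varint. */
  static long gapBytes(int[] docs) {
    long bytes = 0;
    for (int i = 1; i < docs.length; i++) {
      bytes += varIntLength(docs[i] - docs[i - 1]);
    }
    return bytes;
  }

  static int varIntLength(int value) {
    int length = 1;
    while ((value >>>= 7) != 0) {
      length++;
    }
    return length;
  }

  /** Dense encoding wins when the block's doc IDs are packed closely together. */
  static boolean useBitset(int[] docs) {
    return bitsetBytes(docs) < gapBytes(docs);
  }

  /** Bit i is set when doc (docs[0] + i) is in the block. */
  static BitSet encodeDense(int[] docs) {
    BitSet bits = new BitSet();
    for (int doc : docs) {
      bits.set(doc - docs[0]);
    }
    return bits;
  }

  public static void main(String[] args) {
    int[] frequent = new int[128];
    for (int i = 0; i < frequent.length; i++) {
      frequent[i] = 1000 + i; // 128 consecutive docs
    }
    int[] rare = {0, 1000, 2000};
    System.out.println(useBitset(frequent)); // true
    System.out.println(useBitset(rare)); // false
  }
}
```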

Re: [PR] Change BP reordering logic to help support document blocks later on. [lucene]

2024-03-14 Thread via GitHub
jpountz commented on code in PR #13123: URL: https://github.com/apache/lucene/pull/13123#discussion_r1525040811 ## lucene/misc/src/java/org/apache/lucene/misc/index/BPIndexReorderer.java: ## @@ -341,116 +344,94 @@ protected void compute() { */ private boolean shuffle(

Re: [PR] gh-13147: use dense bit-encoding for frequent terms [lucene]

2024-03-14 Thread via GitHub
msokolov commented on PR #13153: URL: https://github.com/apache/lucene/pull/13153#issuecomment-1997659513 Right, that last change seemed promising! But on luceneutil tasks it didn't show much impact, basically a regression to the mean compared to the first revision I tested; possibly a sligh

Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]

2024-03-14 Thread via GitHub
jpountz commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-1997568632 ++ on progress over perfection That said, I wonder if this change is legal: `DocIdSetIterator` must return doc IDs in order, but it looks like it wouldn't always be the case with y
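The ordering contract in question can be illustrated with a small standalone iterator that rejects out-of-order input. This is a sketch, not Lucene's `DocIdSetIterator`. It only mirrors its `nextDoc()` style and its `NO_MORE_DOCS` sentinel (`Integer.MAX_VALUE`), and `OrderedDocIds` is a hypothetical name. Input that may be unsorted would have to be sorted, or handled by a different code path, before it could back such an iterator.

```java
import java.util.Arrays;

/**
 * Sketch: an iterator over doc IDs that enforces the contract DocIdSetIterator
 * relies on, namely strictly increasing IDs. Standalone; not Lucene's class.
 */
public final class OrderedDocIds {
  /** Same sentinel value Lucene uses to signal exhaustion. */
  public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docs;
  private int index = -1;

  public OrderedDocIds(int[] docs) {
    for (int i = 1; i < docs.length; i++) {
      if (docs[i] <= docs[i - 1]) {
        throw new IllegalArgumentException(
            "doc IDs must be strictly increasing, saw " + docs[i - 1] + " then " + docs[i]);
      }
    }
    this.docs = docs;
  }

  /** Advances to the next doc ID, or returns NO_MORE_DOCS once exhausted. */
  public int nextDoc() {
    if (index < docs.length) {
      index++;
    }
    return index < docs.length ? docs[index] : NO_MORE_DOCS;
  }

  public static void main(String[] args) {
    int[] unordered = {9, 4, 6}; // hypothetical out-of-order input
    int[] sorted = unordered.clone();
    Arrays.sort(sorted); // must be ordered before it can back the iterator
    OrderedDocIds it = new OrderedDocIds(sorted);
    for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
      System.out.println(doc);
    }
  }
}
```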

Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

2024-03-14 Thread via GitHub
stefanvodita commented on PR #12966: URL: https://github.com/apache/lucene/pull/12966#issuecomment-1997515955 @gsmiller - I know you may not have time to review, but I want to at least notify you, since this is a big change and you've been very involved in this area of the code.

Re: [PR] Replace Collections.synchronizedSet() with ConcurrentHashMap.newKeySet() [lucene]

2024-03-14 Thread via GitHub
uschindler commented on PR #13142: URL: https://github.com/apache/lucene/pull/13142#issuecomment-1997212215 Hi, I am fine with applying the "safe" changes where iteration or an atomic add+remove is not required. But all others should be reverted and external synchronization using synchroni
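The distinction can be sketched as follows. `ConcurrentHashMap.newKeySet()` is safe for independent `add`/`remove`/`contains` calls, but a compound action such as "copy, then clear" is only atomic if every access goes through one lock. `Collections.synchronizedSet` uses the returned set itself as that lock, and its Javadoc requires callers to synchronize on it manually while iterating. The file names and the `drain` helper are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public final class SetChoices {
  // Fine for independent add/remove/contains calls from many threads.
  static final Set<String> concurrent = ConcurrentHashMap.newKeySet();

  // Needed when a compound action must be atomic, e.g. snapshot-and-clear.
  static final Set<String> synced = Collections.synchronizedSet(new HashSet<>());

  /** Copies and clears in one atomic step by holding the set's own lock. */
  static List<String> drain() {
    synchronized (synced) {
      List<String> copy = List.copyOf(synced); // iteration happens under the lock
      synced.clear();
      return copy;
    }
  }

  public static void main(String[] args) {
    concurrent.add("_0.cfs");
    synced.add("_1.cfs");
    synced.add("_2.cfs");
    System.out.println(drain().size()); // 2
    System.out.println(synced.isEmpty()); // true
  }
}
```

With a `ConcurrentHashMap` key set, another thread could add an element between the copy and the `clear()`, and that element would be lost. That is why such call sites need external synchronization rather than a drop-in replacement.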

Re: [PR] Replace Collections.synchronizedSet() with ConcurrentHashMap.newKeySet() [lucene]

2024-03-14 Thread via GitHub
uschindler commented on code in PR #13142: URL: https://github.com/apache/lucene/pull/13142#discussion_r1524663419 ## lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java: ## @@ -158,10 +158,7 @@ public long getPrimaryGen() { */ public boolean flus

Re: [PR] Replace Collections.synchronizedSet() with ConcurrentHashMap.newKeySet() [lucene]

2024-03-14 Thread via GitHub
uschindler commented on code in PR #13142: URL: https://github.com/apache/lucene/pull/13142#discussion_r1524661584 ## lucene/core/src/java/org/apache/lucene/store/TrackingDirectoryWrapper.java: ## @@ -61,10 +61,8 @@ public void copyFrom(Directory from, String src, String dest,

Re: [PR] Replace Collections.synchronizedSet() with ConcurrentHashMap.newKeySet() [lucene]

2024-03-14 Thread via GitHub
benwtrent commented on code in PR #13142: URL: https://github.com/apache/lucene/pull/13142#discussion_r1524632573 ## lucene/core/src/java/org/apache/lucene/store/TrackingDirectoryWrapper.java: ## @@ -61,10 +61,8 @@ public void copyFrom(Directory from, String src, String dest, I

[PR] Fix TestIndexWriter.testDeleteUnusedFiles failure on Windows 11 [lucene]

2024-03-14 Thread via GitHub
vsop-479 opened a new pull request, #13183: URL: https://github.com/apache/lucene/pull/13183 Fix https://github.com/apache/lucene/issues/12524

Re: [PR] Replace Collections.synchronizedSet() with ConcurrentHashMap.newKeySet() [lucene]

2024-03-14 Thread via GitHub
dweiss commented on PR #13142: URL: https://github.com/apache/lucene/pull/13142#issuecomment-1996843277 I'm out of office this week. If anybody can pick this up, please do. Otherwise I'll return to it next week.

Re: [PR] Remove halt() call in TestSimpleServer (part of TestStressNRTReplication [lucene]

2024-03-14 Thread via GitHub
dweiss commented on PR #13177: URL: https://github.com/apache/lucene/pull/13177#issuecomment-1996840513 I think it's in the method's documentation (current()) that the returned handle can't be used to stop yourself. Indeed, I tried it too. ;)
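The documented restriction can be checked directly. Per the `ProcessHandle` Javadoc, the handle returned by `ProcessHandle.current()` cannot be used to destroy the current process, and `destroy()` throws `IllegalStateException` in that case. A small sketch (the class and helper names are illustrative):

```java
/** Sketch: the current process cannot terminate itself through ProcessHandle. */
public final class CurrentProcessHandle {

  /** Returns true if ProcessHandle refuses to destroy the current process. */
  static boolean selfDestroyRefused() {
    try {
      ProcessHandle.current().destroy();
      return false; // not reached on a conforming JDK
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("refused: " + selfDestroyRefused()); // refused: true
    // To stop the JVM, use System.exit (runs shutdown hooks) or
    // Runtime.getRuntime().halt (skips them); neither is called here.
  }
}
```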