Re: [I] Tool to recover data from .fdt files [LUCENE-4706] [lucene]

2025-01-21 Thread via GitHub
soheil-mohseni commented on issue #5771: URL: https://github.com/apache/lucene/issues/5771#issuecomment-2606513052 Hi, the link is not valid: https://github.com/OtherLevels/es_fdr -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] Fix TestBpVectorReorderer#testIndexReorderDense [lucene]

2025-01-21 Thread via GitHub
iverase merged PR #14153: URL: https://github.com/apache/lucene/pull/14153

Re: [I] TestBpVectorReorderer.testIndexReorderDense failure in CI [lucene]

2025-01-21 Thread via GitHub
iverase closed issue #14143: TestBpVectorReorderer.testIndexReorderDense failure in CI URL: https://github.com/apache/lucene/issues/14143

Re: [PR] Improve set deletions percentage javadoc [lucene]

2025-01-21 Thread via GitHub
yugushihuang commented on PR #12828: URL: https://github.com/apache/lucene/pull/12828#issuecomment-2606429556 @jpountz It looks like I do not have push access to the repo. Can you help me merge it? Thanks!

Re: [I] Add an optional bandwidth cap to `TieredMergePolicy`? [lucene]

2025-01-21 Thread via GitHub
Tony-X commented on issue #14148: URL: https://github.com/apache/lucene/issues/14148#issuecomment-2606036321 `MergeScheduler` can reject or throttle certain merges, but I wonder if it can somehow communicate the constraints to the `MergePolicy` to suggest that the MP produce plausible but not mos
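One way to picture an optional bandwidth cap is as a token bucket over merge output bytes. The sketch below assumes the cap is expressed in bytes per second with a burst allowance. `MergeBandwidthCap` and `tryStartMerge` are hypothetical names, not Lucene APIs. How a refusal would be communicated back to `TieredMergePolicy` is exactly the open question in this issue.

```java
/**
 * Hypothetical token-bucket cap on merge output, in bytes per second.
 * Not a Lucene API: a merge scheduler could consult it, and a merge policy
 * could treat a "false" answer as a hint to propose smaller merges instead.
 */
final class MergeBandwidthCap {
    private final long bytesPerSecond;
    private final long burstBytes;
    private double availableBytes;
    private long lastRefillNanos;

    MergeBandwidthCap(long bytesPerSecond, long burstBytes, long startNanos) {
        this.bytesPerSecond = bytesPerSecond;
        this.burstBytes = burstBytes;
        this.availableBytes = burstBytes; // start with a full bucket
        this.lastRefillNanos = startNanos;
    }

    /**
     * Returns true, and charges the budget, if a merge expected to write
     * mergeBytes may start at nowNanos. A merge larger than burstBytes never
     * fits; that is the signal a policy would need to pick a smaller merge.
     */
    boolean tryStartMerge(long mergeBytes, long nowNanos) {
        refill(nowNanos);
        if (mergeBytes > availableBytes) {
            return false;
        }
        availableBytes -= mergeBytes;
        return true;
    }

    /** Adds the budget accrued since the last call, capped at burstBytes. */
    private void refill(long nowNanos) {
        long elapsedNanos = Math.max(0L, nowNanos - lastRefillNanos);
        double earned = (double) elapsedNanos * bytesPerSecond / 1e9;
        availableBytes = Math.min(burstBytes, availableBytes + earned);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
    }
}
```

Time is passed in explicitly rather than read from `System.nanoTime()`, which keeps the sketch deterministic and easy to test.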

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-01-21 Thread via GitHub
gsmiller commented on code in PR #13974: URL: https://github.com/apache/lucene/pull/13974#discussion_r1924528363 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/SortedSetMultiRangeQuery.java: ## @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (AS
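This entry only shows the top of the new `SortedSetMultiRangeQuery.java` file. Below is a minimal sketch of one way a multi-range query over SortedSet doc values could evaluate a document. It assumes each byte range has already been translated to an inclusive ordinal range, for example via `SortedSetDocValues#lookupTerm`. `MultiOrdRangeMatcher` is hypothetical and is not the code under review in PR #13974.

```java
import java.util.Arrays;

/**
 * Hypothetical matcher: ranges are inclusive ordinal ranges {min, max}.
 * Overlapping or adjacent ranges are merged so lookups can binary-search.
 */
final class MultiOrdRangeMatcher {
    private final long[] mins; // sorted, disjoint range starts
    private final long[] maxs; // matching inclusive range ends
    private final int count;

    MultiOrdRangeMatcher(long[][] ranges) {
        long[][] sorted = ranges.clone();
        Arrays.sort(sorted, (a, b) -> Long.compare(a[0], b[0]));
        mins = new long[sorted.length];
        maxs = new long[sorted.length];
        int n = 0;
        for (long[] r : sorted) {
            if (n > 0 && r[0] <= maxs[n - 1] + 1) {
                // Overlapping or adjacent: extend the previous range.
                maxs[n - 1] = Math.max(maxs[n - 1], r[1]);
            } else {
                mins[n] = r[0];
                maxs[n] = r[1];
                n++;
            }
        }
        count = n;
    }

    /** True if ord falls inside any range, in O(log ranges). */
    boolean matches(long ord) {
        int i = Arrays.binarySearch(mins, 0, count, ord);
        if (i >= 0) {
            return true; // ord is exactly a range start
        }
        int prev = -i - 2; // last range whose start is < ord
        return prev >= 0 && ord <= maxs[prev];
    }

    /** A document matches if any of its ordinals matches. */
    boolean matchesAny(long[] docOrds) {
        for (long ord : docOrds) {
            if (matches(ord)) {
                return true;
            }
        }
        return false;
    }
}
```

Merging up front means each ordinal costs a single binary search, regardless of how many of the original ranges overlapped.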

Re: [PR] Remove scoreAll() optimization from DefaultBulkScorer. [lucene]

2025-01-21 Thread via GitHub
github-actions[bot] commented on PR #14039: URL: https://github.com/apache/lucene/pull/14039#issuecomment-2606012962 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Remove mmap isLoaded check before madvise [lucene]

2025-01-21 Thread via GitHub
jpountz commented on PR #14156: URL: https://github.com/apache/lucene/pull/14156#issuecomment-2605738891 > An outright madvise call should be about as expensive as the isLoaded check when things are already in the page cache The PR where `consecutivePrefetchHitCount` was introduced ha
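This thread weighs the cost of a residency check against simply issuing the `madvise` hint. A common pattern is a hit counter that stops hinting once data has looked cached for a while. The sketch below is a minimal version of that pattern, assuming the caller supplies the residency signal. `PrefetchThrottle`, its thresholds and its re-probe interval are hypothetical. This is not Lucene's prefetch code or its `consecutivePrefetchHitCount` logic.

```java
/**
 * Hypothetical prefetch throttle. After hitThreshold consecutive hints found
 * their data already resident, only every probeInterval-th hint is issued.
 * Hinting resumes on every call as soon as one outcome is a miss. How
 * residency is detected is left to the caller.
 */
final class PrefetchThrottle {
    private final int hitThreshold;
    private final int probeInterval;
    private int consecutiveHits;
    private int skippedSinceProbe;

    PrefetchThrottle(int hitThreshold, int probeInterval) {
        this.hitThreshold = hitThreshold;
        this.probeInterval = probeInterval;
    }

    /** Decides whether to pay for the madvise-style hint on this call. */
    boolean shouldPrefetch() {
        if (consecutiveHits < hitThreshold) {
            return true;
        }
        if (++skippedSinceProbe >= probeInterval) {
            skippedSinceProbe = 0; // periodically re-probe in case pages were evicted
            return true;
        }
        return false;
    }

    /** Records whether the hinted range turned out to be in the page cache already. */
    void recordOutcome(boolean alreadyResident) {
        if (alreadyResident) {
            consecutiveHits = Math.min(consecutiveHits + 1, hitThreshold); // saturate, no overflow
        } else {
            consecutiveHits = 0;
            skippedSinceProbe = 0;
        }
    }
}
```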

Re: [PR] gh-14127: remove duplicate neighbors when writing HNSW graphs [lucene]

2025-01-21 Thread via GitHub
iverase commented on PR #14157: URL: https://github.com/apache/lucene/pull/14157#issuecomment-2605433717 I was thinking of a solution more like this: https://github.com/apache/lucene/pull/14159. I just opened it for discussion; I am OK if the preferred solution is done at this level. The

[PR] Revert TestManyKnnDocs changes from #14084 [lucene]

2025-01-21 Thread via GitHub
benwtrent opened a new pull request, #14158: URL: https://github.com/apache/lucene/pull/14158 I didn't fully validate TestManyKnnDocs when merging #14084; this reverts the TestManyKnnDocs changes, as they just broke. closes: https://github.com/apache/lucene/issues/14149

Re: [I] TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph [lucene]

2025-01-21 Thread via GitHub
iverase commented on issue #14127: URL: https://github.com/apache/lucene/issues/14127#issuecomment-2605372405 One possible fix would be to remove neighbours from c1 from the search result, which implies an extra graph seek to collect the neighbours into the visited nodes.

Re: [I] TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph [lucene]

2025-01-21 Thread via GitHub
msokolov commented on issue #14127: URL: https://github.com/apache/lucene/issues/14127#issuecomment-2605353284 The thing that complicates this is that the graph is directed; that is, its links are not always reciprocal (although they mostly are). Therefore, although it is true by constructio

Re: [I] TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph [lucene]

2025-01-21 Thread via GitHub
iverase commented on issue #14127: URL: https://github.com/apache/lucene/issues/14127#issuecomment-2605324971 > Thanks for the test case @iverase, but I was able to reliably repro using the existing test case. These tests are pretty slow so I'd just as soon not add too many more, unless you

Re: [I] TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph [lucene]

2025-01-21 Thread via GitHub
msokolov commented on issue #14127: URL: https://github.com/apache/lucene/issues/14127#issuecomment-2605301934 Thanks for the test case @iverase, but I was able to reliably repro using the existing test case. These tests are pretty slow so I'd just as soon not add too many more, unless you

[PR] gh-14127: remove duplicate neighbors when writing HNSW graphs [lucene]

2025-01-21 Thread via GitHub
msokolov opened a new pull request, #14157: URL: https://github.com/apache/lucene/pull/14157 Should fix #gh-14127 test failures. I believe there are no back-compat concerns here, since this would be a no-op for graphs with no duplicates (I think that is what we were always producing b
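Removing duplicate neighbors amounts to canonicalizing each node's neighbor list before it is written. The minimal sketch below assumes neighbor lists are arrays of int ordinals and that the writer wants them sorted, for example to delta-encode them. `NeighborDedup.sortAndDedup` is a hypothetical helper and is not the code in PR #14157.

```java
import java.util.Arrays;

/** Hypothetical helper: canonicalize an HNSW node's neighbor list before writing it. */
final class NeighborDedup {
    /**
     * Sorts neighbors[0..size) by ordinal and removes repeated ordinals in place.
     * Returns the new size; slots at and beyond it keep stale values.
     */
    static int sortAndDedup(int[] neighbors, int size) {
        if (size == 0) {
            return 0;
        }
        Arrays.sort(neighbors, 0, size);
        int out = 1; // neighbors[0] is always kept
        for (int i = 1; i < size; i++) {
            if (neighbors[i] != neighbors[out - 1]) {
                neighbors[out++] = neighbors[i];
            }
        }
        return out;
    }
}
```

A list that is already sorted and duplicate-free comes back unchanged with the same size. That is the "no-op for graphs with no duplicates" property the PR description relies on for back-compat.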

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-01-21 Thread via GitHub
RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2605229476 Or to elaborate more ```Searching across the N separate shards as if they were a single index is also possible via MultiReader``` will require separate Lucene indexes for differe

Re: [PR] Remove mmap isLoaded check before madvise [lucene]

2025-01-21 Thread via GitHub
ChrisHegarty commented on PR #14156: URL: https://github.com/apache/lucene/pull/14156#issuecomment-2604854418 The overhead of `MS::isLoaded` is certainly not a good tradeoff here, as can be seen from the profile that you posted @original-brownbear. Given the new code, `consecutivePrefetchH

Re: [I] Make NativeUnixDirectory pure java now that direct IO is possible [LUCENE-8982] [lucene]

2025-01-21 Thread via GitHub
mikemccand commented on issue #10025: URL: https://github.com/apache/lucene/issues/10025#issuecomment-2604376572 > > Michael McCandless ([@mikemccand](https://github.com/mikemccand)) ([migrated from JIRA](https://issues.apache.org/jira/browse/LUCENE-8982?focusedCommentId=17223693&page=com.a

Re: [PR] Fix acceptOrds in EmptyOffHeapVectorValues to match no bits [lucene]

2025-01-21 Thread via GitHub
ChrisHegarty commented on code in PR #14119: URL: https://github.com/apache/lucene/pull/14119#discussion_r1923394010 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene92/OffHeapFloatVectorValues.java: ## @@ -256,7 +256,7 @@ public DocIndexIterator iterat

Re: [PR] Implement #intoBitSet on `IntArrayDocIdSet` and `RoaringDocIdSet`. [lucene]

2025-01-21 Thread via GitHub
jpountz merged PR #14135: URL: https://github.com/apache/lucene/pull/14135
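As we understand the `DocIdSetIterator#intoBitSet(int upTo, FixedBitSet bitSet, int offset)` contract, its default behaves like a loop that runs while `docID() < upTo`, calling `bitSet.set(docID() - offset)` and then `nextDoc()`. An array-backed set can replace that loop with one tight pass over its array. The sketch below shows the idea with `java.util.BitSet` standing in for `FixedBitSet` so it runs on its own. `IntArrayDocIterator` is hypothetical. Unlike a real iterator, it starts positioned on its first doc.

```java
import java.util.BitSet;

/** Hypothetical iterator over a sorted int[] of doc IDs with a bulk intoBitSet. */
final class IntArrayDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted, distinct doc IDs
    private int index;        // position of the current doc

    IntArrayDocIterator(int[] docs) {
        this.docs = docs;
    }

    int docID() {
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }

    /**
     * Sets bit (doc - offset) for the current doc and every following doc
     * below upTo, then leaves the iterator on the first doc >= upTo.
     * One loop over the array, with no per-document nextDoc() call.
     */
    void intoBitSet(int upTo, BitSet bitSet, int offset) {
        while (index < docs.length && docs[index] < upTo) {
            bitSet.set(docs[index] - offset);
            index++;
        }
    }
}
```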

Re: [PR] Fix acceptOrds in EmptyOffHeapVectorValues to match no bits [lucene]

2025-01-21 Thread via GitHub
vigyasharma commented on code in PR #14119: URL: https://github.com/apache/lucene/pull/14119#discussion_r1909327166 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene92/OffHeapFloatVectorValues.java: ## @@ -256,7 +256,7 @@ public DocIndexIterator iterato

Re: [PR] Replace special-casing of `DocBaseBitSetIterator` with `#intoBitSet`. [lucene]

2025-01-21 Thread via GitHub
jpountz commented on PR #14139: URL: https://github.com/apache/lucene/pull/14139#issuecomment-2604172940 Thanks @ChrisHegarty

Re: [PR] Replace special-casing of `DocBaseBitSetIterator` with `#intoBitSet`. [lucene]

2025-01-21 Thread via GitHub
jpountz merged PR #14139: URL: https://github.com/apache/lucene/pull/14139

Re: [PR] Add small bias towards bit set encoding. [lucene]

2025-01-21 Thread via GitHub
jpountz commented on PR #14155: URL: https://github.com/apache/lucene/pull/14155#issuecomment-2604161380 luceneutil on wikibigall: ``` Task QPS baseline StdDev QPS my_modified_version StdDev Pct diff p-value

[PR] Add small bias towards bit set encoding. [lucene]

2025-01-21 Thread via GitHub
jpountz opened a new pull request, #14155: URL: https://github.com/apache/lucene/pull/14155 Currently, blocks of postings get encoded as a bit set instead of packed deltas (FOR) whenever the bit set is more storage-efficient. However, the bit set approach is considerably more CPU-efficient at sear
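The trade-off in this description can be sketched as a size comparison with a tolerance factor. The sketch makes these assumptions, none taken from the PR:

* A block is a sorted array of doc IDs following `prevDoc`.
* The bit set spans the block's first to last doc, rounded up to 64-bit words.
* FOR stores every delta with the bit width of the largest one.
* `bias` is a hypothetical knob that tolerates a somewhat larger bit set.

`BlockEncodingChoice` is not Lucene's actual rule.

```java
/**
 * Hypothetical choice between a bit set and FOR-packed deltas for one block
 * of doc IDs, biased toward the bit set because it is cheaper at search time.
 */
final class BlockEncodingChoice {
    /** Bytes for a bit set covering docs[0]..docs[last], in whole 64-bit words. */
    static long bitSetBytes(int[] docs) {
        long span = (long) docs[docs.length - 1] - docs[0] + 1;
        return (span + 63) / 64 * 8;
    }

    /** Bytes for deltas from prevDoc, each stored with the width of the largest delta. */
    static long packedDeltaBytes(int[] docs, int prevDoc) {
        int maxDelta = 0;
        int prev = prevDoc;
        for (int doc : docs) {
            maxDelta = Math.max(maxDelta, doc - prev);
            prev = doc;
        }
        int bitsPerValue = 32 - Integer.numberOfLeadingZeros(maxDelta);
        return ((long) bitsPerValue * docs.length + 7) / 8;
    }

    /** bias = 0 picks the bit set only when it is not larger; bias > 0 favors it. */
    static boolean useBitSet(int[] docs, int prevDoc, double bias) {
        return bitSetBytes(docs) <= packedDeltaBytes(docs, prevDoc) * (1.0 + bias);
    }
}
```

For example, consider 128 docs spaced 3 apart (3, 6, ..., 384, with `prevDoc = 0`). The bit set needs 48 bytes and the 2-bit packed deltas need 32. An unbiased comparison therefore picks FOR, while a bias of 0.5 or more flips the choice to the bit set.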