Date: 2025-01-22

[PR] supports force merge based on specified segments. [lucene]

2025-01-22 Thread via GitHub

cheng66551 opened a new pull request, #14163: URL: https://github.com/apache/lucene/pull/14163 In version 7.6.0 of ElasticSearch, I found through /_cat/segments that the docs.deleted count of many segments was continuously increasing, but over time, **these deleted documents were never auto

Re: [PR] feat: Added the method `forceMergeBySegmentNames` in IW, which suppor… [lucene]

2025-01-22 Thread via GitHub

cheng66551 closed pull request #14162: feat: Added the method `forceMergeBySegmentNames` in IW, which suppor… URL: https://github.com/apache/lucene/pull/14162 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[PR] feat: Added the method `forceMergeBySegmentNames` in IW, which suppor… [lucene]

2025-01-22 Thread via GitHub

cheng66551 opened a new pull request, #14162: URL: https://github.com/apache/lucene/pull/14162 In version 7.6.0 of ElasticSearch, I found through /_cat/segments that the docs.deleted count of many segments was continuously increasing, but over time, **these deleted documents were never auto

[I] UnsupportedOperationException instead of IllegalArgumentException from PointInSetQuery when values are out of order [lucene]

2025-01-22 Thread via GitHub

jhinch-at-atlassian-com opened a new issue, #14161: URL: https://github.com/apache/lucene/issues/14161 ### Description PointInSetQuery in its constructor will check if the values provided to it are in order and if not will attempt to throw an exception: ``` throw n

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-01-22 Thread via GitHub

mkhludnev commented on code in PR #13974: URL: https://github.com/apache/lucene/pull/13974#discussion_r1926069638 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/SortedSetMultiRangeQuery.java: ## @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2025-01-22 Thread via GitHub

vigyasharma commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2608302325 > Having a Multi-Reader on all the child log-group directories still won't provide a unified view of all group level segments associated with a Lucene Index. Even now, OpenSearc

[PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-01-22 Thread via GitHub

benwtrent opened a new pull request, #14160: URL: https://github.com/apache/lucene/pull/14160 This is a continuation and completion of the work started by @benchaplin in https://github.com/apache/lucene/pull/14085 The algorithm is fairly simple: - Only score and then explore v

Re: [PR] update privacy policy link [lucene-site]

2025-01-22 Thread via GitHub

rmuir commented on code in PR #77: URL: https://github.com/apache/lucene-site/pull/77#discussion_r1925577292 ## content/pages/privacy.md: ## @@ -1,7 +0,0 @@ -Title: Privacy Policy -URL: privacy.html -save_as: privacy.html -template: lucene/tlp/page - Review Comment: personal

Re: [PR] Add WrappingReuseStrategy for AnalyzerWrapper [lucene]

2025-01-22 Thread via GitHub

jpountz commented on PR #14154: URL: https://github.com/apache/lucene/pull/14154#issuecomment-2607429171 I don't like that `CompletionAnalyzer` needs to track a thread-local, the point of reuse strategy is to avoid this kind of thing. Also I'm not sure I understand why `CompletionAnalyzer`

Re: [PR] Remove mmap isLoaded check before madvise [lucene]

2025-01-22 Thread via GitHub

jpountz commented on PR #14156: URL: https://github.com/apache/lucene/pull/14156#issuecomment-2607399239 > Seems we just trade an isLoaded for an madvise on systems with enough memory? This is correct. I made this suggestion because it was similar to your initial proposal: skipping t

Re: [PR] Remove mmap isLoaded check before madvise [lucene]

2025-01-22 Thread via GitHub

original-brownbear commented on PR #14156: URL: https://github.com/apache/lucene/pull/14156#issuecomment-2607332400 @jpountz I see. Hmm I wonder how much that saves us? Seems we just trade an `isLoaded` for an `madvise` on systems with enough memory? That said, maybe the `madvise` is far c

Re: [PR] Add WrappingReuseStrategy for AnalyzerWrapper [lucene]

2025-01-22 Thread via GitHub

benwtrent commented on code in PR #14154: URL: https://github.com/apache/lucene/pull/14154#discussion_r1925341296 ## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/CompletionAnalyzer.java: ## @@ -112,6 +116,25 @@ public CompletionAnalyzer( Concatenate

Re: [PR] gh-14127: remove duplicate neighbors when writing HNSW graphs [lucene]

2025-01-22 Thread via GitHub

msokolov merged PR #14157: URL: https://github.com/apache/lucene/pull/14157

Re: [PR] move MultiLeafKnnCollector to decorator and remove unnecessary code [lucene]

2025-01-22 Thread via GitHub

benwtrent merged PR #14147: URL: https://github.com/apache/lucene/pull/14147

Re: [PR] Advoid the use of ImpactsDISI when no minimum competitive score has been set [lucene]

2025-01-22 Thread via GitHub

gmarsay commented on PR #13343: URL: https://github.com/apache/lucene/pull/13343#issuecomment-2607247416 I also noticed a performance issue, maybe related to this topic? I have an index that contains data from a metricbeat agent (1 shard + 1 replica; 18G). When performing a search

Re: [PR] move MultiLeafKnnCollector to decorator and remove unnecessary code [lucene]

2025-01-22 Thread via GitHub

benwtrent commented on code in PR #14147: URL: https://github.com/apache/lucene/pull/14147#discussion_r1925317415 ## lucene/core/src/java/org/apache/lucene/search/knn/MultiLeafKnnCollector.java: ## @@ -77,6 +76,7 @@ public MultiLeafKnnCollector( int interval, Block

Re: [I] Add an optional bandwidth cap to `TieredMergePolicy`? [lucene]

2025-01-22 Thread via GitHub

jpountz commented on issue #14148: URL: https://github.com/apache/lucene/issues/14148#issuecomment-2607239803 Intuitively, I had thought of the "throttle at start" approach, where we would also give `MS` the ability to filter out some merges from `MP` (so that they don't get registered to t

Re: [I] TestManyKnnDocs is broken [lucene]

2025-01-22 Thread via GitHub

benwtrent closed issue #14149: TestManyKnnDocs is broken URL: https://github.com/apache/lucene/issues/14149

Re: [I] Add an optional bandwidth cap to `TieredMergePolicy`? [lucene]

2025-01-22 Thread via GitHub

mikemccand commented on issue #14148: URL: https://github.com/apache/lucene/issues/14148#issuecomment-2607224591 Doing this in `MergeScheduler` (`MS`) is indeed another option. It'd mean you could cap replication bandwidth independent of your `MergePolicy` (`MP`). `MS` could even fine-tun

Re: [PR] Revert TestManyKnnDocs changes from #14084 [lucene]

2025-01-22 Thread via GitHub

benwtrent merged PR #14158: URL: https://github.com/apache/lucene/pull/14158

Re: [PR] Remove mmap isLoaded check before madvise [lucene]

2025-01-22 Thread via GitHub

jpountz commented on PR #14156: URL: https://github.com/apache/lucene/pull/14156#issuecomment-2607163688 Well, you may be right as well that the cost of `MS::isLoaded` is of a similar order of magnitude as `madvise`. What the current logic does is that if you get `MS::isLoaded` to frequentl

Re: [PR] gh-14127: remove duplicate neighbors when writing HNSW graphs [lucene]

2025-01-22 Thread via GitHub

iverase commented on PR #14157: URL: https://github.com/apache/lucene/pull/14157#issuecomment-2607140645 Sounds good to me @msokolov, I didn't like to add yet a new parameter in the search api. Thanks for taking the time to review it.

Re: [PR] Prevent choosing connection nodes that are already neighbours [lucene]

2025-01-22 Thread via GitHub

iverase closed pull request #14159: Prevent choosing connection nodes that are already neighbours URL: https://github.com/apache/lucene/pull/14159

Re: [PR] gh-14127: remove duplicate neighbors when writing HNSW graphs [lucene]

2025-01-22 Thread via GitHub

msokolov commented on PR #14157: URL: https://github.com/apache/lucene/pull/14157#issuecomment-2607133416 @iverase I see what you did there ... that would also solve this problem, but I think it is less desirable since it (1) requires extending the HNSW search API in a way I think we wouldn

Re: [PR] Remove mmap isLoaded check before madvise [lucene]

2025-01-22 Thread via GitHub

original-brownbear commented on PR #14156: URL: https://github.com/apache/lucene/pull/14156#issuecomment-2607056900 @jpountz > was introduced had a benchmark that demonstrated an improvement with the current logic Huh those results are quite unexpected I must admit :) When me

Re: [PR] Improve set deletions percentage javadoc [lucene]

2025-01-22 Thread via GitHub

msokolov merged PR #12828: URL: https://github.com/apache/lucene/pull/12828

Re: [I] Tool to recover data from .fdt files [LUCENE-4706] [lucene]

2025-01-22 Thread via GitHub

msokolov commented on issue #5771: URL: https://github.com/apache/lucene/issues/5771#issuecomment-2607027055 Thanks for pointing that out

Re: [I] Tool to recover data from .fdt files [LUCENE-4706] [lucene]

2025-01-22 Thread via GitHub

msokolov closed issue #5771: Tool to recover data from .fdt files [LUCENE-4706] URL: https://github.com/apache/lucene/issues/5771

Re: [PR] update privacy policy link [lucene-site]

2025-01-22 Thread via GitHub

cpoerschke commented on code in PR #77: URL: https://github.com/apache/lucene-site/pull/77#discussion_r1925064849 ## content/pages/privacy.md: ## @@ -1,7 +0,0 @@ -Title: Privacy Policy -URL: privacy.html -save_as: privacy.html -template: lucene/tlp/page - Review Comment: Alt

[PR] update privacy policy link [lucene-site]

2025-01-22 Thread via GitHub

cpoerschke opened a new pull request, #77: URL: https://github.com/apache/lucene-site/pull/77 The "Apache Project Website Checks" at https://whimsy.apache.org/site/project/lucene identify ``` Privacy | https://lucene.apache.org/privacy.html | URL expected to match regular expr

30 matches