Re: [PR] Revert "Add UnwrappingReuseStrategy for AnalyzerWrapper (#14154)" [lucene]

2025-04-02 Thread via GitHub
mayya-sharipova merged PR #14430: URL: https://github.com/apache/lucene/pull/14430 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lu

[PR] Revert "Add UnwrappingReuseStrategy for AnalyzerWrapper (#14154)" [lucene]

2025-04-02 Thread via GitHub
mayya-sharipova opened a new pull request, #14432: URL: https://github.com/apache/lucene/pull/14432 This reverts commit 1a676b64f89069284bf7d0510162f8095bba3980. Revert based on https://github.com/apache/lucene/pull/14430

Re: [PR] Adding profiling support for concurrent segment search [lucene]

2025-04-02 Thread via GitHub
jainankitk commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2774601593 > I'd have a top-level tree for everything related to initializing the search and combining results (rewrite(), createWeight(), CollectorManager#reduce) and then a list of trees for e

Re: [PR] Optimize ConcurrentMergeScheduler for Multi-Tenant Indexing [lucene]

2025-04-02 Thread via GitHub
github-actions[bot] commented on PR #14335: URL: https://github.com/apache/lucene/pull/14335#issuecomment-2774035527 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

[I] Add a timeout for forceMergeDeletes in IndexWriter [lucene]

2025-04-02 Thread via GitHub
houserjohn opened a new issue, #14431: URL: https://github.com/apache/lucene/issues/14431 ### Description Using IndexWriter's `forceMergeDeletes` to eliminate merge debt is a very useful feature -- especially during initial indexing. However, larger indexes can require 20+ minutes to

Re: [I] Incorrect use of fsync [lucene]

2025-04-02 Thread via GitHub
viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2772476115 > There has to be a certain trust in what the operating system provides and its consistency guarantees Sure, but the OS doesn't guarantee anything if you don't fsync. I'

Re: [I] [DISCUSS] Could we have a different ANN algorithm for Learned Sparse Vectors? [lucene]

2025-04-02 Thread via GitHub
yuye-aws commented on issue #13675: URL: https://github.com/apache/lucene/issues/13675#issuecomment-2772606870 Hi @atris, thanks for your contribution. I found this paper (**Bridging Dense and Sparse Maximum Inner Product Search**) pretty interesting as it caters to the skip list st

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-04-02 Thread via GitHub
benwtrent commented on code in PR #14094: URL: https://github.com/apache/lucene/pull/14094#discussion_r2023396128 ## lucene/CHANGES.txt: ## @@ -139,6 +139,8 @@ New Features * GITHUB#14412: Allow skip cache factor to be updated dynamically. (Sagar Upadhyaya) +* GITHUB#14094

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-04-02 Thread via GitHub
tteofili merged PR #14094: URL: https://github.com/apache/lucene/pull/14094

Re: [PR] Revert "Add UnwrappingReuseStrategy for AnalyzerWrapper (#14154)" [lucene]

2025-04-02 Thread via GitHub
mayya-sharipova merged PR #14432: URL: https://github.com/apache/lucene/pull/14432

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-02 Thread via GitHub
rmuir commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2774215695 I think the history is just that this norm can contain an arbitrary value, which before was a suboptimal encoding into a single byte. There was a ValueSource that assumed it was a single byte

Re: [PR] Adding profiling support for concurrent segment search [lucene]

2025-04-02 Thread via GitHub
jainankitk commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2774610745 @jpountz - The code changes are ready for review. For now, I have made changes to accommodate all the timers in `QueryProfilerTimingType`. While this does not modify (`rewrite()

Re: [I] Incorrect use of fsync [lucene]

2025-04-02 Thread via GitHub
dweiss commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2772129998 There has to be a certain trust in what the operating system provides and its consistency guarantees. What you describe seems like a fringe case that - even if possible - falls under

Re: [PR] Disable the query cache by default. [lucene]

2025-04-02 Thread via GitHub
jpountz merged PR #14187: URL: https://github.com/apache/lucene/pull/14187

Re: [I] Incorrect use of fsync [lucene]

2025-04-02 Thread via GitHub
uschindler commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771766515 If a file is incomplete, the commit will fail. Lucene's file formats are designed in a way that corruption can be found early (this includes checksums). So zero byte temp files or

Re: [I] Incorrect use of fsync [lucene]

2025-04-02 Thread via GitHub
uschindler commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771774432 > For temporary files, we should either fsync before closing, or start reading without closing the file. This shows that you have no idea about how Lucene works internally

Re: [PR] Reduce the number of comparisons when lowerPoint is equal to upperPoint [lucene]

2025-04-02 Thread via GitHub
gsmiller commented on code in PR #14267: URL: https://github.com/apache/lucene/pull/14267#discussion_r2025592245 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -517,6 +623,11 @@ public byte[] getUpperPoint() { return upperPoint.clone(); }

[I] Monitor TermFilteredPresearcher does not return stored query if it contains filter field [lucene]

2025-04-02 Thread via GitHub
bjacobowitz opened a new issue, #14427: URL: https://github.com/apache/lucene/issues/14427 ### Description `TermFilteredPresearcher` may fail to return stored queries if those queries contain the filter field in the query itself (not just in the metadata). When building the pre

Re: [I] UnsupportedOperation when merging `Lucene90BlockTreeTermsWriter` [lucene]

2025-04-02 Thread via GitHub
mikemccand commented on issue #14429: URL: https://github.com/apache/lucene/issues/14429#issuecomment-2773775763 Phew, this is a spooky exception! I think it means that the same term was fed to the FST Builder twice in a row. FST Builder in general can support this case, and it means t

Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

2025-04-02 Thread via GitHub
jainankitk commented on code in PR #14397: URL: https://github.com/apache/lucene/pull/14397#discussion_r2021747427 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java: ## @@ -512,6 +512,7 @@ private void doReset(int do

[PR] Revert "Add UnwrappingReuseStrategy for AnalyzerWrapper (#14154)" [lucene]

2025-04-02 Thread via GitHub
mayya-sharipova opened a new pull request, #14430: URL: https://github.com/apache/lucene/pull/14430 This reverts commit ce2a917cf2c2f40b3996656f3b294e3c01d25e5b. ### Description In Elasticsearch (and probably other applications) we reuse the same analyzer across fields. And thi

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-04-02 Thread via GitHub
kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2771809487 All dependent Faiss PRs are merged: 1. https://github.com/facebookresearch/faiss/pull/4158: Support pre-filtering on a Java `long[]` (underlying of `FixedBitSet`) using `IDSelectorBi

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-02 Thread via GitHub
ChrisHegarty commented on PR #14426: URL: https://github.com/apache/lucene/pull/14426#issuecomment-2773007104 Given the feedback so far, I've pivoted this quite a bit to now include per-field metrics. To support this I removed the previously proposed OffHeapAccountable interface and put the a

Re: [I] Incorrect use of fsync [lucene]

2025-04-02 Thread via GitHub
viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771546918 There was no crash, that's the problem. You wrote a file, then opened it again for reading and it's corrupted, and there was no IO error reported. As I said, if you clo

[I] UnsupportedOperation when merging `Lucene90BlockTreeTermsWriter` [lucene]

2025-04-02 Thread via GitHub
benwtrent opened a new issue, #14429: URL: https://github.com/apache/lucene/issues/14429 ### Description Found this in the wild. I haven't been able to replicate :( I don't even know what it means to hit this `fst.outputs.merge` branch and under what conditions is it valid/inva

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-04-02 Thread via GitHub
benwtrent commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2773659852 @dungba88 @msokolov Do we know where we stand with this? I wonder if we should simply use this to replace the current buggy behavior as fanning out to segments multiple times w

Re: [I] Incorrect use of fsync [lucene]

2025-04-02 Thread via GitHub
uschindler commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771580012 > There was no crash, that's the problem. You wrote a file, then opened it again for reading and it's corrupted, and there was no IO error reported. > > As I said, if you c

Re: [I] Incorrect use of fsync [lucene]

2025-04-02 Thread via GitHub
dweiss commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771582280 Ok, sorry but the scenario you're describing is insane to me. If something like this happens, I don't think it's Lucene's duty to try to correct it - it seems like the entire system
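
For readers following the "Incorrect use of fsync" thread above: the suggestion quoted there ("fsync before closing" a temporary file) could be sketched roughly as follows in plain Java NIO. This is only an illustration and not Lucene's code. The class name `FsyncBeforeClose` and the method `writeDurably` are hypothetical names made up for this sketch. `FileChannel.force(true)` asks the OS to flush both file content and metadata to the storage device before the channel is closed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Lucene.
class FsyncBeforeClose {

  // Writes all bytes to the file, then forces them to storage before closing.
  static void writeDurably(Path file, byte[] data) throws IOException {
    try (FileChannel ch = FileChannel.open(
        file,
        StandardOpenOption.CREATE,
        StandardOpenOption.WRITE,
        StandardOpenOption.TRUNCATE_EXISTING)) {
      ByteBuffer buf = ByteBuffer.wrap(data);
      // A single write() call may be partial, so loop until the buffer is drained.
      while (buf.hasRemaining()) {
        ch.write(buf);
      }
      // Flush content and metadata; an I/O failure surfaces here as an IOException.
      ch.force(true);
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("fsync-demo", ".tmp");
    try {
      writeDurably(tmp, "temporary data".getBytes(StandardCharsets.UTF_8));
      byte[] readBack = Files.readAllBytes(tmp);
      System.out.println(readBack.length + " bytes read back");
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

Whether this extra flush is worth its cost for short-lived temporary files is exactly what the thread debates; the sketch only shows the mechanics of the proposal.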