Re: [I] Make HNSW merges cheaper on heap or at least expose heap usage estimate [lucene]

2025-02-10 Thread via GitHub
Vikasht34 commented on issue #14208: URL: https://github.com/apache/lucene/issues/14208#issuecomment-2649836148 @benwtrent here are my thoughts on questions asked **Entry point can be updated at any time (we need to think about this)** 1. Two-Pass Merging to Handle Entry Point

Re: [PR] Add nullability annotations to IndexSearcher APIs [lucene]

2025-02-10 Thread via GitHub
github-actions[bot] commented on PR #14132: URL: https://github.com/apache/lucene/pull/14132#issuecomment-2649549901 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-10 Thread via GitHub
jpountz commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1949994957 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Add UnwrappingReuseStrategy for AnalyzerWrapper [lucene]

2025-02-10 Thread via GitHub
jpountz commented on PR #14154: URL: https://github.com/apache/lucene/pull/14154#issuecomment-2649385716 I don't feel good about this change (at least not yet). It looks like there is an analyzer that changes its components over time somewhere, and this change aims at making sure that an an

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-10 Thread via GitHub
Tim-Brooks commented on PR #14213: URL: https://github.com/apache/lucene/pull/14213#issuecomment-2649363258 > I'm curious if we should make it an actual record? Haha probably. I actually did not check Lucene's language level when producing the PR. I'll continue to refine this a

Re: [PR] Make Operations#union merge accept states that have no outgoing transition. [lucene]

2025-02-10 Thread via GitHub
jpountz merged PR #14207: URL: https://github.com/apache/lucene/pull/14207 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-10 Thread via GitHub
jpountz commented on PR #14213: URL: https://github.com/apache/lucene/pull/14213#issuecomment-2649287798 > I went with this initial approach as it aligns with the fact that StoredFieldsWriter already supports DataInput (seemingly for merges). Indeed, the change that introduced this ca

Re: [PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-10 Thread via GitHub
mayya-sharipova commented on code in PR #14215: URL: https://github.com/apache/lucene/pull/14215#discussion_r1949902014 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -52,7 +52,7 @@ public boolean collect(int docId, float similarity) { @Overrid

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-10 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2649210893 Thanks for looking into it. Were you able to confirm that the difference with the variable count is indeed that auto-vectorization not getting enabled as opposed to something else such a

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-10 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1949856748 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-10 Thread via GitHub
rmuir commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1949836434 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFolding.java: ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Reduce virtual calls when visiting bpv24-encoded doc ids in BKD leaves [lucene]

2025-02-10 Thread via GitHub
jpountz commented on PR #14176: URL: https://github.com/apache/lucene/pull/14176#issuecomment-2649151714 Amazing! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

Re: [PR] Add posTagFormat parameter for OpenNLPPOSFilter [lucene]

2025-02-10 Thread via GitHub
msfroh commented on PR #14194: URL: https://github.com/apache/lucene/pull/14194#issuecomment-2648941930 > Test failure: > > ``` > Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.search.TestSeededKnnByteVectorQuery.testSeedWithTimeout" -Ptests.jvms=1 -Ptests.jv

Re: [PR] Unicode Support for Case Insensitive Matching in RegExp [lucene]

2025-02-10 Thread via GitHub
john-wagster commented on code in PR #14192: URL: https://github.com/apache/lucene/pull/14192#discussion_r1949691308 ## lucene/core/src/java/org/apache/lucene/util/automaton/CaseFoldingUtil.java: ## @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-10 Thread via GitHub
gsmiller commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1949338012 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Reduce virtual calls when visiting bpv24-encoded doc ids in BKD leaves [lucene]

2025-02-10 Thread via GitHub
iverase commented on PR #14176: URL: https://github.com/apache/lucene/pull/14176#issuecomment-2648427208 Incredible speed-ups here https://benchmarks.mikemccandless.com/FilteredIntNRQ.html and here https://benchmarks.mikemccandless.com/IntNRQ.html -- This is an automated message from t

Re: [PR] Add histogram facet capabilities. [lucene]

2025-02-10 Thread via GitHub
gsmiller commented on code in PR #14204: URL: https://github.com/apache/lucene/pull/14204#discussion_r1949346251 ## lucene/facet/src/java/org/apache/lucene/facet/histogram/HistogramCollector.java: ## @@ -0,0 +1,252 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Fix Operations.reverse() to not add non-deterministic dead states [lucene]

2025-02-10 Thread via GitHub
rmuir merged PR #14212: URL: https://github.com/apache/lucene/pull/14212 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apach

Re: [PR] Bugfix/fix hnsw search termination check [lucene]

2025-02-10 Thread via GitHub
iverase commented on code in PR #14215: URL: https://github.com/apache/lucene/pull/14215#discussion_r1949247657 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -216,7 +216,7 @@ void searchLevel( while (candidates.size() > 0 && results.early

Re: [PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]

2025-02-10 Thread via GitHub
benwtrent commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2647965600 > I meant that since we'd be writing a new implementations for buildGraph etc, merging etc, it might be easier to account for long nodeIds from the get go Ah, I understand and I

Re: [I] Make HNSW merges cheaper on heap or at least expose heap usage estimate [lucene]

2025-02-10 Thread via GitHub
benwtrent commented on issue #14208: URL: https://github.com/apache/lucene/issues/14208#issuecomment-2647948921 > I think it deserve a separate github issue. WDYT? If it can provide speed improvements for sure, I agree. But keep in mind that when merging that: - Entry point c