Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-04 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2028456102 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -130,4 +134,56 @@ public KnnVectorsReader getMergeInstance() { * The default imp

Re: [PR] Adding profiling support for concurrent segment search [lucene]

2025-04-04 Thread via GitHub
jpountz commented on code in PR #14413: URL: https://github.com/apache/lucene/pull/14413#discussion_r2028458611 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryProfilerBreakdown.java: ## @@ -17,46 +17,113 @@ package org.apache.lucene.sandbox.search; +import

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-04 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2028417528 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsReader.java: ## @@ -257,6 +259,19 @@ public long ramBytesUsed() { r

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-04 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2028541365 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -130,4 +134,56 @@ public KnnVectorsReader getMergeInstance() { * The default imp

Re: [I] QueryParser parsing a phrase with a wildcard [lucene]

2025-04-04 Thread via GitHub
mkhludnev commented on issue #14440: URL: https://github.com/apache/lucene/issues/14440#issuecomment-2778755848 Just a gentle reminder about `ComplexPhraseQueryParser` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] KeywordField.newSetQuery() to reuse prefixed terms in IndexOrDocValuesQuery [lucene]

2025-04-04 Thread via GitHub
jainankitk commented on code in PR #14435: URL: https://github.com/apache/lucene/pull/14435#discussion_r2028202175 ## lucene/core/src/java/org/apache/lucene/document/KeywordField.java: ## @@ -175,9 +174,8 @@ public static Query newExactQuery(String field, String value) { pub

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-04 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2778653535 I'd expect a hypothetical `IndexReaderFunctions.numTerms(field)` to return the number of terms in the index for that field. That's not even close to what we want! "Length" should be a

[I] [Release Wizard] Move choice for signing the release to when is needed [lucene]

2025-04-04 Thread via GitHub
iverase opened a new issue, #14441: URL: https://github.com/apache/lucene/issues/14441 The release wizard asks, in `Prerequisites/GPG key id is configured`, for the method for signing the release: ``` Q: Do you want to sign the release with gradle plugin? No means gpg (y/n):

Re: [PR] KeywordField.newSetQuery() to reuse prefixed terms in IndexOrDocValuesQuery [lucene]

2025-04-04 Thread via GitHub
jpountz commented on code in PR #14435: URL: https://github.com/apache/lucene/pull/14435#discussion_r2028403966 ## lucene/core/src/java/org/apache/lucene/document/KeywordField.java: ## @@ -175,9 +174,8 @@ public static Query newExactQuery(String field, String value) { public

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-04 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2029105175 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -130,4 +134,56 @@ public KnnVectorsReader getMergeInstance() { * The default imp

[PR] Adding logic for collecting Histogram efficiently using Point Trees [lucene]

2025-04-04 Thread via GitHub
jainankitk opened a new pull request, #14439: URL: https://github.com/apache/lucene/pull/14439 ### Description This PR adds multi-range traversal logic to collect the histogram on a numeric field indexed as pointValues for MATCH_ALL cases. Even for non-match all cases like `PointRangeQ

Re: [PR] Let Decompressor implement the Closeable interface. [lucene]

2025-04-04 Thread via GitHub
mulugetam commented on PR #14438: URL: https://github.com/apache/lucene/pull/14438#issuecomment-2779253165 > Unfortunately, you can't easily use close() to release resources from a Decompressor, because `StoredFieldsReader` is cloneable, and close() is never called on the clones. The only w

Re: [PR] Let Decompressor implement the Closeable interface. [lucene]

2025-04-04 Thread via GitHub
mulugetam closed pull request #14438: Let Decompressor implement the Closeable interface. URL: https://github.com/apache/lucene/pull/14438

Re: [I] Add a timeout for forceMergeDeletes in IndexWriter [lucene]

2025-04-04 Thread via GitHub
houserjohn commented on issue #14431: URL: https://github.com/apache/lucene/issues/14431#issuecomment-2779350159 Apologies if I am misunderstanding your question, but the example that it is great for is right after the full indexing of your documents. The indexing likely created many delete

Re: [PR] Remove slices creation overhead from IndexSearcher constructor [lucene]

2025-04-04 Thread via GitHub
jainankitk commented on code in PR #14428: URL: https://github.com/apache/lucene/pull/14428#discussion_r2026306457 ## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ## @@ -223,12 +223,6 @@ public IndexSearcher(IndexReaderContext context, Executor executor) {

Re: [PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]

2025-04-04 Thread via GitHub
vigyasharma commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2744585265 re: using long for graph node ids, I can see how using int ordinals can be limiting for the number of vectors we can index per segment. However, adapting to long node ids is also a non-

Re: [I] TestIndexSortBackwardsCompatibility.testSortedIndexAddDocBlocks fails reproducibly [lucene]

2025-04-04 Thread via GitHub
dweiss commented on issue #14344: URL: https://github.com/apache/lucene/issues/14344#issuecomment-2776875359 This has been fixed. Closing.

Re: [I] Incorrect use of fsync [lucene]

2025-04-04 Thread via GitHub
viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2770735674 Without fsync, there's no guarantee that anything you wrote was written. The OS delays the writes. If you close, the data is still in memory, the OS later tries to write and f

Re: [PR] Disable HNSW connectedComponents (#14214) [lucene]

2025-04-04 Thread via GitHub
benwtrent merged PR #14436: URL: https://github.com/apache/lucene/pull/14436

Re: [I] Add more information to IOContext [lucene]

2025-04-04 Thread via GitHub
rmuir commented on issue #14422: URL: https://github.com/apache/lucene/issues/14422#issuecomment-2767743265 > [@rmuir](https://github.com/rmuir) I'm curious if you can expand a bit more on what you have in mind, what you are describing sounds to me like how things are today where `ReadAdvic

Re: [PR] Fix HistogramCollector to not create zero-count buckets. [lucene]

2025-04-04 Thread via GitHub
jainankitk commented on code in PR #14421: URL: https://github.com/apache/lucene/pull/14421#discussion_r2020439736 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java: ## @@ -279,7 +279,9 @@ public void collect(DocIdStream stream)

Re: [PR] Add leafReaders() Method to IndexReader and Unit Test [lucene]

2025-04-04 Thread via GitHub
benwtrent commented on PR #14370: URL: https://github.com/apache/lucene/pull/14370#issuecomment-2736659338 Eh, I am not sold that this change needs to occur, if ever. While "this is how it's always been" isn't a good argument for some things, I think expanding the public, and then backwards

Re: [PR] Fix test delta in minMaxScalarQuantize [lucene]

2025-04-04 Thread via GitHub
benwtrent merged PR #14403: URL: https://github.com/apache/lucene/pull/14403

Re: [PR] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-04-04 Thread via GitHub
guojialiang92 commented on PR #14417: URL: https://github.com/apache/lucene/pull/14417#issuecomment-2766081605 Thanks, @vigyasharma > I think we can add a couple more tests to make it robust. > > 1. Some tests around concurrency – index with multiple threads, then advance the cou

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-04 Thread via GitHub
jimczi commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2027537161 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -130,4 +134,56 @@ public KnnVectorsReader getMergeInstance() { * The default implement

Re: [I] Add a timeout for forceMergeDeletes in IndexWriter [lucene]

2025-04-04 Thread via GitHub
jpountz commented on issue #14431: URL: https://github.com/apache/lucene/issues/14431#issuecomment-2778108878 For my understanding, what is the benefit of waiting until the timeout is reached rather than not waiting at all?

Re: [I] build support: java 24 [lucene]

2025-04-04 Thread via GitHub
rmuir commented on issue #14379: URL: https://github.com/apache/lucene/issues/14379#issuecomment-2740946896 maybe this one will fix it long-term and this is the last time we have to go thru it? unfortunately it just went live in java 24 so it doesn't help us now: https://openjdk.org/jeps/48

Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-04-04 Thread via GitHub
vsop-479 commented on PR #14371: URL: https://github.com/apache/lucene/pull/14371#issuecomment-2774626745 @gf2121, I measured it on a Linux server (`uses preferredBitSize=512; FMA enabled`), and there is still a massive slowdown. I will dig more ... ``` Benchmark

Re: [I] Incorrect use of fsync [lucene]

2025-04-04 Thread via GitHub
viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771736474 >every application everywhere and everytime would need to call fsync on close Yeah, I don't know what would be the use case for not fsyncing before closing. Maybe if you

Re: [PR] Adding profiling support for concurrent segment search [lucene]

2025-04-04 Thread via GitHub
jainankitk commented on code in PR #14413: URL: https://github.com/apache/lucene/pull/14413#discussion_r2029297568 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryProfilerBreakdown.java: ## @@ -17,46 +17,113 @@ package org.apache.lucene.sandbox.search; +imp

Re: [PR] Completion FSTs to be loaded off-heap at all times [lucene]

2025-04-04 Thread via GitHub
jpountz commented on PR #14364: URL: https://github.com/apache/lucene/pull/14364#issuecomment-2738217161 Agreed!

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-04-04 Thread via GitHub
jpountz commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2766473986 It is unexpected indeed! I'll fix this and add a CHANGES entry.

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-04-04 Thread via GitHub
sgup432 commented on PR #14412: URL: https://github.com/apache/lucene/pull/14412#issuecomment-2766561573 @jpountz Thanks for expanding on the reasoning above w.r.t. query cache usage. I was working on [refactoring](https://github.com/apache/lucene/issues/14222) query cache so that it is no

Re: [I] Address gradle temp file pollution insanity [lucene]

2025-04-04 Thread via GitHub
dweiss closed issue #14385: Address gradle temp file pollution insanity URL: https://github.com/apache/lucene/issues/14385

Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-04-04 Thread via GitHub
jpountz commented on PR #14412: URL: https://github.com/apache/lucene/pull/14412#issuecomment-2765833637 Sorry I'm not sure I get your suggestion.

Re: [I] Incorrect use of fsync [lucene]

2025-04-04 Thread via GitHub
viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771963385 >Please stop arguing here about problems that don't exist. Issue https://github.com/apache/lucene/issues/10906 has nothing to do with temporary files. This issue is not

Re: [PR] KeywordField.newSetQuery() to reuse prefixed terms in IndexOrDocValuesQuery [lucene]

2025-04-04 Thread via GitHub
jainankitk commented on code in PR #14435: URL: https://github.com/apache/lucene/pull/14435#discussion_r2029301580 ## lucene/core/src/java/org/apache/lucene/document/KeywordField.java: ## @@ -175,9 +174,8 @@ public static Query newExactQuery(String field, String value) { pub

Re: [I] Case insensitive regex query with character range [lucene]

2025-04-04 Thread via GitHub
rmuir commented on issue #14378: URL: https://github.com/apache/lucene/issues/14378#issuecomment-2741911493 we are better armed to begin looking at this one after recent changes, see https://github.com/apache/lucene/pull/14193#issuecomment-2638840849 now that there is an actual single

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-04 Thread via GitHub
benwtrent commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2027504697 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -130,4 +134,56 @@ public KnnVectorsReader getMergeInstance() { * The default implem

Re: [PR] Add CaseFolding.fold(), inverse of expand(), move to UnicodeUtil, add filter [lucene]

2025-04-04 Thread via GitHub
github-actions[bot] commented on PR #14389: URL: https://github.com/apache/lucene/pull/14389#issuecomment-2779945827 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-04-04 Thread via GitHub
github-actions[bot] commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2779945929 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Add more information to IOContext [lucene]

2025-04-04 Thread via GitHub
jainankitk commented on issue #14422: URL: https://github.com/apache/lucene/issues/14422#issuecomment-2767494904 Looks related to the discussion here - https://github.com/apache/lucene/issues/14348#issuecomment-2730902349. More general than specifically vector files

[PR] Remove slices creation overhead from IndexSearcher constructor [lucene]

2025-04-04 Thread via GitHub
javanna opened a new pull request, #14428: URL: https://github.com/apache/lucene/pull/14428 We simplified the slices creation in IndexSearcher with #13893. That removed the need for a caching supplier, but had an unexpected effect: we eagerly create slices now for the case where no executor