Re: [I] Add more information to IOContext [lucene]

2025-04-01 Thread via GitHub
jpountz commented on issue #14422: URL: https://github.com/apache/lucene/issues/14422#issuecomment-2769197888 Thank you, I had started thinking along those lines but got blocked because I hadn't thought about using multiple "dimensions" for the context, i.e. metadata/index/data is one dimens

Re: [PR] PointInSetQuery early exit on non-matching segments [lucene]

2025-04-01 Thread via GitHub
gsmiller merged PR #14268: URL: https://github.com/apache/lucene/pull/14268 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [PR] PointInSetQuery early exit on non-matching segments [lucene]

2025-04-01 Thread via GitHub
gsmiller commented on PR #14268: URL: https://github.com/apache/lucene/pull/14268#issuecomment-2769642302 This looks great! Taking care of the merge now. Thank you @hanbj!

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2022995174 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsReader.java: ## @@ -257,6 +257,15 @@ public long ramBytesUsed() { r

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2023074747 ## lucene/core/src/java/org/apache/lucene/util/OffHeapAccountable.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Reduce the number of comparisons when lowerPoint is equal to upperPoint [lucene]

2025-04-01 Thread via GitHub
hanbj commented on code in PR #14267: URL: https://github.com/apache/lucene/pull/14267#discussion_r2022373498 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -129,6 +141,16 @@ public final Weight createWeight(IndexSearcher searcher, ScoreMode scoreM

Re: [I] Reuse packedTerms between two TermInSetQuery what combined with IndexOrDocValuesQuery [lucene]

2025-04-01 Thread via GitHub
jpountz commented on issue #14425: URL: https://github.com/apache/lucene/issues/14425#issuecomment-2769408354 The builder approach should work. Or maybe a static helper like `public static Query newIndexOrDocValuesSetQuery(RewriteMethod indexRewriteMethod, String field, Collection terms)` t

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
benwtrent commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2022890436 ## lucene/core/src/java/org/apache/lucene/util/OffHeapAccountable.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

Re: [PR] Adding profiling support for concurrent segment search [lucene]

2025-04-01 Thread via GitHub
jpountz commented on PR #14413: URL: https://github.com/apache/lucene/pull/14413#issuecomment-2769456423 I'd have a top-level tree for everything related to initializing the search and combining results (rewrite(), createWeight(), CollectorManager#reduce) and then a list of trees for each s

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2022920202 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsReader.java: ## @@ -257,6 +257,15 @@ public long ramBytesUsed() { r

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2022923407 ## lucene/core/src/java/org/apache/lucene/util/OffHeapAccountable.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [I] Reuse packedTerms between two TermInSetQuery what combined with IndexOrDocValuesQuery [lucene]

2025-04-01 Thread via GitHub
jpountz commented on issue #14425: URL: https://github.com/apache/lucene/issues/14425#issuecomment-2769216017 Indeed it would be nice if `KeywordField#newSetQuery` didn't pay the CPU and heap price for creating the `PrefixCodedTerms` instance twice. At the same time, let's keep `PrefixCoded

Re: [I] Reuse packedTerms between two TermInSetQuery what combined with IndexOrDocValuesQuery [lucene]

2025-04-01 Thread via GitHub
mkhludnev commented on issue #14425: URL: https://github.com/apache/lucene/issues/14425#issuecomment-2769246604 Presumably, one TermInSetQuery may create another with the rewrite method specified. WDYT? Or a TermInSetQueryBuilder may create the queries one by one with different rewrites?

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-04-01 Thread via GitHub
tteofili commented on code in PR #14094: URL: https://github.com/apache/lucene/pull/14094#discussion_r2022551917 ## lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java: ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-04-01 Thread via GitHub
tteofili commented on code in PR #14094: URL: https://github.com/apache/lucene/pull/14094#discussion_r2022636246 ## lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java: ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
ChrisHegarty commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2022693959 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java: ## @@ -306,6 +306,16 @@ private HnswGraph getGraphVa

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
tteofili commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2022684543 ## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene90/Lucene90HnswVectorsReader.java: ## @@ -306,6 +306,16 @@ private HnswGraph getGraphValues

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
benwtrent commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2023094888 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsReader.java: ## @@ -257,6 +257,15 @@ public long ramBytesUsed() { retu

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-01 Thread via GitHub
gf2121 commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2770201445 > We should add this format to RandomCodec then, so that it gets included as part of codec randomization. OK, did not see this. I know how to do it then. Thanks Adrien :)

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-01 Thread via GitHub
jpountz commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2770123261 > Once we think this is ready, we should prolly merge at first as the non-default Codec We should add this format to `RandomCodec` then, so that it gets included as part of codec

Re: [PR] Reduce the number of comparisons when lowerPoint is equal to upperPoint [lucene]

2025-04-01 Thread via GitHub
jainankitk commented on code in PR #14267: URL: https://github.com/apache/lucene/pull/14267#discussion_r2023189174 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -120,381 +132,447 @@ public void visit(QueryVisitor visitor) { public final Weight c

Re: [I] Incorrect use of fsync [lucene]

2025-04-01 Thread via GitHub
viliam-durina commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2768802025 I think we must fsync also the temporary files. Without fsyncing, when we read them back, they might be incomplete and no error might be reported. We could perhaps avoid fsync
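
To make the suggestion in the comment above concrete, here is a minimal sketch of what fsyncing a temporary file before reading it back could look like in plain Java NIO. It is not Lucene code: the class `TempFileSync` and its methods `writeAndSync` and `readBack` are hypothetical names chosen for illustration, and whether temporary files need this at all is exactly what the thread is debating.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene: writes a temp file and forces it
// to the storage device before anyone reads it back.
final class TempFileSync {

    /** Writes {@code data} to a fresh temp file and fsyncs it before returning. */
    public static Path writeAndSync(byte[] data) {
        try {
            Path tmp = Files.createTempFile("sketch-", ".tmp");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf); // a single write() may be partial, so loop
                }
                // force(true) asks the OS to flush file content and metadata.
                ch.force(true);
            }
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file back into memory. */
    public static byte[] readBack(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "spill-data".getBytes(StandardCharsets.UTF_8);
        Path tmp = writeAndSync(payload);
        System.out.println("bytes read back: " + readBack(tmp).length);
        Files.deleteIfExists(tmp);
    }
}
```

The sketch only shows the mechanics of `FileChannel#force`. It says nothing about the cost of the extra sync, which is the trade-off raised in the replies.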

Re: [PR] Add support for determining off-heap memory requirements for KnnVectorsReader [lucene]

2025-04-01 Thread via GitHub
benwtrent commented on code in PR #14426: URL: https://github.com/apache/lucene/pull/14426#discussion_r2022982193 ## lucene/core/src/java/org/apache/lucene/codecs/lucene102/Lucene102BinaryQuantizedVectorsReader.java: ## @@ -257,6 +257,15 @@ public long ramBytesUsed() { retu

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-01 Thread via GitHub
gf2121 commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2770173066 Thank you very much for all these careful, warm and helpful comments! > Are there any major items / blockers? I think I've addressed all of them (hopefully didn't miss any).

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-04-01 Thread via GitHub
tteofili commented on PR #14094: URL: https://github.com/apache/lucene/pull/14094#issuecomment-2769837089 @benwtrent I've reworked the design exposing `KnnSearchStrategy#nextVectorsBlock` and `PatienceKnnVectorQuery` leverages a `Patience` strategy that calls the `HnswQueueSaturationCollect

Re: [PR] Reduce the number of comparisons when lowerPoint is equal to upperPoint [lucene]

2025-04-01 Thread via GitHub
hanbj commented on code in PR #14267: URL: https://github.com/apache/lucene/pull/14267#discussion_r2024183379 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -120,381 +132,447 @@ public void visit(QueryVisitor visitor) { public final Weight create

Re: [I] Incorrect use of fsync [lucene]

2025-04-01 Thread via GitHub
dweiss commented on issue #14334: URL: https://github.com/apache/lucene/issues/14334#issuecomment-2771397718 But why would you want to read a temporary file after a crash? These are... temporary - if a process crashed, there is no recovery at all (at least concerning temporary files).

Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

2025-04-01 Thread via GitHub
jainankitk commented on code in PR #14397: URL: https://github.com/apache/lucene/pull/14397#discussion_r2021750116 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/LZ4WithPresetDictCompressionMode.java: ## @@ -98,12 +98,17 @@ public void decompress(DataInput in, int ori

[I] fix TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce [lucene]

2025-04-01 Thread via GitHub
guojialiang92 opened a new issue, #14423: URL: https://github.com/apache/lucene/issues/14423 ### Description I found that the test `TestIndexWriterWithThreads#testIOExceptionDuringWriteSegmentWithThreadsOnlyOnce` may fail in rare cases. Exception information is as foll

Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-04-01 Thread via GitHub
vsop-479 commented on PR #14371: URL: https://github.com/apache/lucene/pull/14371#issuecomment-2768597044 Adjust `ENABLE_ADVANCE_WITHIN_BLOCK_VECTOR_OPTO` to 16 (at least 16 lanes, such as: AVX, AVX-512).

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-04-01 Thread via GitHub
mikemccand commented on code in PR #14333: URL: https://github.com/apache/lucene/pull/14333#discussion_r2022734745 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/TrieBuilder.java: ## @@ -0,0 +1,552 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[I] Reuse packedTerms between two TermInSetQuery what combined with IndexOrDocValuesQuery [lucene]

2025-04-01 Thread via GitHub
mkhludnev opened a new issue, #14425: URL: https://github.com/apache/lucene/issues/14425 ### Description In cases like these ``` new IndexOrDocValuesQuery( new TermInSetQuery(MultiTermQuery.CONSTANT_SCORE_BLENDED_REWRITE, name(), iBytesRefs), new Te

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-04-01 Thread via GitHub
mkhludnev commented on code in PR #13974: URL: https://github.com/apache/lucene/pull/13974#discussion_r2023179556 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/SortedSetMultiRangeQuery.java: ## @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (A