[I] Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges [lucene]

2024-10-16 Thread via GitHub
navneet1v opened a new issue, #13920: URL: https://github.com/apache/lucene/issues/13920 ## Description For the past month we have been testing the difference in performance between files opened with IOContext.RANDOM vs IOContext.READ, especially during merges, with Lucene version 9.11.1 an

Re: [PR] Lazy initialize ForDeltaUtil and ForUtil in Lucene912PostingsReader [lucene]

2024-10-16 Thread via GitHub
jpountz commented on PR #13885: URL: https://github.com/apache/lucene/pull/13885#issuecomment-2416031098 It's almost certainly this change that sped up TermTitleSort on October 12th, since TermTitleSort is the task that creates the most `PostingsEnum` objects. I pushed an annotation.

[I] How to use hnsw int4 when loading index [lucene]

2024-10-16 Thread via GitHub
hanqiushi opened a new issue, #13921: URL: https://github.com/apache/lucene/issues/13921 ### Description Hi, I have a question about using hnsw int4. I have built an index using Lucene99HnswScalarQuantizedVectorsFormat with the bits=4 param. But I don't know how to use it in search. I hav

Re: [PR] Have value and count in LabelAndValue only for TaxonomyFacets [lucene]

2024-10-16 Thread via GitHub
stefanvodita commented on PR #13740: URL: https://github.com/apache/lucene/pull/13740#issuecomment-2416197652 I like progress-not-perfection, but looking at this again I'm not sure it's progress. To me, it seems like a lot of complexity for a little bit of efficiency, but I could be wrong.

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-16 Thread via GitHub
jpountz commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1802496621 ## lucene/MIGRATE.md: ## @@ -892,3 +892,7 @@ segments are rewritten either via `IndexWriter.forceMerge` or ### Vector values APIs switched to primarily random-access

Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on PR #13635: URL: https://github.com/apache/lucene/pull/13635#issuecomment-2417118146 Hey @seanmacavaney didn't want this to die on the vine. I think with some refactoring and adding new experimental queries, this could be a nice experimental feature for vector search.

Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-10-16 Thread via GitHub
seanmacavaney commented on PR #13635: URL: https://github.com/apache/lucene/pull/13635#issuecomment-2417194138 Not at all-- thanks a lot for the help @benwtrent! I totally agree with the proposed changes and it's clear how to move forward on this. I'm just occupied with several other priori

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-10-16 Thread via GitHub
bugmakerr commented on PR #13763: URL: https://github.com/apache/lucene/pull/13763#issuecomment-2416803313 @jpountz I have merged main branch, PTAL:) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-10-16 Thread via GitHub
bugmakerr commented on code in PR #13763: URL: https://github.com/apache/lucene/pull/13763#discussion_r1803076020 ## lucene/core/src/java/org/apache/lucene/codecs/perfield/PerFieldKnnVectorsFormat.java: ## @@ -239,51 +245,69 @@ public FieldsReader(final SegmentReadState read

Re: [PR] Better handle dynamic pruning when the leading clause has a single impact block. [lucene]

2024-10-16 Thread via GitHub
jpountz commented on PR #13904: URL: https://github.com/apache/lucene/pull/13904#issuecomment-2416808261 This caused a big regression in `AndHighOrMedMed`. I'm reverting. https://benchmarks.mikemccandless.com/AndHighOrMedMed.html

Re: [PR] Introduce multiSelect for ScalarQuantizer [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on code in PR #13919: URL: https://github.com/apache/lucene/pull/13919#discussion_r1803334241 ## lucene/core/src/java/org/apache/lucene/util/quantization/ScalarQuantizer.java: ## @@ -568,29 +568,34 @@ private static List findNearestNeighbors( * and `95`.

[I] Can we use Panama Vector API for quantizing vectors? [lucene]

2024-10-16 Thread via GitHub
benwtrent opened a new issue, #13922: URL: https://github.com/apache/lucene/issues/13922 ### Description It would take a bit of refactoring, but: ``` float dx = v - minQuantile; float dxc = Math.max(minQuantile, Math.min(maxQuantile, v)) - minQuantile; floa
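
The clamp-then-offset step quoted in the issue can be read as the first half of a per-value scalar quantizer. The following is a minimal sketch in plain Java, not the Lucene `ScalarQuantizer` code: the class name `QuantizeSketch`, the `bits` parameter, and the way `scale` is derived are assumptions made for illustration, since the snippet above is cut off before the rest of the computation.

```java
/** Illustrative sketch only; names and the scale formula are assumptions. */
public final class QuantizeSketch {
  private QuantizeSketch() {}

  /**
   * Clamps v to [minQuantile, maxQuantile], then maps it linearly onto
   * the integer range [0, 2^bits - 1].
   */
  public static int quantize(float v, float minQuantile, float maxQuantile, int bits) {
    // Same clamp-and-offset expression as in the quoted issue text.
    float dxc = Math.max(minQuantile, Math.min(maxQuantile, v)) - minQuantile;
    // Assumed scale: number of quantization steps per unit of input range.
    float scale = ((1 << bits) - 1) / (maxQuantile - minQuantile);
    return Math.round(dxc * scale);
  }
}
```

Each value is processed independently with only `max`, `min`, subtract, multiply and round, which is the kind of loop body that maps naturally onto SIMD lanes; whether the Panama Vector API gives a real speedup here is the open question of the issue.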

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on code in PR #13910: URL: https://github.com/apache/lucene/pull/13910#discussion_r1803130185 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ## @@ -1906,4 +1916,122 @@ public void testMismatchedFields() thro

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on code in PR #13910: URL: https://github.com/apache/lucene/pull/13910#discussion_r1803132931 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ## @@ -1906,4 +1916,122 @@ public void testMismatchedFields() thro

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on code in PR #13910: URL: https://github.com/apache/lucene/pull/13910#discussion_r1803081052 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ## @@ -1906,4 +1916,122 @@ public void testMismatchedFields() thro

Re: [PR] Fixed bit set vector [lucene]

2024-10-16 Thread via GitHub
risdenk commented on PR #13827: URL: https://github.com/apache/lucene/pull/13827#issuecomment-2417258152 Thanks, I appreciate the second set of eyes. Glad to know I'm not going crazy. I also didn't see a good way forward for updating the array.

Re: [PR] Fixed bit set vector [lucene]

2024-10-16 Thread via GitHub
risdenk closed pull request #13827: Fixed bit set vector URL: https://github.com/apache/lucene/pull/13827

Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2417284114 I am currently working on moving this to Lucene101 format with the bug fixes we discovered in additional testing.

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-16 Thread via GitHub
msokolov commented on code in PR #13910: URL: https://github.com/apache/lucene/pull/13910#discussion_r1803114051 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ## @@ -1906,4 +1916,122 @@ public void testMismatchedFields() throw

Re: [I] How to use hnsw int4 when loading index [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on issue #13921: URL: https://github.com/apache/lucene/issues/13921#issuecomment-2416554406 @hanqiushi you don't need to do anything special. Since it's a format, you only have to query the index. So, once you have indexed a bunch of [KnnFloatVectorField](https://

Re: [I] How to use hnsw int4 when loading index [lucene]

2024-10-16 Thread via GitHub
benwtrent closed issue #13921: How to use hnsw int4 when loading index URL: https://github.com/apache/lucene/issues/13921

Re: [PR] Fixed bit set vector [lucene]

2024-10-16 Thread via GitHub
benwtrent commented on PR #13827: URL: https://github.com/apache/lucene/pull/13827#issuecomment-2416558059 Yeah, I poked around and it just seems like unloading the CPU vectors back into an array eats all performance gains. I tried some reorganization to see if there was a better way,

Re: [PR] Copy stored fields during flush with index sort [lucene]

2024-10-16 Thread via GitHub
dnhatn closed pull request #13803: Copy stored fields during flush with index sort URL: https://github.com/apache/lucene/pull/13803

Re: [PR] Copy stored fields during flush with index sort [lucene]

2024-10-16 Thread via GitHub
dnhatn commented on PR #13803: URL: https://github.com/apache/lucene/pull/13803#issuecomment-2418198002 I ran the benchmarks multiple times, but the improvement seems to be more noise than a real gain, so I'm closing this PR.

Re: [I] `IndexOrDocValuesQuery` does not support query highlighting [lucene]

2024-10-16 Thread via GitHub
harshavamsi commented on issue #12686: URL: https://github.com/apache/lucene/issues/12686#issuecomment-2417446490 Thanks @prudhvigodithi!

Re: [I] `IndexOrDocValuesQuery` does not support query highlighting [lucene]

2024-10-16 Thread via GitHub
harshavamsi closed issue #12686: `IndexOrDocValuesQuery` does not support query highlighting URL: https://github.com/apache/lucene/issues/12686

Re: [I] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-10-16 Thread via GitHub
mdmarshmallow commented on issue #13913: URL: https://github.com/apache/lucene/issues/13913#issuecomment-2418316155 After some discussion with @mikemccand, I'm currently planning on making a separate RAM manager that will force the largest `IndexWriters` to flush their buffers if the total
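
One way to picture the RAM manager described in this comment is the sketch below. It is illustrative only and does not use the Lucene API: `TenantWriter` is a hypothetical stand-in for an `IndexWriter`, `FakeWriter` exists only so the example runs, and the trigger condition (total buffered bytes exceeding a shared budget) is an assumption, because the comment is cut off before stating it.

```java
import java.util.List;

/** Illustrative sketch only; all types here are hypothetical stand-ins. */
public final class RamManagerSketch {
  private RamManagerSketch() {}

  /** Hypothetical view of a writer: how much it buffers and a way to flush it. */
  public interface TenantWriter {
    long ramBytesUsed();
    void flush();
  }

  /** Toy writer whose flush simply drops its buffered bytes. */
  public static final class FakeWriter implements TenantWriter {
    private long bytes;
    public FakeWriter(long bytes) { this.bytes = bytes; }
    @Override public long ramBytesUsed() { return bytes; }
    @Override public void flush() { bytes = 0; }
  }

  /**
   * Flushes the largest writers, one at a time, until the combined buffer
   * size is within budgetBytes. Returns the number of flushes performed.
   */
  public static int enforceBudget(List<? extends TenantWriter> writers, long budgetBytes) {
    int flushed = 0;
    // At most one flush per writer, so the loop always terminates.
    for (int i = 0; i < writers.size(); i++) {
      long total = 0;
      TenantWriter largest = null;
      for (TenantWriter w : writers) {
        long used = w.ramBytesUsed();
        total += used;
        if (largest == null || used > largest.ramBytesUsed()) {
          largest = w;
        }
      }
      if (total <= budgetBytes || largest == null || largest.ramBytesUsed() == 0) {
        break;
      }
      largest.flush();
      flushed++;
    }
    return flushed;
  }
}
```

Flushing the largest buffer first frees the most memory per flush, so small tenants are left alone unless the budget is still exceeded.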