Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-14 Thread via GitHub
stefanvodita commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1956984177 ## lucene/CHANGES.txt: ## @@ -20,6 +20,7 @@ New Features Improvements - +* GITHUB#14238: Improve test coverage of Dynamic Range Faceting.

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-14 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1956936618 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +77,248 @@ public void testComputeDynamicNumericRangesWithOneLargeWe

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-14 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1956924426 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +77,248 @@ public void testComputeDynamicNumericRangesWithOneLargeWe

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-14 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1956920492 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +77,248 @@ public void testComputeDynamicNumericRangesWithOneLargeWe

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-14 Thread via GitHub
houserjohn commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1956920117 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +77,248 @@ public void testComputeDynamicNumericRangesWithOneLargeWe

Re: [PR] Use multi-select instead of a full sort for DynamicRange creation [lucene]

2025-02-14 Thread via GitHub
houserjohn commented on code in PR #13914: URL: https://github.com/apache/lucene/pull/13914#discussion_r1956877159 ## lucene/facet/src/java/org/apache/lucene/facet/range/DynamicRangeUtil.java: ## @@ -202,66 +208,83 @@ public SegmentOutput(int hitsLength) { * is used to c

Re: [I] TestLogMergePolicy#testNoPathologicalMerge reproducible failure [lucene]

2025-02-14 Thread via GitHub
benwtrent closed issue #14206: TestLogMergePolicy#testNoPathologicalMerge reproducible failure URL: https://github.com/apache/lucene/issues/14206 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Adjust assert merge flakiness due to fencepost error [lucene]

2025-02-14 Thread via GitHub
benwtrent merged PR #14245: URL: https://github.com/apache/lucene/pull/14245

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-14 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2660180526 I don't believe 16 is "special" except in the sense that it happens to be a sweet spot in this context. We expect that as we increase that per-segment factor we will get increased reca

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-14 Thread via GitHub
benwtrent commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2660166393 > The latency improvement from re-using scores is surprisingly large. I would have expected costs to be dominated by newly-explored nodes, but this is cool. @msokolov this was o

Re: [PR] Support DataInput as source for StoredField [lucene]

2025-02-14 Thread via GitHub
Tim-Brooks commented on PR #14213: URL: https://github.com/apache/lucene/pull/14213#issuecomment-2660154259 > @Tim-Brooks Could you add an entry in CHANGES.txt? It should be under the 10.2 version, thanks! I made this change and added some more tests. Let me know if any additional te

Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-02-14 Thread via GitHub
msokolov commented on PR #14226: URL: https://github.com/apache/lucene/pull/14226#issuecomment-2660150066 The latency improvement from re-using scores is surprisingly large. I would have expected costs to be dominated by newly-explored nodes, but this is cool.

[PR] Relation check within 1D BKD Leaves [lucene]

2025-02-14 Thread via GitHub
gf2121 opened a new pull request, #14244: URL: https://github.com/apache/lucene/pull/14244 This patch tries to introduce additional relation check to shortcut visiting within BKD leaves. This helps queries that hit few docs like PointInSetQuery or narrow RangeQueries on high-cardinality fie

Re: [PR] Utility classes to make it easier to use sandbox facet API for most common cases [lucene]

2025-02-14 Thread via GitHub
stefanvodita commented on code in PR #14237: URL: https://github.com/apache/lucene/pull/14237#discussion_r1956459385 ## lucene/demo/src/java/org/apache/lucene/demo/facet/SandboxFacetsExample.java: ## @@ -130,6 +135,88 @@ void index() throws IOException { IOUtils.close(index

Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

2025-02-14 Thread via GitHub
jpountz commented on PR #13470: URL: https://github.com/apache/lucene/pull/13470#issuecomment-2659645405 @harenlin I took some freedom to apply my feedback and push it to your branch. Would you like to take a look and check if it makes sense?

Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-02-14 Thread via GitHub
benwtrent commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2659586574 Annot & other potpourri I fixed during the prefiltered benchmarking for this change. https://github.com/mikemccand/luceneutil/pull/337

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-14 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659556195 [perf_asm.log](https://github.com/user-attachments/files/18801016/perf_asm.log) Profile suggests that loops get vectorized.

Re: [PR] Add new Acorn-esque filtered HNSW search heuristic [lucene]

2025-02-14 Thread via GitHub
benwtrent commented on PR #14160: URL: https://github.com/apache/lucene/pull/14160#issuecomment-2659548003 Nightly has picked up the change: https://benchmarks.mikemccandless.com/PreFilteredVectorSearch.html https://github.com/user-attachments/assets/535bbaf7-a38a-4c8e-9cf7-3e7c

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-14 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659531719 Results on my machines are a bit disappointing ``` java -version openjdk version "23.0.2" 2025-01-21 OpenJDK Runtime Environment (build 23.0.2+7-58) OpenJDK 64-Bit Server VM

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-14 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659510128 Yes, exactly.

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-14 Thread via GitHub
gf2121 commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659505128 Thanks for the feedback! And sorry for my poor English. Do you mean something like this by `single batch size of 16 of 32`? ``` private static void readDelta16(IndexInput in, i

Re: [PR] supports force merge based on specified segments. [lucene]

2025-02-14 Thread via GitHub
jpountz commented on PR #14163: URL: https://github.com/apache/lucene/pull/14163#issuecomment-2659489394 This PR will not get merged, but we are interested in fixing the underlying bug that causes Lucene to not run a merge when it (at least apparently) obviously should.

Re: [PR] Reduce virtual calls when visiting bpv24-encoded doc ids in BKD leaves [lucene]

2025-02-14 Thread via GitHub
jpountz commented on PR #14176: URL: https://github.com/apache/lucene/pull/14176#issuecomment-2659465229 I pushed an annotation. https://github.com/mikemccand/luceneutil/commit/e07e590ca21b7aacc91f0de9a296f8ffd2042bc3

Re: [PR] Use Vector API to decode BKD docIds [lucene]

2025-02-14 Thread via GitHub
jpountz commented on PR #14203: URL: https://github.com/apache/lucene/pull/14203#issuecomment-2659455382 Thanks for iterating and running benchmarks. I played with the micro-benchmark and I get almost the same result if I use a single batch size of 16 or 32 (AMD Ryzen with AVX2 but no AVX-5

Re: [I] Hnsw format testRecall failing [lucene]

2025-02-14 Thread via GitHub
benwtrent closed issue #14233: Hnsw format testRecall failing URL: https://github.com/apache/lucene/issues/14233

Re: [PR] Remove duplicates from the hnsw recall testing [lucene]

2025-02-14 Thread via GitHub
benwtrent merged PR #14234: URL: https://github.com/apache/lucene/pull/14234

Re: [PR] Simplify operations by avoiding calculation of extra bitwise operations [lucene]

2025-02-14 Thread via GitHub
giorgigagnidze16 closed pull request #14242: Simplify operations by avoiding calculation of extra bitwise operations URL: https://github.com/apache/lucene/pull/14242

Re: [I] Simplify bitwise operations in DefaultVectorUtilSupport [lucene]

2025-02-14 Thread via GitHub
giorgigagnidze16 closed issue #14240: Simplify bitwise operations in DefaultVectorUtilSupport URL: https://github.com/apache/lucene/issues/14240

Re: [PR] Simplify operations by avoiding calculation of extra bitwise operations [lucene]

2025-02-14 Thread via GitHub
rmuir commented on PR #14242: URL: https://github.com/apache/lucene/pull/14242#issuecomment-2659247757 > Double checking that it works for you too @rmuir? As long as we fix javadocs of VectorSpecies.loopBound to match :) Vector operations loops are not the place to optimize or

Re: [PR] Simplify operations by avoiding calculation of extra bitwise operations [lucene]

2025-02-14 Thread via GitHub
rmuir commented on PR #14242: URL: https://github.com/apache/lucene/pull/14242#issuecomment-2659233974 This was written this way (with a bit of redundancy) on purpose to make it 100% clear it is consistent with `VectorSpecies.loopBound`: > As long as VLENGTH is a power of two, then t

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-02-14 Thread via GitHub
ChrisHegarty commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2659128547 I just committed a rewrite for the cuVS format implementation. After the rewrite all the BaseKnnVectorsFormatTestCase tests pass. There are still some lurking intermittent fai

Re: [PR] Simplify operations by avoiding calculation of extra bitwise operations [lucene]

2025-02-14 Thread via GitHub
jpountz commented on PR #14242: URL: https://github.com/apache/lucene/pull/14242#issuecomment-2659075266 Thank you. There shouldn't be any performance difference since I would expect the compiler to pre-compute the result of `~(4-1)` since these are constants, so this change is only about r

[PR] Simplify operations by avoiding calculation of extra bitwise operations [lucene]

2025-02-14 Thread via GitHub
giorgigagnidze16 opened a new pull request, #14242: URL: https://github.com/apache/lucene/pull/14242 ### Description A minor change to eliminate unnecessary bitwise calculations, replaced with actual expected value Example: `int upperBound = a.length & ~(4 - 1); ` -> `int upper
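
For readers following the discussion, here is a minimal sketch of what the expression `a.length & ~(4 - 1)` from the description computes: it rounds a length down to the nearest multiple of 4, assuming the block size is a power of two. The class name `LoopBoundSketch` and the helper `roundDown` are hypothetical and only for illustration; this is not the code from `DefaultVectorUtilSupport`.

```java
public class LoopBoundSketch {

  // Hypothetical helper (not part of Lucene): rounds `length` down to a
  // multiple of `lanes`. Only valid when `lanes` is a power of two, because
  // only then does ~(lanes - 1) clear exactly the low-order remainder bits.
  public static int roundDown(int length, int lanes) {
    return length & ~(lanes - 1);
  }

  public static void main(String[] args) {
    for (int length = 0; length < 64; length++) {
      int masked = roundDown(length, 4);       // mask written as ~(4 - 1)
      int literal = length & ~3;               // same mask with 4 - 1 pre-computed
      int arithmetic = length - (length % 4);  // same result without bit tricks
      if (masked != literal || masked != arithmetic) {
        throw new AssertionError("mismatch at length " + length);
      }
    }
    System.out.println(roundDown(10, 4)); // prints 8
  }
}
```

Both spellings of the mask give the same bound, so the choice between them is a readability question rather than a correctness one.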

Re: [I] Simplify bitwise operations in DefaultVectorUtilSupport [lucene]

2025-02-14 Thread via GitHub
giorgigagnidze16 commented on issue #14240: URL: https://github.com/apache/lucene/issues/14240#issuecomment-2659053672 Figured it out, thanks. Didn't realize people were forking the repos first, #14242 Here's the PR

Re: [I] Simplify bitwise operations in DefaultVectorUtilSupport [lucene]

2025-02-14 Thread via GitHub
jpountz commented on issue #14240: URL: https://github.com/apache/lucene/issues/14240#issuecomment-2658992237 That shouldn't be required. Have you checked out https://github.com/apache/lucene/blob/main/CONTRIBUTING.md? Feel free to let me know if there is a step that isn't clear to you.

[I] Make multi-field search a first-class citizen [lucene]

2025-02-14 Thread via GitHub
jpountz opened a new issue, #14241: URL: https://github.com/apache/lucene/issues/14241 ### Description Searching across multiple fields is a very frequent need, e.g. both Solr and Elasticsearch allow searching a query string on multiple fields, yet Lucene doesn't make it easy.

Re: [I] Simplify bitwise operations in DefaultVectorUtilSupport [lucene]

2025-02-14 Thread via GitHub
giorgigagnidze16 commented on issue #14240: URL: https://github.com/apache/lucene/issues/14240#issuecomment-2658975225 @jpountz Sure, but I couldn't figure out how to :) Need to be added to contributors list somehow?

Re: [I] Simplify bitwise operations in DefaultVectorUtilSupport [lucene]

2025-02-14 Thread via GitHub
jpountz commented on issue #14240: URL: https://github.com/apache/lucene/issues/14240#issuecomment-2658971004 Would you like to open a pull request?

Re: [PR] [Unit] Increase Dynamic Range Faceting coverage by adding previously nonexistent unit tests [lucene]

2025-02-14 Thread via GitHub
stefanvodita commented on code in PR #14238: URL: https://github.com/apache/lucene/pull/14238#discussion_r1955848183 ## lucene/facet/src/test/org/apache/lucene/facet/range/TestDynamicRangeUtil.java: ## @@ -76,13 +77,248 @@ public void testComputeDynamicNumericRangesWithOneLarge

[I] Simplify bitwise operations in DefaultVectorUtilSupport [lucene]

2025-02-14 Thread via GitHub
giorgigagnidze16 opened a new issue, #14240: URL: https://github.com/apache/lucene/issues/14240 ### Description Currently, `DefaultVectorUtilSupport` contains methods that use unnecessary bitwise operations. These can be replaced with direct values to eliminate extra computations. Fo