Re: [PR] Reduce the overhead of `IndexInput#prefetch` when data is cached in RAM. [lucene]

2024-05-17 Thread via GitHub
jpountz commented on PR #13381: URL: https://github.com/apache/lucene/pull/13381#issuecomment-2118319218 I slightly modified the benchmark from #13337 ```java import java.io.IOException; import java.nio.file.Path; import java.nio.file.Paths; import java.util.ArrayList;

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-17 Thread via GitHub
naveentatikonda commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2118316039 @benwtrent I tried to set up luceneutil but I was running into a ton of compilation errors when building it against the latest Lucene source code. I have a doubt in the existing qua

[PR] Reduce the overhead of `IndexInput#prefetch` when data is cached in RAM. [lucene]

2024-05-17 Thread via GitHub
jpountz opened a new pull request, #13381: URL: https://github.com/apache/lucene/pull/13381 As Robert pointed out and benchmarks confirmed, there is some (small) overhead to calling `madvise` via the foreign function API; benchmarks suggest it is on the order of 1-2us. This is not much for

Re: [I] Multi range traversal for numeric range aggregations [lucene]

2024-05-17 Thread via GitHub
mikemccand commented on issue #13335: URL: https://github.com/apache/lucene/issues/13335#issuecomment-2118193225 I like this optimization. Maybe it best fits in Lucene's facet module? But, I don't think our facet impls today ever use points, directly, to do counting/aggregation -- it's a

Re: [I] [DISCUSS] Identifying Gaps in Lucene’s Faceting [lucene]

2024-05-17 Thread via GitHub
mikemccand commented on issue #12553: URL: https://github.com/apache/lucene/issues/12553#issuecomment-2118185393 https://github.com/apache/lucene/issues/13335 is an interesting example where Lucene could more efficiently implement faceting for numeric ranges using points, directly, instead

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2118037371 Backport completed and added an entry under 9.10.1 Bug Fixes -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-05-17 Thread via GitHub
mikemccand commented on PR #13221: URL: https://github.com/apache/lucene/pull/13221#issuecomment-2117960598 > FYI we'll need to add `// nightly-benchmarks-results-changed //` to the commit message to set expectations with nightly benchmarks that sorting tasks return a different hit count wi

Re: [PR] Fix TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-17 Thread via GitHub
ChrisHegarty merged PR #13361: URL: https://github.com/apache/lucene/pull/13361

Re: [I] Test failures in TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-17 Thread via GitHub
ChrisHegarty closed issue #13210: Test failures in TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults URL: https://github.com/apache/lucene/issues/13210

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2117910603 Okay, I will backport to 9.10/branch_9x.

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
jpountz commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2117901339 +1 to a bugfix release

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
jpountz commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2117901620 Can you backport to the 9.10 branch?

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice merged PR #13376: URL: https://github.com/apache/lucene/pull/13376

Re: [I] DataOutput.writeGroupVInts throws IntegerOverflow exception during merging [lucene]

2024-05-17 Thread via GitHub
easyice closed issue #13373: DataOutput.writeGroupVInts throws IntegerOverflow exception during merging URL: https://github.com/apache/lucene/issues/13373

Re: [PR] Speedup concurrent multi-segment HNSW graph search 2 [lucene]

2024-05-17 Thread via GitHub
mikemccand commented on PR #12962: URL: https://github.com/apache/lucene/pull/12962#issuecomment-2117866710 I think this was released in 9.10.0? I added a milestone.

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2117847222 > Does the exception happen because the remainder part of a postings list (after all length 128 blocks are done), which we now encode with GroupVInt, had a docID delta that was >= 1<<30,

Re: [I] DataOutput.writeGroupVInts throws IntegerOverflow exception during merging [lucene]

2024-05-17 Thread via GitHub
mikemccand commented on issue #13373: URL: https://github.com/apache/lucene/issues/13373#issuecomment-2117813158 Here is the [`java-user` discussion that led to this issue](https://lists.apache.org/thread/3vdfm8f3td5tgwd0w3tkvr2sq45y5n78). Thank you for reporting this @iamsanjay! It

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
mikemccand commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2117812216 It's hard for me to tell what the expected user impact here is? Does the exception happen because the remainder part of a postings list (after all length 128 blocks are done), which
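
For readers unfamiliar with the encoding discussed in this thread, here is a minimal sketch of a generic group-varint scheme: four ints share one flag byte that stores each value's byte length in two bits, followed by the payload bytes. The class name `GroupVIntSketch`, the flag bit order, and the little-endian payload are assumptions made for illustration; this is not Lucene's `GroupVIntUtil` and may not match its on-disk format. Values are treated as unsigned 32-bit quantities.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; hypothetical class, not the Lucene implementation.
final class GroupVIntSketch {

  // Bytes (1..4) needed to store v as an unsigned 32-bit value.
  // "v | 1" makes zero occupy one byte; ">> 3" counts whole leading zero bytes.
  static int numBytes(int v) {
    return Integer.BYTES - (Integer.numberOfLeadingZeros(v | 1) >> 3);
  }

  // Encodes exactly four ints: one flag byte, then each value's low bytes first.
  static byte[] encodeGroup(int a, int b, int c, int d) {
    int[] values = {a, b, c, d};
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int flag = 0;
    for (int v : values) {
      // Assumed layout: first value's (length - 1) ends up in the top two bits.
      flag = (flag << 2) | (numBytes(v) - 1);
    }
    out.write(flag);
    for (int v : values) {
      int n = numBytes(v);
      for (int i = 0; i < n; i++) {
        out.write(v >>> (8 * i)); // write(int) keeps only the low 8 bits
      }
    }
    return out.toByteArray();
  }

  // Decodes one group produced by encodeGroup.
  static int[] decodeGroup(byte[] bytes) {
    int flag = bytes[0] & 0xFF;
    int[] values = new int[4];
    int pos = 1;
    for (int k = 0; k < 4; k++) {
      int n = ((flag >>> (6 - 2 * k)) & 0x3) + 1;
      int v = 0;
      for (int i = 0; i < n; i++) {
        v |= (bytes[pos++] & 0xFF) << (8 * i);
      }
      values[k] = v;
    }
    return values;
  }
}
```

In this sketch a value of `1 << 30` or larger simply takes four payload bytes, so the byte-level format itself has room for any 32-bit pattern.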

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on code in PR #13376: URL: https://github.com/apache/lucene/pull/13376#discussion_r1605011950 ## lucene/core/src/java/org/apache/lucene/store/DataOutput.java: ## @@ -328,9 +328,12 @@ public void writeSetOfStrings(Set set) throws IOException { /** * Enco

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on code in PR #13376: URL: https://github.com/apache/lucene/pull/13376#discussion_r1604988872 ## lucene/core/src/java/org/apache/lucene/util/GroupVIntUtil.java: ## @@ -118,6 +120,13 @@ private static int numBytes(int v) { return Integer.BYTES - (Integer.nu

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on code in PR #13376: URL: https://github.com/apache/lucene/pull/13376#discussion_r1604980640 ## lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java: ## @@ -1442,6 +1442,19 @@ public void testListAllIsSorted() throws IOExcept

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
jpountz commented on code in PR #13376: URL: https://github.com/apache/lucene/pull/13376#discussion_r1604452451 ## lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java: ## @@ -1442,6 +1442,19 @@ public void testListAllIsSorted() throws IOExcept

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2117348590 I pushed the requested changes, @jpountz . No rush, just wanted to let you know.

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-17 Thread via GitHub
rmuir commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1604776531 ## lucene/core/src/java/org/apache/lucene/index/TermsEnum.java: ## @@ -61,6 +62,15 @@ public enum SeekStatus { */ public abstract boolean seekExact(BytesRef text)

Re: [PR] Add a MemorySegment Vector scorer - for scoring without copying on-heap [lucene]

2024-05-17 Thread via GitHub
ChrisHegarty commented on code in PR #13339: URL: https://github.com/apache/lucene/pull/13339#discussion_r1604777876 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/TestVectorScorer.java: ## @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] lucene-monitor: make TermFilteredPresearcher.ANYTOKEN[_FIELD] public [lucene]

2024-05-17 Thread via GitHub
cpoerschke merged PR #13379: URL: https://github.com/apache/lucene/pull/13379

Re: [I] Reproducible failure TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-17 Thread via GitHub
timgrein commented on issue #13380: URL: https://github.com/apache/lucene/issues/13380#issuecomment-2117208263 Should also be fixed by https://github.com/apache/lucene/pull/13361

Re: [PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-05-17 Thread via GitHub
gf2121 commented on code in PR #13221: URL: https://github.com/apache/lucene/pull/13221#discussion_r1604564002 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -405,5 +395,278 @@ public int advance(int target) throws IOException { p

Re: [I] Reproducible failure TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-17 Thread via GitHub
ChrisHegarty commented on issue #13380: URL: https://github.com/apache/lucene/issues/13380#issuecomment-2116989673 Fails the same on `main`, `branch_9.x`, and also `branch_9_10`. So not a new issue, per se.

Re: [PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-05-17 Thread via GitHub
jpountz commented on code in PR #13221: URL: https://github.com/apache/lucene/pull/13221#discussion_r1604525285 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -405,5 +395,278 @@ public int advance(int target) throws IOException {

[I] Reproducible failure TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults [lucene]

2024-05-17 Thread via GitHub
ChrisHegarty opened a new issue, #13380: URL: https://github.com/apache/lucene/issues/13380 ``` ./gradlew test --tests TestHnswByteVectorGraph.testSortedAndUnsortedIndicesReturnSameResults -Dtests.seed=3266DCB35B0F78C8 ``` ``` org.apache.lucene.util.hnsw.TestHnswByteVect

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
easyice commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2116962504 That's also a good idea! With this approach we can make `writeGroupVInts`/`readGroupVInt` use positive values only. It's actually handled as an unsigned integer, so we don't need to consider the

Re: [PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-05-17 Thread via GitHub
jpountz commented on code in PR #13221: URL: https://github.com/apache/lucene/pull/13221#discussion_r1604500178 ## lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java: ## @@ -405,5 +395,278 @@ public int advance(int target) throws IOException {

Re: [PR] Fix IntegerOverflow exception in postings encoding as group-varint [lucene]

2024-05-17 Thread via GitHub
jpountz commented on PR #13376: URL: https://github.com/apache/lucene/pull/13376#issuecomment-2116909274 Thanks for looking into it! Your approach works, but I'm tempted to fix it the other way around, by no longer checking if values are in the expected range with `Math.toIntExact` but rath
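
As background for the `Math.toIntExact` remark, the sketch below contrasts a strict narrowing conversion, which throws on out-of-range input, with a plain cast that keeps the low 32 bits so the result can be read back as an unsigned value. The class `ToIntExactSketch` and its method names are hypothetical and only illustrate standard JDK behavior; the comment above is truncated, so this is not a statement of what the PR ended up doing.

```java
// Illustrative sketch only; hypothetical class, not code from the PR.
final class ToIntExactSketch {

  // Strict narrowing: Math.toIntExact throws ArithmeticException
  // when the long does not fit in a signed 32-bit int.
  static int strict(long value) {
    return Math.toIntExact(value);
  }

  // Lenient narrowing: a cast keeps the low 32 bits without any range check.
  static int lowBits(long value) {
    return (int) value;
  }

  // Reads an int back as an unsigned 32-bit quantity.
  static long asUnsigned(int bits) {
    return Integer.toUnsignedLong(bits);
  }
}
```

For example, `strict(1L << 31)` throws because the value is one past `Integer.MAX_VALUE`, while `asUnsigned(lowBits(1L << 31))` recovers the original value.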

Re: [PR] Use IndexInput#prefetch for postings, skip data and impacts [lucene]

2024-05-17 Thread via GitHub
jpountz merged PR #13364: URL: https://github.com/apache/lucene/pull/13364