Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
original-brownbear commented on code in PR #13862: URL: https://github.com/apache/lucene/pull/13862#discussion_r1792205301 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -1502,7 +1471,7 @@ public int advance(int target) throws IOEx

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-08 Thread via GitHub
benwtrent commented on code in PR #13872: URL: https://github.com/apache/lucene/pull/13872#discussion_r1792308666 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/OffHeapQuantizedByteVectorValues.java: ## @@ -127,31 +121,42 @@ public int size() { } @Override -

[PR] Allow open-ended ranges in Intervals range [lucene]

2024-10-08 Thread via GitHub
mayya-sharipova opened a new pull request, #13873: URL: https://github.com/apache/lucene/pull/13873 Currently the IntervalsSource.range function only supports closed intervals. This will allow open-ended ranges. Relates to https://github.com/apache/lucene/pull/13562 Backport for #13859
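The PR body does not show the new signature. The sketch below only models what "open-ended" means for a term range, under one stated assumption: a `null` bound means "unbounded on that side". `OpenEndedRange` is a hypothetical class, not Lucene's API. Lucene compares terms as `BytesRef` bytes, and this sketch uses `String.compareTo` for simplicity. The two orderings agree for ASCII terms.

```java
import java.util.Objects;

/** Hypothetical model of a term range; a null bound means "no limit on that side" (assumption). */
final class OpenEndedRange {
  private final String min; // null = no lower bound
  private final String max; // null = no upper bound
  private final boolean minInclusive;
  private final boolean maxInclusive;

  OpenEndedRange(String min, String max, boolean minInclusive, boolean maxInclusive) {
    this.min = min;
    this.max = max;
    this.minInclusive = minInclusive;
    this.maxInclusive = maxInclusive;
  }

  /** Returns true if the term lies inside the range; a missing bound is never checked. */
  boolean contains(String term) {
    Objects.requireNonNull(term, "term");
    if (min != null) {
      int c = term.compareTo(min);
      if (c < 0 || (c == 0 && !minInclusive)) {
        return false;
      }
    }
    if (max != null) {
      int c = term.compareTo(max);
      if (c > 0 || (c == 0 && !maxInclusive)) {
        return false;
      }
    }
    return true;
  }
}
```

For example, `new OpenEndedRange("b", null, true, false)` accepts "b" and every term that sorts after it. A closed range is simply the case where both bounds are non-null.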

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-08 Thread via GitHub
msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2401207999 hm there is some functional problem with the change that yields terrible recall for quantized vectors. I'll dig and fix and see if I can beef up the unit test coverage as well.

[PR] Fix 9.12.0 backcompat break (Lucene 9.12.0 cannot read 9.11.x indices written with quantized HNSW, `Lucene99HnswScalarQuantizedVectorsFormat`) [lucene]

2024-10-08 Thread via GitHub
mikemccand opened a new pull request, #13874: URL: https://github.com/apache/lucene/pull/13874 This PR: 1. Fixes a pre-existing BWC testing bug where our `int8_hnsw*.zip` bwc test files failed to actually use the scalar quantization codec ... they were just ordinary `float32` HNSW

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2400927997 > I am struggling to create the 9.10.0 `int8_hnsw.9.10.0.zip` OK, I managed to generate this. I copied the `TestInt8HnswBackwardsCompatibility.java` from 9.12.x source to

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2401001641 > then make @benwtrent's above proposed change, and see it pass... Sorry! I meant @parnmatt's proposed change. When I made this change, the bwc test now passes, phew. I'

[PR] Disable CFS in TestDefaultCodecParallelizesIO. [lucene]

2024-10-08 Thread via GitHub
jpountz opened a new pull request, #13875: URL: https://github.com/apache/lucene/pull/13875 `SerialIODirectory` doesn't count reads to files that are open with `ReadAdvice#RANDOM_PRELOAD` as these files are expected to be loaded in memory. Unfortunately, we cannot detect such files on compo

Re: [PR] Speedup MaxScoreCache.computeMaxScore [lucene]

2024-10-08 Thread via GitHub
jpountz commented on PR #13865: URL: https://github.com/apache/lucene/pull/13865#issuecomment-2401421616 Looks like this change is the one that triggered a speedup on `OrHighRare` on October 7th; I pushed an annotation. https://benchmarks.mikemccandless.com/OrHighRare.html

Re: [PR] Speedup GlobalHitsThresholdChecker a little [lucene]

2024-10-08 Thread via GitHub
jpountz commented on PR #13836: URL: https://github.com/apache/lucene/pull/13836#issuecomment-2401422486 I pushed an annotation as this change is likely the source of the speedup we observed on nightly benchmarks on October 3rd: https://benchmarks.mikemccandless.com/TermMonthSort.html.

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
original-brownbear merged PR #13862: URL: https://github.com/apache/lucene/pull/13862 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...

Re: [PR] Speedup OrderedIntervalsSource [lucene]

2024-10-08 Thread via GitHub
original-brownbear commented on PR #13871: URL: https://github.com/apache/lucene/pull/13871#issuecomment-2400388528 > Do you know by any chance? I think it's the list lookups probably from looking at JFR results, but it's impossible to tell without more effort into measuring. I'll

Re: [PR] Speedup OrderedIntervalsSource [lucene]

2024-10-08 Thread via GitHub
original-brownbear merged PR #13871: URL: https://github.com/apache/lucene/pull/13871

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
original-brownbear commented on PR #13862: URL: https://github.com/apache/lucene/pull/13862#issuecomment-2400389468 Thanks Adrien!

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2400220705 Ugh! I'm very sorry for breaking backwards compatibility here. I had forgotten this code is also (of course, in hindsight!) used on the read path. I think we must be missing b

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
jpountz commented on code in PR #13862: URL: https://github.com/apache/lucene/pull/13862#discussion_r1792142174 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -1533,7 +1502,6 @@ public int getDocIdUpTo(int level) { if

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
jpountz commented on code in PR #13862: URL: https://github.com/apache/lucene/pull/13862#discussion_r1792143872 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -1502,7 +1471,7 @@ public int advance(int target) throws IOException {
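The diff under review is truncated here, but the hunk touches `advance(int target)`. That method implements the `DocIdSetIterator` contract: move to the first doc ID greater than or equal to `target`, or return `NO_MORE_DOCS` (`Integer.MAX_VALUE`) when the postings are exhausted. The sketch below restates that contract over an in-memory sorted array. `SortedDocIdIterator` is a hypothetical class. The real `Lucene912PostingsReader` works on encoded blocks and skip data, not a binary search over an `int[]`.

```java
import java.util.Arrays;

/** Hypothetical in-memory stand-in for a postings iterator; not a Lucene class. */
final class SortedDocIdIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses

  private final int[] docs; // strictly increasing doc IDs
  private int index = -1;   // position of the current doc; -1 means "not started"
  private int doc = -1;

  SortedDocIdIterator(int[] docs) {
    this.docs = docs;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    index++;
    doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
    return doc;
  }

  /** Moves to the first doc >= target; like Lucene, assumes target > docID(). */
  int advance(int target) {
    int from = index + 1; // only search past the current position
    if (from >= docs.length) {
      index = docs.length;
      doc = NO_MORE_DOCS;
      return doc;
    }
    int pos = Arrays.binarySearch(docs, from, docs.length, target);
    if (pos < 0) {
      pos = -pos - 1; // insertion point: first element greater than target
    }
    index = pos;
    doc = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    return doc;
  }
}
```

Because `advance` only searches forward from the current position, repeated calls stay cheap as the iterator moves through the list. Block-based postings formats exploit this same forward-only property.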

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
jpountz commented on code in PR #13862: URL: https://github.com/apache/lucene/pull/13862#discussion_r1792194753 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -1529,36 +1498,27 @@ public int getDocIdUpTo(int level) { l

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
original-brownbear commented on code in PR #13862: URL: https://github.com/apache/lucene/pull/13862#discussion_r1792194308 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -1533,7 +1502,6 @@ public int getDocIdUpTo(int level) {

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
jpountz commented on code in PR #13862: URL: https://github.com/apache/lucene/pull/13862#discussion_r1792211239 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -1502,7 +1471,7 @@ public int advance(int target) throws IOException {

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2400800507 > The PR also changed the default value of the compress constructor parameter of `Lucene99HnswScalarQuantizedVectorsFormat` from [true to false](https://github.com/apache/lucene/
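Why a `compress` default matters on the read path can be shown with a small sketch. It assumes, based on the format's javadoc rather than anything in this thread, that `compress` controls whether 4-bit quantized values are packed two per byte. Under that assumption, a packed vector occupies `(dims + 1) / 2` bytes instead of `dims`, so the reader has to use the same setting the writer used. `Int4Packing` is hypothetical, and its adjacent-pair nibble layout is illustrative only, not Lucene's on-disk layout. This sketch does not claim to identify the root cause of #13867.

```java
/** Hypothetical helpers; the nibble layout is illustrative, not Lucene's on-disk format. */
final class Int4Packing {
  /** Packs values in [0, 15] two per byte: even index in the high nibble, odd index in the low. */
  static byte[] pack(byte[] values) {
    byte[] packed = new byte[(values.length + 1) / 2];
    for (int i = 0; i < values.length; i++) {
      int v = values[i] & 0x0F;
      if ((i & 1) == 0) {
        packed[i >> 1] |= (byte) (v << 4);
      } else {
        packed[i >> 1] |= (byte) v;
      }
    }
    return packed;
  }

  /** Reverses pack(); the caller must supply dims, since an odd count leaves a padding nibble. */
  static byte[] unpack(byte[] packed, int dims) {
    byte[] values = new byte[dims];
    for (int i = 0; i < dims; i++) {
      int b = packed[i >> 1] & 0xFF;
      values[i] = (byte) ((i & 1) == 0 ? b >>> 4 : b & 0x0F);
    }
    return values;
  }
}
```

A reader that assumed the unpacked layout would read `dims` bytes per vector from data written with `(dims + 1) / 2` bytes per vector. It would then land at the wrong offsets for every vector after the first.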

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2400880851 OK I checked out git tag `releases/lucene/9.11.1` and made this small diff: ``` --- a/lucene/backward-codecs/src/test/org/apache/lucene/backward_index/TestInt8HnswBackwa

Re: [PR] Remove vector values copy() methods, moving IndexInput.clone() and temp storage into lower-level interfaces [lucene]

2024-10-08 Thread via GitHub
msokolov commented on PR #13872: URL: https://github.com/apache/lucene/pull/13872#issuecomment-2400812909 Thanks for the insightful feedback - yeah I had been intending to do perf testing, and then got distracted by fascinating talks and kind of forgot about these concerns! Going through th

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2400826181 > I am trying to understand why our bwc indices test failed to catch this. Hmm ... when I unzip `int8_hnsw.9.11.zip` and run `CheckIndex` from 9.12.x on it, it seems to be

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2400885939 I will manually regenerate all of the `int8_hnsw.9*.zip` bwc indices (9.10.0, 9.11.0, 9.11.1, and then for 10.x and 10.0.x branches also 9.12.0) ... though this is error-prone (P

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-240090 I am struggling to create the 9.10.0 `int8_hnsw.9.10.0.zip` -- we did not yet have `TestInt8HnswBackwardsCompatibility.java` in 9.10.0 ... @benwtrent do you remember how you gene

Re: [PR] Make DirectMonotonicReader.Meta more compact [lucene]

2024-10-08 Thread via GitHub
original-brownbear commented on PR #13864: URL: https://github.com/apache/lucene/pull/13864#issuecomment-2400375394 That's alright @jpountz, these things are still eating an absurd amount of heap for nothing. How about leaving the format as is and storing the error vs. my calculation in a

Re: [PR] Misc cleanups postings codec [lucene]

2024-10-08 Thread via GitHub
original-brownbear commented on code in PR #13862: URL: https://github.com/apache/lucene/pull/13862#discussion_r1792215104 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/Lucene912PostingsReader.java: ## @@ -1502,7 +1471,7 @@ public int advance(int target) throws IOEx

Re: [PR] Reduce allocations in ByteBuffersDataOutput.writeString [lucene]

2024-10-08 Thread via GitHub
original-brownbear commented on PR #13863: URL: https://github.com/apache/lucene/pull/13863#issuecomment-2400392463 Sounds good @jpountz, reverted the `DataOutput` stuff for now :)
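Lucene's `DataOutput.writeString` writes the UTF-8 byte length as a vInt, followed by the UTF-8 bytes. A straightforward implementation allocates a temporary byte array for the encoded string. One allocation-free alternative takes two passes: first count the UTF-8 bytes, then encode the chars directly into the output buffer. The sketch below shows that idea against a plain `java.nio.ByteBuffer`. It is not the change in #13863, whose diff is not shown here (and whose `DataOutput` part was reverted). `Utf8Writer` and its methods are hypothetical. Substituting U+FFFD for unpaired surrogates is this sketch's own choice.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a length-prefixed UTF-8 write with no temporary byte[]. */
final class Utf8Writer {
  /** Pass 1: counts UTF-8 bytes without allocating. */
  static int utf8Length(String s) {
    int len = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        len += 1;
      } else if (c < 0x800) {
        len += 2;
      } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        len += 4; // a surrogate pair encodes one supplementary code point
        i++;
      } else {
        len += 3; // other BMP chars, and unpaired surrogates written as U+FFFD
      }
    }
    return len;
  }

  /** Writes a variable-length int: 7 bits per byte, low-order group first, high bit = "more". */
  static void writeVInt(ByteBuffer out, int v) {
    while ((v & ~0x7F) != 0) {
      out.put((byte) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.put((byte) v);
  }

  /** Pass 2: writes the length prefix, then encodes each code point straight into the buffer. */
  static void writeString(ByteBuffer out, String s) {
    writeVInt(out, utf8Length(s));
    for (int i = 0; i < s.length(); i++) {
      int cp = s.charAt(i);
      if (Character.isHighSurrogate((char) cp) && i + 1 < s.length()
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        cp = Character.toCodePoint((char) cp, s.charAt(++i));
      } else if (Character.isSurrogate((char) cp)) {
        cp = 0xFFFD; // substitute unpaired surrogates
      }
      if (cp < 0x80) {
        out.put((byte) cp);
      } else if (cp < 0x800) {
        out.put((byte) (0xC0 | (cp >> 6)));
        out.put((byte) (0x80 | (cp & 0x3F)));
      } else if (cp < 0x10000) {
        out.put((byte) (0xE0 | (cp >> 12)));
        out.put((byte) (0x80 | ((cp >> 6) & 0x3F)));
        out.put((byte) (0x80 | (cp & 0x3F)));
      } else {
        out.put((byte) (0xF0 | (cp >> 18)));
        out.put((byte) (0x80 | ((cp >> 12) & 0x3F)));
        out.put((byte) (0x80 | ((cp >> 6) & 0x3F)));
        out.put((byte) (0x80 | (cp & 0x3F)));
      }
    }
  }
}
```

The trade-off is scanning the string twice instead of allocating once. For short strings written in a hot path, avoiding the allocation is often the cheaper side of that trade, though only measurement can confirm it for a given workload.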

Re: [I] Lucene 9.12 fails reading older versions of Lucene99HnswScalarQuantizedVectorsFormat [lucene]

2024-10-08 Thread via GitHub
mikemccand commented on issue #13867: URL: https://github.com/apache/lucene/issues/13867#issuecomment-2400966177 > I am struggling to create the 9.10.0 `int8_hnsw.9.10.0.zip` -- we did not yet have `TestInt8HnswBackwardsCompatibility.java` in 9.10.0 ... @benwtrent do you remember how you ge

Re: [PR] Speedup OrderedIntervalsSource [lucene]

2024-10-08 Thread via GitHub
jpountz commented on code in PR #13871: URL: https://github.com/apache/lucene/pull/13871#discussion_r1792094616 ## lucene/queries/src/java/org/apache/lucene/queries/intervals/OrderedIntervalsSource.java: ## @@ -124,38 +124,54 @@ public int nextInterval() throws IOException {