[I] Reproducible error in TestLucene90HnswVectorsFormat.testIndexedValueNotAliased [lucene]

2023-11-24 Thread via GitHub
iverase opened a new issue, #12840: URL: https://github.com/apache/lucene/issues/12840 Command to reproduce: ``` ./gradlew test --tests TestLucene90HnswVectorsFormat.testIndexedValueNotAliased -Dtests.seed=611EEBD0148F03C7 ``` error: ``` org.apache.lucene.backward_

Re: [PR] Skip decoding tail freqs when they are not needed. [lucene]

2023-11-24 Thread via GitHub
jpountz commented on PR #12832: URL: https://github.com/apache/lucene/pull/12832#issuecomment-1825371734 This seems to have further helped [`prefix` queries](http://people.apache.org/~mikemccand/lucenebench/Prefix3.html). I'll add an annotation. -- This is an automated message from the A

Re: [I] Move group-varint encoding/decoding logic to DataOutput/DataInput? [lucene]

2023-11-24 Thread via GitHub
jpountz commented on issue #12826: URL: https://github.com/apache/lucene/issues/12826#issuecomment-1825393392 Let's move your branch to a PR?

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-24 Thread via GitHub
gf2121 commented on code in PR #12838: URL: https://github.com/apache/lucene/pull/12838#discussion_r1404145355 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipReader.java: ## @@ -48,7 +48,7 @@ * * Therefore, we'll trim df before passing it to the interf

Re: [I] Move group-varint encoding/decoding logic to DataOutput/DataInput? [lucene]

2023-11-24 Thread via GitHub
easyice commented on issue #12826: URL: https://github.com/apache/lucene/issues/12826#issuecomment-1825397930 Okay!

Re: [I] add dedicated test to assert internals of LZ4 hashtable [LUCENE-9190] [lucene]

2023-11-24 Thread via GitHub
slow-J commented on issue #10230: URL: https://github.com/apache/lucene/issues/10230#issuecomment-1825550388 Already implemented in https://github.com/apache/lucene-solr/pull/1236, this issue can be closed.

Re: [PR] Hide the internal data structure of HeapPointWriter [lucene]

2023-11-24 Thread via GitHub
iverase merged PR #12762: URL: https://github.com/apache/lucene/pull/12762

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-11-24 Thread via GitHub
jpountz commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1404341416 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { } }

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-11-24 Thread via GitHub
jpountz commented on PR #12841: URL: https://github.com/apache/lucene/pull/12841#issuecomment-1825685057 And maybe `BufferedIndexInput` too for folks using `NIOFSDirectory`?

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-24 Thread via GitHub
jpountz commented on code in PR #12838: URL: https://github.com/apache/lucene/pull/12838#discussion_r1404365624 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipReader.java: ## @@ -48,7 +48,7 @@ * * Therefore, we'll trim df before passing it to the inter

Re: [PR] Move group-varint encoding/decoding logic to DataOutput/DataInput [lucene]

2023-11-24 Thread via GitHub
easyice commented on code in PR #12841: URL: https://github.com/apache/lucene/pull/12841#discussion_r1404387618 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -303,6 +304,30 @@ public byte readByte(long pos) throws IOException { } }

[PR] Use group-varint encode the positions [lucene]

2023-11-24 Thread via GitHub
easyice opened a new pull request, #12842: URL: https://github.com/apache/lucene/pull/12842 Thanks for the suggestion from @jpountz, as discussed in https://github.com/apache/lucene/issues/12826. This PR uses group-varint to encode some vint values if `storeOffsets` is true, it's still u
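
For readers unfamiliar with the term, here is a minimal sketch of the general group-varint idea: four integers share one flag byte that records each value's byte length, followed by the values' bytes. The class name `GroupVarIntSketch`, the method names, and the exact bit layout are assumptions made for illustration; this is not the code from the PR and may differ from Lucene's on-disk format.

```java
import java.io.ByteArrayOutputStream;

public final class GroupVarIntSketch {
  private GroupVarIntSketch() {}

  // Number of bytes (1 to 4) needed to hold v when treated as unsigned.
  static int numBytes(int v) {
    return Math.max(1, (32 - Integer.numberOfLeadingZeros(v) + 7) >>> 3);
  }

  // Encodes exactly four ints: one flag byte (2 bits per value holding
  // byteLength - 1, first value in the top bits), then each value's
  // bytes in little-endian order. This layout is an assumption.
  public static byte[] encodeGroup(int a, int b, int c, int d) {
    int[] values = {a, b, c, d};
    int flag = 0;
    for (int v : values) {
      flag = (flag << 2) | (numBytes(v) - 1);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(flag);
    for (int v : values) {
      int n = numBytes(v);
      for (int i = 0; i < n; i++) {
        out.write(v >>> (8 * i)); // write(int) keeps only the low 8 bits
      }
    }
    return out.toByteArray();
  }

  // Decodes one group produced by encodeGroup.
  public static int[] decodeGroup(byte[] in) {
    int flag = in[0] & 0xFF;
    int pos = 1;
    int[] values = new int[4];
    for (int k = 0; k < 4; k++) {
      int n = ((flag >>> (6 - 2 * k)) & 0x3) + 1;
      int v = 0;
      for (int i = 0; i < n; i++) {
        v |= (in[pos++] & 0xFF) << (8 * i);
      }
      values[k] = v;
    }
    return values;
  }

  public static void main(String[] args) {
    byte[] encoded = encodeGroup(1, 300, 70000, Integer.MAX_VALUE);
    int[] decoded = decodeGroup(encoded);
    System.out.println(encoded.length + " bytes, first value " + decoded[0]);
  }
}
```

Compared with a plain vint, the decoder reads all four lengths from one byte instead of testing a continuation bit on every byte, which is the usual motivation for the scheme.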

Re: [I] Use LinkedList instead of manual array re-sizing for better throughput. [LUCENE-9432] [lucene]

2023-11-24 Thread via GitHub
slow-J commented on issue #10472: URL: https://github.com/apache/lucene/issues/10472#issuecomment-1825871537 I took a quick look at this 3 years on. I took @mohammadsadiq's patch and applied it to `IDVersionSegmentTermsEnum` and `OrdsSegmentTermsEnum`. I then changed the LinkedList

Re: [PR] Use group-varint encode the positions [lucene]

2023-11-24 Thread via GitHub
jpountz commented on PR #12842: URL: https://github.com/apache/lucene/pull/12842#issuecomment-1825874597 Thanks for looking. Unfortunately, the case I'm most interested in is when `storeOffsets` is false and there are no payloads, since this is the default. :)

Re: [PR] Faster prefix sum for bitsPerValue up to 9. [lucene]

2023-11-24 Thread via GitHub
jpountz commented on PR #12843: URL: https://github.com/apache/lucene/pull/12843#issuecomment-1825884854 luceneutil doesn't see a noticeable difference (all p-values are high) but the micro-benchmark that is attached to this PR seems to see an improvement: ``` main Benchmark

Re: [I] Grow arrays up to a given limit to avoid overallocation where possible [lucene]

2023-11-24 Thread via GitHub
jpountz commented on issue #12839: URL: https://github.com/apache/lucene/issues/12839#issuecomment-1825928704 If I'm not mistaken, the `NeighborArray` class we use for vector search may have similar needs (it should probably not size its data structure to `maxSize` in the constructor?).

Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-11-24 Thread via GitHub
jpountz closed issue #12675: MultiSimilarity.MultiSimScorer should sum up scores into a double URL: https://github.com/apache/lucene/issues/12675

Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-11-24 Thread via GitHub
jpountz commented on issue #12675: URL: https://github.com/apache/lucene/issues/12675#issuecomment-1825930715 @shubhamvishu Yes indeed!

Re: [PR] Faster prefix sum for bitsPerValue up to 9. [lucene]

2023-11-24 Thread via GitHub
jpountz commented on PR #12843: URL: https://github.com/apache/lucene/pull/12843#issuecomment-1826052610 Actually we can do even better by better tuning the disk layout for the prefix sum. Converting this PR to a draft until this is implemented.

Re: [I] Grow arrays up to a given limit to avoid overallocation where possible [lucene]

2023-11-24 Thread via GitHub
stefanvodita commented on issue #12839: URL: https://github.com/apache/lucene/issues/12839#issuecomment-1826057193 Thank you for the pointer @jpountz! I'll put together a PR.

Re: [PR] Improve set deletions percentage javadoc [lucene]

2023-11-24 Thread via GitHub
yugushihuang commented on code in PR #12828: URL: https://github.com/apache/lucene/pull/12828#discussion_r1404662302 ## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ## @@ -150,9 +150,10 @@ public double getMaxMergedSegmentMB() { } /** - * Contro

[PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-24 Thread via GitHub
stefanvodita opened a new pull request, #12844: URL: https://github.com/apache/lucene/pull/12844 In cases where we know there is an upper limit to the potential size of an array, we can use `growInRange` to avoid allocating beyond that limit. We address such cases in `DirectoryTaxonom
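
As a rough illustration of the idea in this description, the sketch below grows an `int[]` with some headroom but never past a known upper limit. The class `GrowInRangeSketch`, the signature, and the 1.5x growth factor are assumptions made for this example; they are not taken from the PR, whose actual method may behave differently.

```java
import java.util.Arrays;

public final class GrowInRangeSketch {
  private GrowInRangeSketch() {}

  // Hypothetical helper: returns an array of at least minLength elements,
  // over-allocating for future growth but never beyond maxLength.
  public static int[] growInRange(int[] array, int minLength, int maxLength) {
    if (minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength " + minLength + " exceeds maxLength " + maxLength);
    }
    if (array.length >= minLength) {
      return array; // already large enough, nothing to allocate
    }
    // Assumed growth policy: 1.5x the requested length, computed as a long
    // to avoid int overflow, then clamped to the known upper limit.
    long oversized = (long) minLength + (minLength >> 1);
    int newLength = (int) Math.min(maxLength, oversized);
    return Arrays.copyOf(array, newLength);
  }

  public static void main(String[] args) {
    int[] grown = growInRange(new int[2], 9, 10);
    System.out.println("new length: " + grown.length);
  }
}
```

Without the clamp, a caller that knows it will never need more than `maxLength` slots could still end up with an array sized by the growth factor alone, which is the overallocation the issue describes.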

Re: [I] Grow arrays up to a given limit to avoid overallocation where possible [lucene]

2023-11-24 Thread via GitHub
stefanvodita commented on issue #12839: URL: https://github.com/apache/lucene/issues/12839#issuecomment-1826125298 I added the new method and used it for `DirectoryTaxonomyReader` and `NeighborArray` (#12844). There might be other places where it makes sense to use, but I thought it best to

Re: [PR] Use group-varint encode the positions [lucene]

2023-11-24 Thread via GitHub
easyice commented on PR #12842: URL: https://github.com/apache/lucene/pull/12842#issuecomment-1826180124 Thanks for your suggestion. I'm thinking about that too, and I will continue working on this.