Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-15 Thread via GitHub
epotyom commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2290926100 Interesting... Thanks for isolating the change that causes the regression Greg! > impacted tasks are possibly taking the `leafSlices.length == 0` condition branch and avoiding the

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-15 Thread via GitHub
jpountz commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2291136659 A few queries got a small-ish speedup, e.g. - [AndMedOrHighHigh](https://home.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html) +2.5% - [AndHighHigh](https://home.apache

[PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-15 Thread via GitHub
jpountz opened a new pull request, #13658: URL: https://github.com/apache/lucene/pull/13658 This updates file formats to compute prefix sums by summing up 8 deltas per long at the same time if the number of bits per value is 4 or less, and 4 deltas per long at the same time if the number of
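
As a rough illustration of the "8 deltas per long" idea described above, here is a minimal sketch. It is not the code from the PR. The class name, the lane layout, and the 128-value block size are assumptions made for the example. With deltas of at most 4 bits (values up to 15) and 16 values per lane, each lane sums to at most 240, so 8-bit lanes cannot overflow into their neighbours and one 64-bit addition advances 8 running sums at once.

```java
import java.util.Arrays;

// Hypothetical sketch: lane-wise prefix sums over deltas packed 8 per long.
public class PackedPrefixSumSketch {
  static final int LANES = 8;        // 8 lanes of 8 bits in one 64-bit long
  static final int LANE_BITS = 8;
  static final long LANE_MASK = 0xFFL;

  // Assumes every delta fits in 4 bits and each lane's total stays below 256.
  static long[] prefixSum(int[] deltas) {
    if (deltas.length == 0 || deltas.length % LANES != 0) {
      throw new IllegalArgumentException("length must be a positive multiple of " + LANES);
    }
    int numLongs = deltas.length / LANES;
    long[] packed = new long[numLongs];
    // Lane k of packed[j] holds deltas[k * numLongs + j].
    for (int lane = 0; lane < LANES; lane++) {
      for (int j = 0; j < numLongs; j++) {
        packed[j] |= ((long) deltas[lane * numLongs + j]) << (LANE_BITS * lane);
      }
    }
    // One long addition per step updates the running sum of all 8 lanes.
    for (int j = 1; j < numLongs; j++) {
      packed[j] += packed[j - 1];
    }
    // Unpack each lane and add the total of all preceding lanes.
    long[] out = new long[deltas.length];
    long base = 0;
    for (int lane = 0; lane < LANES; lane++) {
      for (int j = 0; j < numLongs; j++) {
        out[lane * numLongs + j] = base + ((packed[j] >>> (LANE_BITS * lane)) & LANE_MASK);
      }
      base = out[lane * numLongs + numLongs - 1];
    }
    return out;
  }

  // Straightforward reference implementation used for comparison.
  static long[] scalarPrefixSum(int[] deltas) {
    long[] out = new long[deltas.length];
    long sum = 0;
    for (int i = 0; i < deltas.length; i++) {
      sum += deltas[i];
      out[i] = sum;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] deltas = new int[128];
    for (int i = 0; i < deltas.length; i++) {
      deltas[i] = i % 16; // every delta fits in 4 bits
    }
    System.out.println(Arrays.equals(prefixSum(deltas), scalarPrefixSum(deltas)));
  }
}
```

The same reasoning suggests why wider deltas would need wider lanes, and therefore fewer deltas per long, though the exact thresholds used in the PR are not visible in the truncated description.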

Re: [PR] Slightly speed up decoding blocks of postings/freqs/positions. [lucene]

2024-08-15 Thread via GitHub
jpountz commented on PR #13631: URL: https://github.com/apache/lucene/pull/13631#issuecomment-2291233215 @gsmiller Rebasing this PR proved a bit challenging after #13636 got merged, so I ended up creating a new one that only speeds up prefix sums without disabling slower numbers of bits per

[I] TestHnswFloatVectorGraph.testRandomReadWriteAndMerge fails with java.lang.IndexOutOfBoundsException [lucene]

2024-08-15 Thread via GitHub
ChrisHegarty opened a new issue, #13659: URL: https://github.com/apache/lucene/issues/13659 ### Description ``` TestHnswFloatVectorGraph > testRandomReadWriteAndMerge FAILED java.lang.IndexOutOfBoundsException: Index 2147483647 out of bounds for length 18 at __ra

Re: [I] TestHnswFloatVectorGraph.testReadWrite fails on branch 9x [lucene]

2024-08-15 Thread via GitHub
msokolov closed issue #13653: TestHnswFloatVectorGraph.testReadWrite fails on branch 9x URL: https://github.com/apache/lucene/issues/13653 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [I] TestHnswFloatVectorGraph.testReadWrite fails on branch 9x [lucene]

2024-08-15 Thread via GitHub
msokolov commented on issue #13653: URL: https://github.com/apache/lucene/issues/13653#issuecomment-2291316793 OK I'll mark this fixed now since we seem to have gone ~24h without any more failures from automated builds that I can see

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-08-15 Thread via GitHub
msokolov commented on PR #13469: URL: https://github.com/apache/lucene/pull/13469#issuecomment-2291338923 Hi @navneet1v, I think you may be right although TBH I find the way Lucene handles formats with SPI to be somewhat confusing. We're using this in our own service with a custom Codec th

Re: [PR] nocommit: demonstrate how a minor change in IndexSearcher can have an inexplicable performance impact [lucene]

2024-08-15 Thread via GitHub
gsmiller commented on PR #13657: URL: https://github.com/apache/lucene/pull/13657#issuecomment-2291353795 I got an even tighter isolation on the regressing change but still can't understand why it could be happening. It seems to have to do with whether-or-not the collectors list gets create

Re: [PR] Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-15 Thread via GitHub
gsmiller commented on PR #13656: URL: https://github.com/apache/lucene/pull/13656#issuecomment-2291395658 Investigation continues on how #13568 could possibly have caused a regression to various nightly benchmark tasks, but I've confirmed that this patch fixed the issues (see [Term](https:

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-15 Thread via GitHub
msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2291406018 OK I'm beginning to see that it would be easier to start with a tool that rewrites the index as you did with the `BPIndexReorderer` -- integrating this thing as a new index format

Re: [PR] Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-15 Thread via GitHub
mikemccand commented on PR #13656: URL: https://github.com/apache/lucene/pull/13656#issuecomment-2291583847 Thanks @gsmiller -- looks like this did indeed fix the nightly benchy e.g. [CombinedTerm](https://home.apache.org/~mikemccand/lucenebench/CombinedTerm.html), [TermQuery](https://home.

Re: [PR] Revert changes to IndexSearcher brought in by GH#13568 [lucene]

2024-08-15 Thread via GitHub
gsmiller commented on PR #13656: URL: https://github.com/apache/lucene/pull/13656#issuecomment-2291752889 Totally @mikemccand . Have you followed what @epotyom and I found so far in #13657? It's wild. Just moving the initialization of the array list before or after the call to createWeight

Re: [I] TestHnswFloatVectorGraph.testRandomReadWriteAndMerge fails with java.lang.IndexOutOfBoundsException [lucene]

2024-08-15 Thread via GitHub
benwtrent commented on issue #13659: URL: https://github.com/apache/lucene/issues/13659#issuecomment-2291778869 Yeah, git bisect indicates `217828736c4 - gh-12627: HnswGraphBuilder connects disconnected HNSW graph components (#13566) (7 days ago)`. I am not 100% sure what's up.

Re: [I] TestHnswFloatVectorGraph.testRandomReadWriteAndMerge fails with java.lang.IndexOutOfBoundsException [lucene]

2024-08-15 Thread via GitHub
benwtrent commented on issue #13659: URL: https://github.com/apache/lucene/issues/13659#issuecomment-2291853791 From what I can tell: - `components` will return a component where the entry_point is `NO_MORE_DOCS` - This occurs because the `notFullyConnected.nextSetBit(0);` retur

Re: [PR] Expose FlatVectorsFormat [lucene]

2024-08-15 Thread via GitHub
navneet1v commented on PR #13469: URL: https://github.com/apache/lucene/pull/13469#issuecomment-2291858889 Hi @msokolov thanks for sharing your thoughts. > I find the way Lucene handles formats with SPI to be somewhat confusing. +1. but I always feel my experience with SPI is lo

Re: [I] TestHnswFloatVectorGraph.testRandomReadWriteAndMerge fails with java.lang.IndexOutOfBoundsException [lucene]

2024-08-15 Thread via GitHub
benwtrent commented on issue #13659: URL: https://github.com/apache/lucene/issues/13659#issuecomment-2291899075 @msokolov ^ what say you? It seems that a `NO_MORE_DOCS` indicates that the `notFullyConnected` just didn't ever get selected. Then later, when iterating the components ``

[PR] Only attempt to connect components when entry point is valid [lucene]

2024-08-15 Thread via GitHub
benwtrent opened a new pull request, #13660: URL: https://github.com/apache/lucene/pull/13660 We should only connect a component if its entry point is valid. closes: https://github.com/apache/lucene/issues/13659
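
The index in the reported failure, 2147483647, is `Integer.MAX_VALUE`, which is also the value of Lucene's `DocIdSetIterator.NO_MORE_DOCS` sentinel. The guard described in this PR can be sketched roughly as follows. This is a standalone illustration, not the patch itself: the `Component` class, the `tryConnect` method, and the `neighborCounts` array are hypothetical stand-ins.

```java
// Hypothetical sketch of skipping components whose entry point is the sentinel.
public class EntryPointGuardSketch {
  // Same numeric value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Stand-in for a graph component: an entry-point node id and a size.
  static final class Component {
    final int start;
    final int size;

    Component(int start, int size) {
      this.start = start;
      this.size = size;
    }
  }

  // Returns false instead of indexing with the sentinel entry point.
  static boolean tryConnect(int[] neighborCounts, Component c) {
    if (c.start == NO_MORE_DOCS) {
      return false; // no usable entry point, so there is nothing to connect
    }
    neighborCounts[c.start]++; // placeholder for the real connection work
    return true;
  }

  public static void main(String[] args) {
    int[] neighborCounts = new int[18];
    System.out.println(tryConnect(neighborCounts, new Component(3, 5)));
    System.out.println(tryConnect(neighborCounts, new Component(NO_MORE_DOCS, 0)));
  }
}
```

Without the check, the second call would index an 18-element array at 2147483647 and throw an out-of-bounds exception, which has the same shape as the reported failure.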

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-15 Thread via GitHub
gsmiller commented on code in PR #13658: URL: https://github.com/apache/lucene/pull/13658#discussion_r1718827005 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java: ## @@ -62,22 +256,374 @@ void encodeDeltas(long[] longs, DataOutput out) throws IOExcep

Re: [I] TestHnswFloatVectorGraph.testRandomReadWriteAndMerge fails with java.lang.IndexOutOfBoundsException [lucene]

2024-08-15 Thread via GitHub
msokolov commented on issue #13659: URL: https://github.com/apache/lucene/issues/13659#issuecomment-2292058048 Oh, the gift that keeps on giving! Your solution seems reasonable. I mean it would be nice if we didn't generate these degenerate `Component`s in the first place? But this will work

Re: [PR] Only attempt to connect components when entry point is valid [lucene]

2024-08-15 Thread via GitHub
msokolov commented on code in PR #13660: URL: https://github.com/apache/lucene/pull/13660#discussion_r1718901258 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ## @@ -456,6 +456,9 @@ private boolean connectComponents(int level) throws IOException {

[PR] Upgrade spotless to 6.9.1, google java format to 1.23.0. [lucene]

2024-08-15 Thread via GitHub
dweiss opened a new pull request, #13661: URL: https://github.com/apache/lucene/pull/13661 This is a trivial bump of plugin dependencies. One commit updates versions, another is a tidy run on the newer spotless/ google java format (mostly indentation within comments of switch statements).

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-15 Thread via GitHub
jpountz commented on code in PR #13658: URL: https://github.com/apache/lucene/pull/13658#discussion_r1718950733 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultPostingDecodingUtil.java: ## @@ -29,13 +29,14 @@ public DefaultPostingDecodingUtil(IndexInput i

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-15 Thread via GitHub
jpountz commented on code in PR #13658: URL: https://github.com/apache/lucene/pull/13658#discussion_r1718955964 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/PostingIndexInput.java: ## @@ -50,6 +53,6 @@ public void decode(int bitsPerValue, long[] longs) throws IOEx

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-15 Thread via GitHub
jpountz commented on PR #13658: URL: https://github.com/apache/lucene/pull/13658#issuecomment-2292214610 Thanks for taking a look @gsmiller ! I believe that I addressed all your (good) comments.

Re: [PR] Optimize decoding blocks of postings using the vector API. (#13636) [lucene]

2024-08-15 Thread via GitHub
jpountz commented on PR #13652: URL: https://github.com/apache/lucene/pull/13652#issuecomment-2292242821 OK, I did it for src/java20 too.

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-15 Thread via GitHub
gsmiller commented on code in PR #13658: URL: https://github.com/apache/lucene/pull/13658#discussion_r1719059369 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultPostingDecodingUtil.java: ## @@ -29,13 +29,14 @@ public DefaultPostingDecodingUtil(IndexInput

Re: [PR] Speed up prefix sums when decoding doc IDs. [lucene]

2024-08-15 Thread via GitHub
gsmiller commented on code in PR #13658: URL: https://github.com/apache/lucene/pull/13658#discussion_r1719062831 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/PostingIndexInput.java: ## @@ -50,6 +53,6 @@ public void decode(int bitsPerValue, long[] longs) throws IOE

Re: [PR] Reduce memory usage of SkipListWriter [lucene]

2024-08-15 Thread via GitHub
github-actions[bot] commented on PR #13576: URL: https://github.com/apache/lucene/pull/13576#issuecomment-2292498818 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi