Re: [PR] Move postings back to int[] to take advantage of having more lanes per vector. [lucene]

2024-10-31 Thread via GitHub
jpountz merged PR #13968: URL: https://github.com/apache/lucene/pull/13968 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Rename NodeHash to FSTSuffixNodeCache [lucene]

2024-10-31 Thread via GitHub
dungba88 commented on PR #13259: URL: https://github.com/apache/lucene/pull/13259#issuecomment-2451381448 Hi Lucene community, would someone kindly take a look at this PR? This is only minor renaming and Javadoc improvement. -- This is an automated message from the Apache Git Service. To

Re: [I] Move vector search from IndexInput to RandomAccessInput [lucene]

2024-10-31 Thread via GitHub
dungba88 commented on issue #13938: URL: https://github.com/apache/lucene/issues/13938#issuecomment-2451355944 > I think this will be helpful since currently we cannot share these readers across threads -- they retain the state information about the current position. Not sure how much benef

Re: [PR] Remove TODO in FSTCompiler#freezeTail. [lucene]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #13923: URL: https://github.com/apache/lucene/pull/13923#issuecomment-2451063955 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Use Arrays.mismatch in FSTCompiler#add. [lucene]

2024-10-31 Thread via GitHub
github-actions[bot] commented on PR #13924: URL: https://github.com/apache/lucene/pull/13924#issuecomment-2451063927 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
gsmiller commented on PR #13969: URL: https://github.com/apache/lucene/pull/13969#issuecomment-2450411129 > I'm also considering renaming length to to since length usually is a number of entries after from while it's an absolute end offset here +1. I noticed this as well when writing

Re: [PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13969: URL: https://github.com/apache/lucene/pull/13969#issuecomment-2450030846 I wrote the javadocs this way on purpose, so that it would still work to create an IntVector/LongVector that starts before `from` and count the number of values that are less than the ta

Re: [PR] Move postings back to int[] to take advantage of having more lanes per vector. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on code in PR #13968: URL: https://github.com/apache/lucene/pull/13968#discussion_r1824816062 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/PostingDecodingUtil.java: ## @@ -55,4 +55,30 @@ public void splitLongs( c[cIndex + i] &= cMask;

Re: [PR] Move postings back to int[] to take advantage of having more lanes per vector. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on code in PR #13968: URL: https://github.com/apache/lucene/pull/13968#discussion_r1824819989 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/PostingDecodingUtil.java: ## @@ -55,4 +55,30 @@ public void splitLongs( c[cIndex + i] &= cMask;

Re: [PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13969: URL: https://github.com/apache/lucene/pull/13969#issuecomment-2450344763 This sounds good to me, maybe extract it to a function to avoid increasing the method size too much? Now that you made me look harder at this code, I'm also considering renaming `length`

Re: [PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
gsmiller commented on PR #13969: URL: https://github.com/apache/lucene/pull/13969#issuecomment-2450301205 @jpountz after thinking a little more, I wonder if an `assert` would make sense to guard against unsorted data between index `0` and `from`? Probably quite unlikely, but would also be n

Re: [PR] Move postings back to int[] to take advantage of having more lanes per vector. [lucene]

2024-10-31 Thread via GitHub
gsmiller commented on code in PR #13968: URL: https://github.com/apache/lucene/pull/13968#discussion_r1824790729 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/PostingDecodingUtil.java: ## @@ -55,4 +55,30 @@ public void splitLongs( c[cIndex + i] &= cMask

Re: [PR] Move postings back to int[] to take advantage of having more lanes per vector. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13968: URL: https://github.com/apache/lucene/pull/13968#issuecomment-2450268830 I plan on merging tomorrow, so that we have two data points with longs on nightly benchmarks before seeing how it performs with ints. -- This is an automated message from the Apache Gi

Re: [PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
gsmiller closed pull request #13969: minor javadoc correction on VectorUtilSupport#findNextGEQ URL: https://github.com/apache/lucene/pull/13969 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
gsmiller commented on PR #13969: URL: https://github.com/apache/lucene/pull/13969#issuecomment-2450206811 @jpountz thanks for the clarification! Let's not make the javadoc change then since it sounds like a reasonable reason to keep the requirement that values are sorted beginning at index

[PR] Simplify codec setup in vector-related tests. [lucene]

2024-10-31 Thread via GitHub
jpountz opened a new pull request, #13970: URL: https://github.com/apache/lucene/pull/13970 Many of vector-related tests set up a codec manually by extending the current codec. This makes bumping the current codec a bit painful as all these files need to be touched. This commit migrates to

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-31 Thread via GitHub
mikemccand commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2450133029 > Could you add a CHANGES entry in 9.12 for your bug fix for 9.12.1? Ahh yes sorry I will do that today! -- This is an automated message from the Apache Git Service. To respon

Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-10-31 Thread via GitHub
jpountz merged PR #13763: URL: https://github.com/apache/lucene/pull/13763 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2450044413 If you check out data at https://github.com/apache/lucene/pull/13692#issuecomment-2324658146, `AndHighHigh` and `AndHighMed` tend to advance a bit further than `CountAndHighHigh` and `C

Re: [PR] Account for 0 graph size when initializing HNSW graph [lucene]

2024-10-31 Thread via GitHub
mayya-sharipova commented on PR #13964: URL: https://github.com/apache/lucene/pull/13964#issuecomment-2450001895 @john-wagster Thanks for the review. I tried to write tests, but it needs a lot of setup and mocks, and I thought it does't worth. But I plan to write integration kind of t

Re: [PR] Account for 0 graph size when initializing HNSW graph [lucene]

2024-10-31 Thread via GitHub
mayya-sharipova merged PR #13964: URL: https://github.com/apache/lucene/pull/13964 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lu

Re: [PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
gsmiller commented on PR #13969: URL: https://github.com/apache/lucene/pull/13969#issuecomment-244971 @jpountz was looking through #13958 retroactively to understand the change and I _think_ I spotted a small javadoc error. Can you take a peek? Even though this is super trivial, I wante

[PR] minor javadoc correction on VectorUtilSupport#findNextGEQ [lucene]

2024-10-31 Thread via GitHub
gsmiller opened a new pull request, #13969: URL: https://github.com/apache/lucene/pull/13969 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-

Re: [PR] Speed up advancing within a block, take 2. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13958: URL: https://github.com/apache/lucene/pull/13958#issuecomment-2449813210 Nightly benchmarks just picked up the change with a mix of speedups and slowdowns: https://benchmarks.mikemccandless.com/2024.10.30.18.12.23.html. Here are the main ones I'm seeing:

Re: [I] Move vector search from IndexInput to RandomAccessInput [lucene]

2024-10-31 Thread via GitHub
msokolov commented on issue #13938: URL: https://github.com/apache/lucene/issues/13938#issuecomment-2449719834 I think this will be helpful since currently we cannot share these readers across threads -- they retain the state information about the current position. Not sure how much benefi

Re: [PR] Move postings back to int[] to take advantage of having more lanes per vector. [lucene]

2024-10-31 Thread via GitHub
jpountz commented on PR #13968: URL: https://github.com/apache/lucene/pull/13968#issuecomment-2449457800 Here is a `luceneutil` run against `wikibigall`: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

[PR] Move postings back to int[]. [lucene]

2024-10-31 Thread via GitHub
jpountz opened a new pull request, #13968: URL: https://github.com/apache/lucene/pull/13968 In Lucene 8.4, we updated postings to work on long[] arrays internally. This allowed us to workaround the lack of explicit vectorization (auto-vectorization doesn't detect all the scenarios that we w

Re: [I] Move vector search from IndexInput to RandomAccessInput [lucene]

2024-10-31 Thread via GitHub
dungba88 commented on issue #13938: URL: https://github.com/apache/lucene/issues/13938#issuecomment-2449356939 Hi, I'm learning Lucene KNN and this seems to be a workable PR for beginner. Just curious about the motivation behind this change. Is it only for cleaner code, or are we also suppo