Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
rmuir commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1706278927 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
rmuir commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1706272955 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
rmuir commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1706268616 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
rmuir commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1706267079 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Fix race condition on flush for DWPT seqNo generation [lucene]

2024-08-06 Thread via GitHub
jpountz commented on code in PR #13627: URL: https://github.com/apache/lucene/pull/13627#discussion_r1706129153 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java: ## @@ -430,10 +430,16 @@ long updateDocuments( } flushingDWPT = flushControl.doAfte

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
jpountz commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2272150091 For what it's worth, this PR is quite different from https://github.com/apache/lucene/pull/12412 in that it does not rewrite `ForUtil.java` completely, only the bits where we read some l

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
jpountz commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2272145780 Thanks @uschindler, it was helpful. I refactored the PR a bit based on your recommendation. It's now ready for review. -- This is an automated message from the Apache Git Service. To r

Re: [PR] Delegating the matches in PointRangeQuery weight to relate method [lucene]

2024-08-06 Thread via GitHub
gsmiller commented on PR #13599: URL: https://github.com/apache/lucene/pull/13599#issuecomment-2272107532 In general, I really appreciate that you're looking for opportunities to clean up the codebase and find ways to avoid duplicated logic. Thanks @jainankitk ! At the same time, I don't per

Re: [PR] Fix race condition on flush for DWPT seqNo generation [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on code in PR #13627: URL: https://github.com/apache/lucene/pull/13627#discussion_r1705881356 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -138,6 +138,15 @@ private synchronized boolean contains(DocumentsWriterPerT

Re: [PR] Fix race condition on flush for DWPT seqNo generation [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on code in PR #13627: URL: https://github.com/apache/lucene/pull/13627#discussion_r1705880530 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterDeleteQueue.java: ## @@ -636,7 +636,7 @@ long getMaxSeqNo() { } /** Returns true if it was adv

Re: [PR] Fix race condition on flush for DWPT seqNo generation [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on code in PR #13627: URL: https://github.com/apache/lucene/pull/13627#discussion_r1705880002 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java: ## @@ -430,10 +430,16 @@ long updateDocuments( } flushingDWPT = flushControl.doAf

Re: [PR] Delegating the matches in PointRangeQuery weight to relate method [lucene]

2024-08-06 Thread via GitHub
jainankitk closed pull request #13599: Delegating the matches in PointRangeQuery weight to relate method URL: https://github.com/apache/lucene/pull/13599

Re: [PR] Delegating the matches in PointRangeQuery weight to relate method [lucene]

2024-08-06 Thread via GitHub
jainankitk commented on code in PR #13599: URL: https://github.com/apache/lucene/pull/13599#discussion_r1705858403 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -127,22 +127,8 @@ public final Weight createWeight(IndexSearcher searcher, ScoreMode s

Re: [PR] Slightly speed up decoding blocks of postings/freqs/positions. [lucene]

2024-08-06 Thread via GitHub
gsmiller commented on PR #13631: URL: https://github.com/apache/lucene/pull/13631#issuecomment-2271718319 > which needs slow scalar code to properly decode the last values in a block Got it, thanks. I assumed it had something to do with this, but my confusion came from the fact that t

Re: [PR] Fix race condition on flush for DWPT seqNo generation [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on code in PR #13627: URL: https://github.com/apache/lucene/pull/13627#discussion_r1705834154 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -140,6 +140,11 @@ void marksAsFreeAndUnlock(DocumentsWriterPerThread state)

Re: [PR] gh-12627: HnswGraphBuilder connects disconnected HNSW graph components [lucene]

2024-08-06 Thread via GitHub
msokolov commented on PR #13566: URL: https://github.com/apache/lucene/pull/13566#issuecomment-2271669367 I plan to revisit this with a modified approach to address some gaps here: 1. Instead of computing the components rooted at node 0 and others, or trying to compute strongly-conne

Re: [PR] WIP do not merge [lucene]

2024-08-06 Thread via GitHub
msokolov commented on PR #13577: URL: https://github.com/apache/lucene/pull/13577#issuecomment-2271664069 This strongly-connected test is hard to make efficient and it's actually more than we need, given the way we search the graphs hierarchically. I'll follow up with a different approach

Re: [PR] WIP do not merge [lucene]

2024-08-06 Thread via GitHub
msokolov closed pull request #13577: WIP do not merge URL: https://github.com/apache/lucene/pull/13577

Re: [PR] HnswLock: access locks via hash and only use for concurrent indexing [lucene]

2024-08-06 Thread via GitHub
msokolov commented on PR #13581: URL: https://github.com/apache/lucene/pull/13581#issuecomment-2271660159 OK, @Edarke that sounds like a good idea. If you care to follow up with a change to the hash function that avoids autoboxing and smears, I'd be happy to review it.

Re: [PR] Fix race condition on flush for DWPT seqNo generation [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on code in PR #13627: URL: https://github.com/apache/lucene/pull/13627#discussion_r1705753409 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -140,6 +140,11 @@ void marksAsFreeAndUnlock(DocumentsWriterPerThread state)

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
uschindler commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2271580963 > how to make it properly work out of the box (without enabling the vector module and preview features) The vector module must always be enabled, but that's also the case for `

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
jpountz commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2271546834 Oh, I had forgotten about this other PR! Thanks Uwe, I'll look into it.

Re: [PR] Fix race condition on flush for DWPT seqNo generation [lucene]

2024-08-06 Thread via GitHub
jpountz commented on code in PR #13627: URL: https://github.com/apache/lucene/pull/13627#discussion_r1705717977 ## lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThreadPool.java: ## @@ -140,6 +140,11 @@ void marksAsFreeAndUnlock(DocumentsWriterPerThread state) {

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
uschindler commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2271539092 Hi, your setup needs to be a bit different: - Don't invent new providers, use the existing one, so add a new factory method to the generic VectorizationProvider. Here you return

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
jpountz commented on PR #13636: URL: https://github.com/apache/lucene/pull/13636#issuecomment-2271488324 @uschindler @ChrisHegarty I could use a bit of help with this change regarding code organization and how to make it properly work out of the box (without enabling the vector module and p

Re: [PR] Slightly speed up decoding blocks of postings/freqs/positions. [lucene]

2024-08-06 Thread via GitHub
jpountz commented on PR #13631: URL: https://github.com/apache/lucene/pull/13631#issuecomment-2271482402 @gsmiller FYI I have another change that speeds up decoding postings at #13636 that seems to be a bit more impactful, so I'll try to figure that other one out before coming back to this

[PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-06 Thread via GitHub
jpountz opened a new pull request, #13636: URL: https://github.com/apache/lucene/pull/13636 Our postings use a layout that helps take advantage of Java's auto-vectorization to be reasonably fast to decode. But we can make it a bit faster by using explicit vectorization on MemorySegment:

Re: [PR] Slightly speed up decoding blocks of postings/freqs/positions. [lucene]

2024-08-06 Thread via GitHub
jpountz commented on PR #13631: URL: https://github.com/apache/lucene/pull/13631#issuecomment-2271433649 Thanks for looking @gsmiller ! Regarding numbers of bits per value, some numbers make the code on the JVM/CPU, you can look at the difference in the generated code for `decode8`, which m

Re: [PR] Slightly speed up decoding blocks of postings/freqs/positions. [lucene]

2024-08-06 Thread via GitHub
jpountz commented on code in PR #13631: URL: https://github.com/apache/lucene/pull/13631#discussion_r1705595936 ## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java: ## @@ -41,10 +54,275 @@ private static void prefixSumOfOnes(long[] arr, long base) {

Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-08-06 Thread via GitHub
seanmacavaney commented on PR #13635: URL: https://github.com/apache/lucene/pull/13635#issuecomment-2271353524 > I could see it being very nice, or behaving poorly depending on the seed query (which, I guess is expected). We could probably predict whether a seed set is good or bad bas

Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-08-06 Thread via GitHub
seanmacavaney commented on code in PR #13635: URL: https://github.com/apache/lucene/pull/13635#discussion_r1705579214 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -70,6 +72,43 @@ public static void search( search(scorer, knnCollector, gr

Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-08-06 Thread via GitHub
seanmacavaney commented on code in PR #13635: URL: https://github.com/apache/lucene/pull/13635#discussion_r1705575607 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -156,6 +189,44 @@ private TopDocs getLeafResults( } } + private Do

Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on code in PR #13635: URL: https://github.com/apache/lucene/pull/13635#discussion_r1705418758 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -156,6 +189,44 @@ private TopDocs getLeafResults( } } + private DocIdS

Re: [PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on code in PR #13635: URL: https://github.com/apache/lucene/pull/13635#discussion_r1705409803 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ## @@ -70,6 +72,43 @@ public static void search( search(scorer, knnCollector, graph,

Re: [I] Seeding HNSW Search [lucene]

2024-08-06 Thread via GitHub
seanmacavaney commented on issue #13634: URL: https://github.com/apache/lucene/issues/13634#issuecomment-2271094294 Thanks! I just opened a draft PR (#13635). To answer your questions: > The API, this is always tricky to get correct I've struggled a bit with this. The PR has an

[PR] Add AbstractKnnVectorQuery.seed for seeded HNSW [lucene]

2024-08-06 Thread via GitHub
seanmacavaney opened a new pull request, #13635: URL: https://github.com/apache/lucene/pull/13635 ### Description This PR addresses #13634. The main changes are in: - `AbstractKnnVectorQuery`, which adds a `seed` field. It scores this query if provided, and passes these see

Re: [I] Seeding HNSW Search [lucene]

2024-08-06 Thread via GitHub
benwtrent commented on issue #13634: URL: https://github.com/apache/lucene/issues/13634#issuecomment-2271052423 @seanmacavaney I like this idea (I remember reading this paper a while back and getting excited about it). A couple of concerns I have are: - The API, this is alway

[I] Seeding HNSW Search [lucene]

2024-08-06 Thread via GitHub
seanmacavaney opened a new issue, #13634: URL: https://github.com/apache/lucene/issues/13634 ### Description In some vector search cases, users may already know some documents that are likely related to a query. Let's support seeding HNSW's scoring stage with these documents, rather

Re: [PR] CandidateMatcher public matching functions [lucene]

2024-08-06 Thread via GitHub
romseygeek commented on code in PR #13632: URL: https://github.com/apache/lucene/pull/13632#discussion_r1705166098 ## lucene/monitor/src/test/org/apache/lucene/monitor/outsidepackage/TestCandidateMatcherVisibility.java: ## @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [KNN] Add comment and remove duplicate code [lucene]

2024-08-06 Thread via GitHub
dungba88 commented on code in PR #13594: URL: https://github.com/apache/lucene/pull/13594#discussion_r1705038897 ## lucene/core/src/java/org/apache/lucene/search/KnnQueryUtils.java: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +