Re: [PR] Compute facets while collecting [lucene]

2024-08-10 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712610080 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/misc/LongValueFacetCutter.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF

Re: [PR] Compute facets while collecting [lucene]

2024-08-10 Thread via GitHub
epotyom commented on code in PR #13568: URL: https://github.com/apache/lucene/pull/13568#discussion_r1712617910 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/cutters/ranges/LongRangeFacetCutter.java: ## @@ -0,0 +1,413 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] Optimize decoding blocks of postings using the vector API. [lucene]

2024-08-10 Thread via GitHub
uschindler commented on code in PR #13636: URL: https://github.com/apache/lucene/pull/13636#discussion_r1712624281 ## lucene/core/src/java21/org/apache/lucene/internal/vectorization/MemorySegmentPostingDecodingUtil.java: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Softwa

Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]

2024-08-10 Thread via GitHub
msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2282182975 Thinking about the implementation a bit I realized that when we reorder the vector storage for the benefit of HNSW we will still need a way to iterate over vector values in docid o

Re: [I] Lucene99FlatVectorsReader.getFloatVectorValues(): NPE: Cannot read field "vectorEncoding" because "fieldEntry" is null [lucene]

2024-08-10 Thread via GitHub
msokolov commented on issue #13626: URL: https://github.com/apache/lucene/issues/13626#issuecomment-2282191793 It makes sense to me to add null checks. I don't think PerFieldCodec should be required? The usage seems legit to me - basically it defines a set of params to use with all knn fiel

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-08-10 Thread via GitHub
msokolov commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2282193063 > It is possible to inappropriately use the knn formats and attempt to merge segments with mismatched field names. I don't understand this - what's inappropriate about it? I guess

Re: [I] testMergeStability failing for Knn formats [lucene]

2024-08-10 Thread via GitHub
msokolov commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2282193339 hmm thanks I'll take a look soon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] testMergeStability failing for Knn formats [lucene]

2024-08-10 Thread via GitHub
msokolov commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2282198464 I didn't know about this constraint until now. Basically what happens is during merge we check for disconnected components and attempt to *add* connections to connect them. So it m

[PR] Two fixes for recently-added HnswGraphBuilder.connectComponents: [lucene]

2024-08-10 Thread via GitHub
msokolov opened a new pull request, #13642: URL: https://github.com/apache/lucene/pull/13642 1. Properly set the frozen flag to avoid duplicate work 2. Don't try to join a node to itself
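As a rough illustration of the two fixes listed in this pull request, here is a minimal, self-contained sketch on a toy undirected graph. It is not Lucene's HnswGraphBuilder code: the class `ComponentConnector` and its methods `addEdge`, `connectComponents`, and `link` are hypothetical names invented for this example, and the real implementation chooses which nodes to join very differently. The sketch only shows the general shape of the idea: a `frozen` flag that makes a second call a no-op, and a guard that refuses to link a node to itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy graph for illustration only; not Lucene's HnswGraphBuilder. */
final class ComponentConnector {
  private final List<List<Integer>> neighbors = new ArrayList<>();
  private boolean frozen = false; // set once components have been connected

  ComponentConnector(int nodeCount) {
    for (int i = 0; i < nodeCount; i++) {
      neighbors.add(new ArrayList<>());
    }
  }

  /** Adds an undirected edge; self-loops and duplicate edges are ignored. */
  void addEdge(int a, int b) {
    if (frozen) {
      throw new IllegalStateException("graph is frozen");
    }
    link(a, b);
  }

  /** Joins every component to the first one; returns the number of edges added. */
  int connectComponents() {
    if (frozen) {
      return 0; // fix 1: work was already done, do not repeat it
    }
    int n = neighbors.size();
    int[] component = new int[n];
    Arrays.fill(component, -1);
    List<Integer> roots = new ArrayList<>();
    for (int start = 0; start < n; start++) {
      if (component[start] != -1) {
        continue; // already reached from an earlier root
      }
      roots.add(start);
      component[start] = start;
      Deque<Integer> stack = new ArrayDeque<>();
      stack.push(start);
      while (!stack.isEmpty()) { // depth-first walk of one component
        int node = stack.pop();
        for (int nb : neighbors.get(node)) {
          if (component[nb] == -1) {
            component[nb] = start;
            stack.push(nb);
          }
        }
      }
    }
    int added = 0;
    for (int i = 1; i < roots.size(); i++) {
      if (link(roots.get(0), roots.get(i))) {
        added++;
      }
    }
    frozen = true; // fix 1: remember that the graph is complete
    return added;
  }

  private boolean link(int a, int b) {
    if (a == b) {
      return false; // fix 2: never join a node to itself
    }
    if (neighbors.get(a).contains(b)) {
      return false;
    }
    neighbors.get(a).add(b);
    neighbors.get(b).add(a);
    return true;
  }
}
```

With five nodes and edges 0-1 and 2-3, the first call to `connectComponents()` adds two edges (joining the components rooted at 2 and 4 to the one rooted at 0), and a second call adds none because the graph is already frozen.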

Re: [PR] Two fixes for recently-added HnswGraphBuilder.connectComponents: [lucene]

2024-08-10 Thread via GitHub
msokolov commented on PR #13642: URL: https://github.com/apache/lucene/pull/13642#issuecomment-2282219044 Addresses #13640. With this fix, the failing test seed from that issue succeeded. I re-ran the test with tests.iters=1000 and didn't get any failures.

Re: [I] testMergeStability failing for Knn formats [lucene]

2024-08-10 Thread via GitHub
msokolov commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2282219274 OK, I guess we would need to actually build a graph when merging a single segment in case there are deletions. In any case it would be nice if the graph reconnection were stable. T

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-08-10 Thread via GitHub
benwtrent commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2282231690 @msokolov maybe? Looking at the code, this has never worked this way. Field entry existence was just assumed to be handled by the perfield codec. I tried digging into the

Re: [PR] Two fixes for recently-added HnswGraphBuilder.connectComponents: [lucene]

2024-08-10 Thread via GitHub
msokolov merged PR #13642: URL: https://github.com/apache/lucene/pull/13642