Re: [PR] expand TestSegmentToThreadMapping coverage w.r.t. (excess) documents per slice [lucene]

2024-07-11 Thread via GitHub
github-actions[bot] commented on PR #13508: URL: https://github.com/apache/lucene/pull/13508#issuecomment-2224171133 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Introduce new encoding of BPV 21 for DocIdsWriter used in BKD Tree [lucene]

2024-07-11 Thread via GitHub
github-actions[bot] commented on PR #13521: URL: https://github.com/apache/lucene/pull/13521#issuecomment-2224171037 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [I] Significant drop in recall for 8 bit Scalar Quantizer [lucene]

2024-07-11 Thread via GitHub
MilindShyani commented on issue #13519: URL: https://github.com/apache/lucene/issues/13519#issuecomment-2224005032 Hello! So I just did the calculation and I get the following: 1) I get an innocuous sign discrepancy, i.e., the term `- SIGNED_CORRECTION * alpha` should instead

Re: [PR] gh-12627: HnswGraphBuilder connects disconnected HNSW graph components [lucene]

2024-07-11 Thread via GitHub
msokolov commented on PR #13566: URL: https://github.com/apache/lucene/pull/13566#issuecomment-2223828635 note: addresses https://github.com/apache/lucene/issues/12627 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] gh-12627: HnswGraphBuilder connects disconnected HNSW graph components [lucene]

2024-07-11 Thread via GitHub
msokolov commented on PR #13566: URL: https://github.com/apache/lucene/pull/13566#issuecomment-2223669457 ... oh another TODO is whether or not to worry about graph levels > 0. I kind of think it's not needed since no matter which graph component you traverse on the higher level, you eventu

[PR] gh-12627: HnswGraphBuilder connects disconnected HNSW graph components [lucene]

2024-07-11 Thread via GitHub
msokolov opened a new pull request, #13566: URL: https://github.com/apache/lucene/pull/13566 This looks pretty good to me, as far as it goes, but I wanted to post early so people can poke holes / suggest improvements. Remaining TODOs I'm tracking are: * testing recall (might improve a b
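The PR text above is truncated and does not say how components get connected. As a rough, hypothetical illustration of the underlying problem (nodes that a search starting at the entry point can never reach), the sketch below finds unreachable nodes with a breadth-first search over out-edges and links each stray region back with one extra edge. `ReachabilityRepair`, the `List<List<Integer>>` adjacency representation, and linking directly from the entry node are all simplifying assumptions, not Lucene's `HnswGraphBuilder` API; a real graph would pick a nearby node with spare capacity so the per-node neighbor limit and neighbor diversity are preserved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: make every node reachable from an entry node. */
final class ReachabilityRepair {

  /** Marks every node reachable from start by following out-edges (BFS). */
  static void markReachable(List<List<Integer>> out, int start, boolean[] seen) {
    if (seen[start]) {
      return;
    }
    ArrayDeque<Integer> queue = new ArrayDeque<>();
    seen[start] = true;
    queue.add(start);
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int nbr : out.get(node)) {
        if (!seen[nbr]) {
          seen[nbr] = true;
          queue.add(nbr);
        }
      }
    }
  }

  /**
   * For each node that is still unreachable, adds one edge from the entry node to it
   * and marks everything that edge newly exposes. Returns the number of edges added.
   */
  static int connectAll(List<List<Integer>> out, int entry) {
    boolean[] seen = new boolean[out.size()];
    markReachable(out, entry, seen);
    int added = 0;
    for (int node = 0; node < out.size(); node++) {
      if (!seen[node]) {
        // Simplification: a real graph would link from a nearby, non-full node.
        out.get(entry).add(node);
        markReachable(out, node, seen);
        added++;
      }
    }
    return added;
  }

  public static void main(String[] args) {
    // Edges 0 -> 1 and 2 -> 3: nodes 2 and 3 cannot be reached from entry 0.
    List<List<Integer>> out = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      out.add(new ArrayList<>());
    }
    out.get(0).add(1);
    out.get(2).add(3);
    System.out.println(connectAll(out, 0)); // prints 1
  }
}
```

The greedy scan adds one edge per still-unreachable node it encounters, which is not always the minimum number of edges, but it guarantees that every node ends up reachable from the entry point.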

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-07-11 Thread via GitHub
benwtrent commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2223362650 @jmazanec15 > I think we would need to refine not the top k but the top r*k and then reduce to k 100%, I wasn't clear. Yes, over all segments, we gather some approx
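A minimal sketch of the "refine not the top k but the top r*k and then reduce to k" idea from this thread, assuming the candidates have already been gathered across segments and ranked by their approximate (quantized) score. `Rescorer`, `refine`, and the `exactScore` callback (standing in for a full-precision distance calculation) are hypothetical names, not an existing Lucene API.

```java
import java.util.Arrays;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch: oversample by r, rescore exactly, keep k. */
final class Rescorer {

  /**
   * approxRanked holds doc ids sorted by approximate (quantized) score, best first.
   * Keeps the first r * k, rescores them with exactScore, and returns the best k.
   */
  static int[] refine(int[] approxRanked, int k, int r, IntToDoubleFunction exactScore) {
    int pool = Math.min(approxRanked.length, r * k);
    double[] exact = new double[pool];
    Integer[] order = new Integer[pool];
    for (int i = 0; i < pool; i++) {
      exact[i] = exactScore.applyAsDouble(approxRanked[i]); // one exact computation per candidate
      order[i] = i;
    }
    // Highest exact score first; the sort is stable, so ties keep their approximate order.
    Arrays.sort(order, (a, b) -> Double.compare(exact[b], exact[a]));
    int keep = Math.min(k, pool);
    int[] top = new int[keep];
    for (int i = 0; i < keep; i++) {
      top[i] = approxRanked[order[i]];
    }
    return top;
  }

  public static void main(String[] args) {
    int[] approxRanked = {5, 3, 9, 1, 7, 2};
    // Toy exact scorer: a larger doc id means a better exact score.
    int[] top = refine(approxRanked, 2, 2, doc -> doc * 0.1);
    System.out.println(Arrays.toString(top)); // prints [9, 5]
  }
}
```

In the example, doc 7 would beat doc 5 on exact score but is never considered, because it falls outside the r*k = 4 oversampled pool; raising `r` trades extra exact-score computations for recall.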

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-07-11 Thread via GitHub
jmazanec15 commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2223317735 > I think the API would be tricky, but I am all for this idea Yes, agreed. I'll think on this a little bit. I'll start with a PoC and go from there. > Whatever the desig

Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]

2024-07-11 Thread via GitHub
msokolov merged PR #13561: URL: https://github.com/apache/lucene/pull/13561

Re: [I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-07-11 Thread via GitHub
benwtrent commented on issue #13564: URL: https://github.com/apache/lucene/issues/13564#issuecomment-2223238648 I think the API would be tricky, but I am all for this idea. The ability to "approximately score" and then do second pass to "exact score" is useful for all sorts of levels of qua

Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-11 Thread via GitHub
magibney commented on PR #13555: URL: https://github.com/apache/lucene/pull/13555#issuecomment-2223189418 Thanks for the suggestions! Most of these are now addressed (with a couple of questions: 1. maybe still need chm.computeIfAbsent() loop, 2. which if any Arena methods to throw UOE).
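One way to share a single resource per segment and close it only when its last user releases it is a ref-counted map keyed by segment name. The sketch below is hypothetical: it is not the PR's `GroupedArena`, and a generic `AutoCloseable` stands in for a memory `Arena`. It shows how `ConcurrentHashMap.compute` and `computeIfPresent` can make acquire and release atomic per key, which is one way to avoid an explicit `computeIfAbsent()` retry loop: an acquire either finds the live entry and takes a reference before any removal, or finds no entry and creates a fresh one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: one shared, ref-counted resource per group key (e.g. a segment name). */
final class GroupedResources<R extends AutoCloseable> {

  private static final class Entry<T> {
    final T resource;
    int refCount; // only touched inside compute/computeIfPresent, which run atomically per key

    Entry(T resource) {
      this.resource = resource;
    }
  }

  private final ConcurrentHashMap<String, Entry<R>> groups = new ConcurrentHashMap<>();
  private final Function<String, R> factory;

  GroupedResources(Function<String, R> factory) {
    this.factory = factory;
  }

  /** Returns the key's shared resource, creating it on first use, and takes a reference. */
  R acquire(String key) {
    Entry<R> entry = groups.compute(key, (k, cur) -> {
      Entry<R> e = (cur == null) ? new Entry<>(factory.apply(k)) : cur;
      e.refCount++;
      return e;
    });
    return entry.resource;
  }

  /** Drops a reference; the last release removes the mapping and closes the resource. */
  void release(String key) {
    List<R> toClose = new ArrayList<>(1);
    groups.computeIfPresent(key, (k, cur) -> {
      if (--cur.refCount > 0) {
        return cur;
      }
      toClose.add(cur.resource);
      return null; // returning null removes the mapping atomically
    });
    for (R resource : toClose) {
      try {
        resource.close(); // done outside the map's lock since close() may be slow
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }
  }

  int openGroups() {
    return groups.size();
  }

  public static void main(String[] args) {
    int[] closed = {0};
    GroupedResources<AutoCloseable> pool = new GroupedResources<>(k -> () -> closed[0]++);
    pool.acquire("_0"); // e.g. two files of segment _0 share one resource
    pool.acquire("_0");
    pool.release("_0");
    System.out.println(closed[0]); // prints 0
    pool.release("_0");
    System.out.println(closed[0]); // prints 1
  }
}
```

Only the decrement and the removal need to be atomic; the potentially expensive `close()` runs after the map has already forgotten the entry, so a concurrent `acquire` simply creates a new resource instead of receiving one that is being closed.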

Re: [PR] Add IntervalsSource for range and regexp queries [lucene]

2024-07-11 Thread via GitHub
romseygeek commented on code in PR #13562: URL: https://github.com/apache/lucene/pull/13562#discussion_r1674172561 ## lucene/queries/src/java/org/apache/lucene/queries/intervals/Intervals.java: ## @@ -206,6 +210,91 @@ public static IntervalsSource wildcard(BytesRef wildcard, in

Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-11 Thread via GitHub
magibney commented on code in PR #13555: URL: https://github.com/apache/lucene/pull/13555#discussion_r1674172898 ## lucene/core/src/java21/org/apache/lucene/store/GroupedArena.java: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

[I] Add refinement of quantized vector scores with fp distance calculations [lucene]

2024-07-11 Thread via GitHub
jmazanec15 opened a new issue, #13564: URL: https://github.com/apache/lucene/issues/13564 ### Description With quantization techniques that are compressing vectors in memory further and further, because of how much information is lost, recall is going to drop. However, with the curre

[PR] Add levels to DocValues skipper index [lucene]

2024-07-11 Thread via GitHub
iverase opened a new pull request, #13563: URL: https://github.com/apache/lucene/pull/13563 Currently the DocValues skipper index collects stats every 4096 documents, which implementors can use to decide whether to process those documents or skip them. The idea o
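A minimal sketch of how levels could be layered on top of the 4096-document intervals described above: each higher level merges a fixed number of intervals from the level below into one wider [min, max] summary, so a range query can rule out a large doc range with a single comparison. The fan-out of 8, the `LevelledSkipper` class, and the `Interval` record are assumptions for illustration, not the on-disk layout used by the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical multi-level skip index over per-document long values. */
final class LevelledSkipper {
  static final int INTERVAL = 4096; // docs per level-0 interval, as described in the PR text
  static final int FANOUT = 8;      // assumption: intervals merged per level step

  /** One interval: docs [minDoc, maxDoc] whose values lie in [minValue, maxValue]. */
  record Interval(int minDoc, int maxDoc, long minValue, long maxValue) {}

  /** Level 0 holds 4096-doc intervals; level L merges FANOUT intervals of level L-1. */
  static List<List<Interval>> build(long[] values, int maxLevels) {
    List<List<Interval>> levels = new ArrayList<>();
    List<Interval> level0 = new ArrayList<>();
    for (int start = 0; start < values.length; start += INTERVAL) {
      int end = Math.min(values.length, start + INTERVAL) - 1;
      long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
      for (int doc = start; doc <= end; doc++) {
        min = Math.min(min, values[doc]);
        max = Math.max(max, values[doc]);
      }
      level0.add(new Interval(start, end, min, max));
    }
    levels.add(level0);
    // Keep adding coarser levels until one interval covers everything or maxLevels is hit.
    while (levels.size() < maxLevels && levels.get(levels.size() - 1).size() > 1) {
      List<Interval> prev = levels.get(levels.size() - 1);
      List<Interval> next = new ArrayList<>();
      for (int i = 0; i < prev.size(); i += FANOUT) {
        List<Interval> group = prev.subList(i, Math.min(prev.size(), i + FANOUT));
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (Interval iv : group) {
          min = Math.min(min, iv.minValue());
          max = Math.max(max, iv.maxValue());
        }
        next.add(new Interval(group.get(0).minDoc(), group.get(group.size() - 1).maxDoc(), min, max));
      }
      levels.add(next);
    }
    return levels;
  }

  /** True if no doc in the interval can match a range query [lo, hi], so it can be skipped whole. */
  static boolean canSkip(Interval iv, long lo, long hi) {
    return iv.maxValue() < lo || iv.minValue() > hi;
  }

  public static void main(String[] args) {
    long[] values = new long[100_000];
    for (int doc = 0; doc < values.length; doc++) {
      values[doc] = doc; // e.g. a field whose values increase with doc id
    }
    for (List<Interval> level : build(values, 4)) {
      System.out.println(level.size()); // prints 25, then 4, then 1
    }
  }
}
```

With 100,000 docs, a query for values in [50000, 60000] can discard the first level-1 interval (docs 0 to 32767) with one check instead of inspecting its eight level-0 intervals.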

Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-11 Thread via GitHub
magibney commented on code in PR #13555: URL: https://github.com/apache/lucene/pull/13555#discussion_r1674096237 ## lucene/core/src/java21/org/apache/lucene/store/GroupedArena.java: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] Feature/scalar quantized off heap scoring [lucene]

2024-07-11 Thread via GitHub
ChrisHegarty commented on PR #13497: URL: https://github.com/apache/lucene/pull/13497#issuecomment-905533 > Regardless that it's already much slower for the int4 case on both jdk 21 & 22. @benwtrent I was not aware, lemme take a look.

Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]

2024-07-11 Thread via GitHub
vsop-479 commented on PR #13557: URL: https://github.com/apache/lucene/pull/13557#issuecomment-409330 I think I could dig more by printing assembly code.
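The change under discussion hoists a loop-invariant `liveDocs == null` test out of the per-document loop in `Weight.scoreAll`. Below is a hypothetical before/after sketch of that pattern, with `IntPredicate` standing in for Lucene's `Bits` and an `IntConsumer` standing in for the collector; it is not the actual Lucene code.

```java
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;

/** Hypothetical illustration of hoisting a null check on liveDocs out of a hot loop. */
final class ScoreAllSketch {

  /** Before: the null check is evaluated once per document. */
  static void scoreAllChecked(int[] docs, IntPredicate liveDocs, IntConsumer collector) {
    for (int doc : docs) {
      if (liveDocs == null || liveDocs.test(doc)) {
        collector.accept(doc);
      }
    }
  }

  /** After: the null check is evaluated once, then one of two specialised loops runs. */
  static void scoreAllHoisted(int[] docs, IntPredicate liveDocs, IntConsumer collector) {
    if (liveDocs == null) {
      for (int doc : docs) {
        collector.accept(doc); // no deletions: every doc is collected
      }
    } else {
      for (int doc : docs) {
        if (liveDocs.test(doc)) {
          collector.accept(doc);
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] docs = {0, 1, 2, 3, 4, 5};
    IntPredicate evenDocsLive = doc -> doc % 2 == 0; // toy liveDocs: odd docs are deleted
    scoreAllHoisted(docs, evenDocsLive, doc -> System.out.print(doc + " ")); // prints 0 2 4
    System.out.println();
  }
}
```

Both methods collect the same documents. Because the condition is loop-invariant, the JIT may already unswitch the first loop by itself, which could be one reason the benchmark results reported in this thread show no clear difference.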

Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-11 Thread via GitHub
uschindler commented on code in PR #13555: URL: https://github.com/apache/lucene/pull/13555#discussion_r1673613242 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -24,12 +24,15 @@ import java.nio.file.Path; import java.nio.file.Sta

Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-11 Thread via GitHub
ChrisHegarty commented on code in PR #13555: URL: https://github.com/apache/lucene/pull/13555#discussion_r1673550548 ## lucene/core/src/java21/org/apache/lucene/store/MemorySegmentIndexInputProvider.java: ## @@ -24,12 +24,15 @@ import java.nio.file.Path; import java.nio.file.S

Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-11 Thread via GitHub
ChrisHegarty commented on code in PR #13555: URL: https://github.com/apache/lucene/pull/13555#discussion_r1673543647 ## lucene/core/src/java21/org/apache/lucene/store/GroupedArena.java: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]

2024-07-11 Thread via GitHub
jpountz commented on PR #13557: URL: https://github.com/apache/lucene/pull/13557#issuecomment-207417 It looks like it doesn't make a difference? `HighTermDayOfYearSort` has a speedup with a low p-value, but then `HighTermMonthSort` has a slowdown with a low p-value as well, so I suspect