[PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-25 Thread via GitHub
jpountz opened a new pull request, #14400: URL: https://github.com/apache/lucene/pull/14400 This improves the way `DenseConjunctionBulkScorer` computes scoring windows by aligning the end of the window with the `#docIDRunEnd()` of its clauses, as long as it would result in a window that
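For context on what "aligning with `#docIDRunEnd()`" buys: a clause whose run of consecutive matching doc IDs reaches the end of a window matches every doc in that window, so it need not be evaluated there. The self-contained model below illustrates that idea only; it is not Lucene's code. The description above is cut off before stating the PR's exact condition, so the alignment rule, `WINDOW_SIZE`, `MIN_ALIGNED_WINDOW`, and all helper names are hypothetical assumptions of this sketch.

```java
import java.util.Arrays;

public class RunAlignedWindows {
  // Hypothetical constants for this sketch; neither value is taken from Lucene's source.
  static final int WINDOW_SIZE = 4096;
  static final int MIN_ALIGNED_WINDOW = WINDOW_SIZE / 2;

  /**
   * Follows the docIDRunEnd() idea for a clause whose matching doc IDs are a sorted
   * array positioned at docs[index]: returns one plus the last doc ID of the run of
   * consecutive doc IDs that contains the current doc.
   */
  static int docIDRunEnd(int[] docs, int index) {
    int i = index;
    while (i + 1 < docs.length && docs[i + 1] == docs[i] + 1) {
      i++;
    }
    return docs[i] + 1;
  }

  /**
   * Hypothetical alignment rule: by default the window is [min, min + WINDOW_SIZE); if a
   * clause's run ends inside that window and the aligned window would still span at least
   * MIN_ALIGNED_WINDOW docs, end the window at the largest such run end instead.
   */
  static int windowEnd(int min, int[] clauseRunEnds) {
    int max = min + WINDOW_SIZE;
    int best = -1;
    for (int runEnd : clauseRunEnds) {
      if (runEnd < max && runEnd - min >= MIN_ALIGNED_WINDOW) {
        best = Math.max(best, runEnd);
      }
    }
    return best == -1 ? max : best;
  }

  /** A clause positioned at or before min whose run reaches windowEnd matches every doc of the window. */
  static boolean coversWindow(int clauseDocID, int clauseRunEnd, int min, int windowEnd) {
    return clauseDocID <= min && clauseRunEnd >= windowEnd;
  }

  public static void main(String[] args) {
    int[] clauseA = new int[3000];
    Arrays.setAll(clauseA, i -> i); // docs 0..2999: a single long run
    int[] clauseB = {0, 1, 2, 10, 11}; // short runs

    int min = 0;
    int runEndA = docIDRunEnd(clauseA, 0);
    int runEndB = docIDRunEnd(clauseB, 0);
    int end = windowEnd(min, new int[] {runEndA, runEndB});
    System.out.println(end); // 3000
    System.out.println(coversWindow(clauseA[0], runEndA, min, end)); // true: skip clause A
    System.out.println(coversWindow(clauseB[0], runEndB, min, end)); // false: evaluate clause B
  }
}
```

Shrinking a window trades a few more window iterations for the ability to drop a clause known to match every doc in it; the minimum size in this sketch guards against degenerate windows when a clause's run is only a single doc.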

Re: [PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]

2025-03-25 Thread via GitHub
alessandrobenedetti commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2751006045 > > do you confirm that, according to your knowledge, any relevant and active work toward multi-valued vectors in Lucene is effectively aggregated here? > > @alessandr

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-25 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2751059970 Nice! So... google-java-format has this option, at least in the command-line version: ![Image](https://github.com/user-attachments/assets/ffb7ebe1-c495-4411-8e7b-f3d8b176aeb4)

Re: [PR] Add support for two-phase iterators to DenseConjunctionBulkScorer. [lucene]

2025-03-25 Thread via GitHub
jpountz merged PR #14359: URL: https://github.com/apache/lucene/pull/14359

Re: [PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]

2025-03-25 Thread via GitHub
vigyasharma commented on PR #14173: URL: https://github.com/apache/lucene/pull/14173#issuecomment-2751872315 > Another option I was pondering is adding a new field type dedicated to multi-valued vectors. I tried this in my first stab at this issue (https://github.com/apache/lucene/pu

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-25 Thread via GitHub
thecoop commented on code in PR #14304: URL: https://github.com/apache/lucene/pull/14304#discussion_r2012314179 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultVectorUtilSupport.java: ## @@ -234,4 +234,79 @@ public static long int4BitDotProductImpl(byte[]

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-25 Thread via GitHub
gsmiller commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2751652120 +1 to this optimization. Love the idea!

Re: [PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-25 Thread via GitHub
jpountz commented on PR #14401: URL: https://github.com/apache/lucene/pull/14401#issuecomment-2751630823 @epotyom You may be interested in this: it allows computing aggregates in sub-linear time with respect to the number of matching docs.

Re: [PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-25 Thread via GitHub
gf2121 commented on code in PR #14400: URL: https://github.com/apache/lucene/pull/14400#discussion_r2012455948 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -171,37 +171,36 @@ private int scoreWindow( } } -if (acceptDoc

Re: [PR] Disable sort optimization when tracking all docs [lucene]

2025-03-25 Thread via GitHub
bugmakerr commented on PR #14395: URL: https://github.com/apache/lucene/pull/14395#issuecomment-2750983066 > The change looks correct to me. With recent changes to allow clauses that match all docs to remove themselves from a conjunction, it should be possible to achieve something simil

Re: [PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-25 Thread via GitHub
jpountz commented on code in PR #14400: URL: https://github.com/apache/lucene/pull/14400#discussion_r2012300061 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -171,27 +171,30 @@ private int scoreWindow( } } -if (acceptDo

Re: [PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-25 Thread via GitHub
jpountz commented on PR #14400: URL: https://github.com/apache/lucene/pull/14400#issuecomment-2751702590 Thank you. I believe that it only makes a difference when `max-min < WINDOW_SIZE`, where more clauses would now get evaluated, but simplicity is more important so I applied your suggesti

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-25 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2751070467 I think in the markdown case, the bug I saw was that it didn't treat `///` as javadoc but as an ordinary inline comment. But I can experiment with the option still.

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-25 Thread via GitHub
dweiss commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2751317728 https://github.com/google/google-java-format/issues/1193 > Disabling Javadoc formatting doesn't prevent either issue. So it seems it's broken entirely. Argh.

[PR] Enable collectors to take advantage of pre-aggregated data. [lucene]

2025-03-25 Thread via GitHub
jpountz opened a new pull request, #14401: URL: https://github.com/apache/lucene/pull/14401 This introduces `LeafCollector#collectRange`, which is typically useful to take advantage of the pre-aggregated data exposed in `DocValuesSkipper`. At the moment, `DocValuesSkipper` only exposes per-
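The description above is cut off before it details `collectRange`. To illustrate the general idea only, the sketch below uses a hypothetical `RangeCollector` interface (a simplified stand-in, not Lucene's `LeafCollector`) whose default `collectRange(min, max)` falls back to one `collect(doc)` call per doc, while a counting collector overrides it to add `max - min` in constant time. Treating the range as half-open `[min, max)` is an assumption of this sketch. A source of pre-aggregated data such as `DocValuesSkipper` could presumably hand a whole range of matching docs to such a collector in one call.

```java
import java.io.IOException;

public class CollectRangeSketch {

  /** Hypothetical stand-in for a leaf collector; not Lucene's LeafCollector API. */
  interface RangeCollector {
    void collect(int doc) throws IOException;

    /** Collects every doc ID in [min, max); the default falls back to one call per doc. */
    default void collectRange(int min, int max) throws IOException {
      for (int doc = min; doc < max; ++doc) {
        collect(doc);
      }
    }
  }

  /** Counts hits; overriding collectRange makes a dense range O(1) instead of O(max - min). */
  static final class CountingCollector implements RangeCollector {
    long count;

    @Override
    public void collect(int doc) {
      count++;
    }

    @Override
    public void collectRange(int min, int max) {
      count += max - min;
    }
  }

  /** Needs every doc ID, so it keeps the default per-doc fallback. */
  static final class SumOfDocIdsCollector implements RangeCollector {
    long sum;

    @Override
    public void collect(int doc) {
      sum += doc;
    }
  }

  public static void main(String[] args) throws IOException {
    CountingCollector counter = new CountingCollector();
    counter.collectRange(100, 1_100);
    counter.collect(5_000);
    System.out.println(counter.count); // 1001

    SumOfDocIdsCollector summer = new SumOfDocIdsCollector();
    summer.collectRange(0, 5);
    System.out.println(summer.sum); // 10
  }
}
```

Collectors that genuinely need each doc ID, like `SumOfDocIdsCollector` here, simply keep the default and pay the per-doc cost.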

Re: [PR] Pack file pointers when merging BKD trees [lucene]

2025-03-25 Thread via GitHub
iverase merged PR #14393: URL: https://github.com/apache/lucene/pull/14393

Re: [I] Reduce memory usage when merging bkd trees [lucene]

2025-03-25 Thread via GitHub
iverase closed issue #14382: Reduce memory usage when merging bkd trees URL: https://github.com/apache/lucene/issues/14382

Re: [I] Reduce memory usage when merging bkd trees [lucene]

2025-03-25 Thread via GitHub
iverase commented on issue #14382: URL: https://github.com/apache/lucene/issues/14382#issuecomment-2750449881 We are using more dense data structures now, in particular for the OneDimensionBKDWriter.

Re: [I] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-25 Thread via GitHub
vigyasharma commented on issue #14362: URL: https://github.com/apache/lucene/issues/14362#issuecomment-2752804917 Thanks @guojialiang92. Is the plan here to support creating an IndexWriter with a supplied value of `counter`, say `N`, so that all its commit generations are `>=N` i.e. `segm

Re: [PR] Speed up histogram collection in a similar way as disjunction counts. [lucene]

2025-03-25 Thread via GitHub
jpountz commented on PR #14273: URL: https://github.com/apache/lucene/pull/14273#issuecomment-2752610585 Quick update: we now have more queries that collect hits using `collect(DocIdStream)`, which makes this optimization more appealing.

Re: [I] Can we use Panama Vector API for quantizing vectors? [lucene]

2025-03-25 Thread via GitHub
benwtrent closed issue #13922: Can we use Panama Vector API for quantizing vectors? URL: https://github.com/apache/lucene/issues/13922

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2025-03-25 Thread via GitHub
github-actions[bot] commented on PR #13782: URL: https://github.com/apache/lucene/pull/13782#issuecomment-2752825545 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Break the loop when segment is fully deleted by prior delTerms or delQueries [lucene]

2025-03-25 Thread via GitHub
github-actions[bot] commented on PR #13398: URL: https://github.com/apache/lucene/pull/13398#issuecomment-2752825874 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-25 Thread via GitHub
jpountz commented on code in PR #14400: URL: https://github.com/apache/lucene/pull/14400#discussion_r2012975372 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -171,37 +171,36 @@ private int scoreWindow( } } -if (acceptDo

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-03-25 Thread via GitHub
github-actions[bot] commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2752825072 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] knn search - add tests to perform exact search when filtering does not return enough results [lucene]

2025-03-25 Thread via GitHub
benwtrent merged PR #14274: URL: https://github.com/apache/lucene/pull/14274

[I] New testMinMaxScalarQuantize tests failing repeatably [lucene]

2025-03-25 Thread via GitHub
benwtrent opened a new issue, #14402: URL: https://github.com/apache/lucene/issues/14402 ### Description `TestVectorUtilSupport > testMinMaxScalarQuantize {p0=4096} FAILED java.lang.AssertionError: Expected: a numeric value within <0.004096> of <762.170654296875>`
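The assertion compares a computed value against an expected one within a tolerance. As general background rather than a diagnosis of this particular failure: float addition is not associative, so a reduction that accumulates into several independent lanes and combines them at the end (the usual shape of a SIMD loop) can legitimately differ from a sequential scalar loop. The values below are contrived to make the effect obvious, and the two-lane split is a simplified stand-in for a real vectorized kernel.

```java
public class FloatSumOrder {

  /** Adds the values left to right, as a simple scalar loop would. */
  static float sequentialSum(float[] values) {
    float sum = 0f;
    for (float v : values) {
      sum += v;
    }
    return sum;
  }

  /**
   * Accumulates into `lanes` independent partial sums (element i goes to lane i % lanes)
   * and adds the partial sums at the end, mimicking the shape of a SIMD reduction.
   */
  static float laneSum(float[] values, int lanes) {
    float[] partial = new float[lanes];
    for (int i = 0; i < values.length; i++) {
      partial[i % lanes] += values[i];
    }
    float sum = 0f;
    for (float p : partial) {
      sum += p;
    }
    return sum;
  }

  public static void main(String[] args) {
    // Adjacent floats near 1e8 are 8 apart, so 1e8f + 1f rounds back to 1e8f.
    float[] values = {1e8f, 1f, -1e8f, 1f};
    System.out.println(sequentialSum(values)); // 1.0
    System.out.println(laneSum(values, 2)); // 2.0
  }
}
```

This is one reason scalar and vectorized code paths are typically compared within a tolerance rather than for exact equality.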

Re: [I] New testMinMaxScalarQuantize tests failing repeatably [lucene]

2025-03-25 Thread via GitHub
benwtrent commented on issue #14402: URL: https://github.com/apache/lucene/issues/14402#issuecomment-2752192712 @thecoop ping ;)

Re: [PR] Make PointValues.intersect iterative instead of recursive [lucene]

2025-03-25 Thread via GitHub
jpountz commented on PR #14391: URL: https://github.com/apache/lucene/pull/14391#issuecomment-2752592897 Nightly benchmarks report a tiny slowdown for IntNRQ and CountFilteredIntNRQ (https://benchmarks.mikemccandless.com/2025.03.24.18.05.19.html); nevertheless, I agree with your point that it
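The PR title describes replacing recursion with iteration in `PointValues.intersect`. One common way to do this, shown below as a self-contained model rather than Lucene's code, is to walk the tree with a cursor that can move to its first child, its next sibling, or its parent, so the traversal needs no call stack. The implicit binary-tree layout (node 1 is the root, node `n` has children `2n` and `2n + 1`), the per-leaf value ranges, and every name here are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class IterativeIntersect {

  /** Cursor over a complete binary tree stored implicitly: root is 1, children of n are 2n and 2n+1. */
  static final class Cursor {
    final int depth; // leaves live at this level; the root is level 0
    int nodeID = 1;

    Cursor(int depth) {
      this.depth = depth;
    }

    int level() {
      return 31 - Integer.numberOfLeadingZeros(nodeID);
    }

    boolean isLeaf() {
      return level() == depth;
    }

    boolean moveToChild() {
      if (isLeaf()) {
        return false;
      }
      nodeID = nodeID * 2;
      return true;
    }

    /** Moves from a left child to its right sibling; fails for right children and the root. */
    boolean moveToSibling() {
      if ((nodeID & 1) == 0) {
        nodeID++;
        return true;
      }
      return false;
    }

    boolean moveToParent() {
      if (nodeID == 1) {
        return false;
      }
      nodeID >>>= 1;
      return true;
    }

    /** Smallest value covered by the current node, when each leaf covers leafSize values. */
    int minValue(int leafSize) {
      int level = level();
      int indexInLevel = nodeID - (1 << level);
      return indexInLevel * (leafSize << (depth - level));
    }

    /** One plus the largest value covered by the current node. */
    int maxValueExclusive(int leafSize) {
      return minValue(leafSize) + (leafSize << (depth - level()));
    }
  }

  /** Returns the ordinals of the leaves whose value range intersects [lo, hi], without recursion. */
  static List<Integer> intersect(int depth, int leafSize, int lo, int hi) {
    Cursor cursor = new Cursor(depth);
    List<Integer> leaves = new ArrayList<>();
    while (true) {
      boolean crosses =
          cursor.minValue(leafSize) <= hi && cursor.maxValueExclusive(leafSize) > lo;
      if (crosses) {
        if (cursor.isLeaf()) {
          leaves.add(cursor.nodeID - (1 << depth));
        } else {
          cursor.moveToChild();
          continue; // descend first, as the recursive version would
        }
      }
      // This subtree is done: advance to the next sibling, climbing up as needed.
      while (!cursor.moveToSibling()) {
        if (!cursor.moveToParent()) {
          return leaves;
        }
      }
    }
  }

  public static void main(String[] args) {
    // 8 leaves (depth 3) covering values 0..79 in chunks of 10.
    System.out.println(intersect(3, 10, 15, 42)); // [1, 2, 3, 4]
  }
}
```

Each subtree that does not cross the query is skipped in one step, and the inner `while` loop plays the role of returning from recursive calls.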

Re: [I] Use @snippet javadoc tag for snippets [lucene]

2025-03-25 Thread via GitHub
rmuir commented on issue #14257: URL: https://github.com/apache/lucene/issues/14257#issuecomment-2751009070 @dweiss thank you so much for that starter commit for evaluation. I will try it tonight and fire up eclipse and see what our options are. I finally finished parser (https://gith

Re: [PR] Make DenseConjunctionBulkScorer align scoring windows with #docIDRunEnd(). [lucene]

2025-03-25 Thread via GitHub
gf2121 commented on code in PR #14400: URL: https://github.com/apache/lucene/pull/14400#discussion_r2012151639 ## lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java: ## @@ -171,27 +171,30 @@ private int scoreWindow( } } -if (acceptDoc

Re: [PR] Create vectorized versions of ScalarQuantizer.quantize and recalculateCorrectiveOffset [lucene]

2025-03-25 Thread via GitHub
benwtrent commented on code in PR #14304: URL: https://github.com/apache/lucene/pull/14304#discussion_r201706 ## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ## @@ -334,4 +334,45 @@ public static int findNextGEQ(int[] buffer, int target, int from, int to) {

Re: [I] Support modifying segmentInfos.counter in IndexWriter [lucene]

2025-03-25 Thread via GitHub
guojialiang92 commented on issue #14362: URL: https://github.com/apache/lucene/issues/14362#issuecomment-2753129754 Thanks @vigyasharma. Your understanding is correct (**This is specifically a problem for segment replication**). From an implementation point of view, similar to the cu

Re: [PR] Speed up advancing within a sparse block in IndexedDISI. [lucene]

2025-03-25 Thread via GitHub
vsop-479 commented on PR #14371: URL: https://github.com/apache/lucene/pull/14371#issuecomment-2753155562 > a bench in jmh will be great. I measured it with `AdvanceSparseDISIBenchmark`: Benchmark Mode Cnt Score Error

Re: [PR] Optimize slice calculation in IndexSearcher a little [lucene]

2025-03-25 Thread via GitHub
github-actions[bot] commented on PR #13860: URL: https://github.com/apache/lucene/pull/13860#issuecomment-2752825465 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Use FixedLengthBytesRefArray in OneDimensionBKDWriter to hold split values [lucene]

2025-03-25 Thread via GitHub
iverase merged PR #14383: URL: https://github.com/apache/lucene/pull/14383
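As a rough illustration of why holding equal-length values in one contiguous buffer is denser than keeping a separate `byte[]` per value, the sketch below packs values back to back and addresses value `i` at offset `i * valueLength`. `PackedFixedLengthBytes` is a hypothetical class written for this example; it is not Lucene's `FixedLengthBytesRefArray`, whose API may differ.

```java
import java.util.Arrays;

/** Hypothetical packed storage for equal-length byte values; not Lucene's class. */
public class PackedFixedLengthBytes {
  private final int valueLength;
  private byte[] bytes;
  private int size;

  public PackedFixedLengthBytes(int valueLength) {
    this.valueLength = valueLength;
    this.bytes = new byte[valueLength * 8]; // room for 8 values before the first resize
  }

  /** Appends a copy of value, which must have exactly valueLength bytes. */
  public void append(byte[] value) {
    if (value.length != valueLength) {
      throw new IllegalArgumentException("expected " + valueLength + " bytes, got " + value.length);
    }
    int offset = size * valueLength;
    if (offset + valueLength > bytes.length) {
      // Grow geometrically so appends stay amortized O(valueLength).
      bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, offset + valueLength));
    }
    System.arraycopy(value, 0, bytes, offset, valueLength);
    size++;
  }

  /** Returns a copy of the value at index i. */
  public byte[] get(int i) {
    if (i < 0 || i >= size) {
      throw new IndexOutOfBoundsException(i);
    }
    int offset = i * valueLength;
    return Arrays.copyOfRange(bytes, offset, offset + valueLength);
  }

  public int size() {
    return size;
  }

  public static void main(String[] args) {
    PackedFixedLengthBytes values = new PackedFixedLengthBytes(4);
    for (int i = 0; i < 20; i++) {
      values.append(new byte[] {(byte) i, 0, 0, (byte) -i});
    }
    System.out.println(values.size()); // 20
    System.out.println(Arrays.toString(values.get(3))); // [3, 0, 0, -3]
  }
}
```

Compared with a `byte[][]`, this avoids one array object header and one reference per value, which adds up when a writer holds many short values.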