Re: [PR] Add a Better Binary Quantizer (RaBitQ) format for dense vectors [lucene]

2024-10-21 Thread via GitHub
benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2427907718 Here is some luceneutil benchmarking. Some of these numbers actually contradict some of my previous benchmarking for int4, which is frustrating; I wonder what I did wrong then or now.

Re: [PR] Add BaseKnnVectorsFormatTestCase.testRecall() and fix old codecs [lucene]

2024-10-21 Thread via GitHub
msokolov commented on PR #13910: URL: https://github.com/apache/lucene/pull/13910#issuecomment-2427329494 > Since Lucene90 didn't support sparse vector values, I am not sure this is strictly necessary. But I can understand it from a consistency standpoint. After reflection, I don't th

Re: [PR] Include java21 source folders to gradle source sets [lucene]

2024-10-21 Thread via GitHub
dweiss commented on PR #13926: URL: https://github.com/apache/lucene/pull/13926#issuecomment-2427318916 Also - this basically adds syntax highlighting and suggestions, forget about running tests with these classes - I don't think it'll work from the IDE. -- This is an automated message fr

Re: [PR] Include java21 source folders to gradle source sets [lucene]

2024-10-21 Thread via GitHub
dweiss commented on PR #13926: URL: https://github.com/apache/lucene/pull/13926#issuecomment-2427276085 It is complicated also because there is some trickery in how Lucene compiles against those preview APIs - we don't use the preview option but instead fool the compiler into thinking these

Re: [PR] Fix StoredFieldsConsumer finish [lucene]

2024-10-21 Thread via GitHub
jpountz merged PR #13927: URL: https://github.com/apache/lucene/pull/13927 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Move BooleanScorer to work on top of Scorers rather than BulkScorers. [lucene]

2024-10-21 Thread via GitHub
jpountz merged PR #13931: URL: https://github.com/apache/lucene/pull/13931

Re: [PR] Speedup OrderIntervalsSource some more [lucene]

2024-10-21 Thread via GitHub
original-brownbear commented on PR #13937: URL: https://github.com/apache/lucene/pull/13937#issuecomment-2426878304 Thanks Adrien!

Re: [PR] Speedup OrderIntervalsSource some more [lucene]

2024-10-21 Thread via GitHub
original-brownbear merged PR #13937: URL: https://github.com/apache/lucene/pull/13937

Re: [PR] Fix StoredFieldsConsumer finish [lucene]

2024-10-21 Thread via GitHub
linfn commented on PR #13927: URL: https://github.com/apache/lucene/pull/13927#issuecomment-2426869427 @jpountz Done. Thanks!

Re: [PR] Speedup PriorityQueue a little [lucene]

2024-10-21 Thread via GitHub
mikemccand commented on code in PR #13936: URL: https://github.com/apache/lucene/pull/13936#discussion_r1808871571 ## lucene/core/src/java/org/apache/lucene/util/PriorityQueue.java: ## @@ -117,7 +117,8 @@ public PriorityQueue(int maxSize, Supplier sentinelObjectSupplier) {

Re: [PR] Speedup OrderIntervalsSource some more [lucene]

2024-10-21 Thread via GitHub
original-brownbear commented on code in PR #13937: URL: https://github.com/apache/lucene/pull/13937#discussion_r1808845516 ## lucene/queries/src/java/org/apache/lucene/queries/intervals/OrderedIntervalsSource.java: ## @@ -161,8 +163,8 @@ public int nextInterval() throws IOExcept

Re: [PR] Use RandomAccessInput instead of seeking in Lucene90DocValuesProducer [lucene]

2024-10-21 Thread via GitHub
original-brownbear commented on PR #13894: URL: https://github.com/apache/lucene/pull/13894#issuecomment-2426662403 I think we simply underestimate the variance in luceneutil, which results in p-values that are too low. See https://github.com/mikemccand/luceneutil/pull/308 for a suggested fix.

Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub
original-brownbear merged PR #13932: URL: https://github.com/apache/lucene/pull/13932

Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub
original-brownbear commented on PR #13932: URL: https://github.com/apache/lucene/pull/13932#issuecomment-2426636569 That said :) thanks Adrien, merging :)

Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub
original-brownbear commented on PR #13932: URL: https://github.com/apache/lucene/pull/13932#issuecomment-2426635537 We're also too confident in results, I think: https://github.com/mikemccand/luceneutil/pull/308

[I] Look into ACORN-1, or another algorithm to aid in filtered HNSW search [lucene]

2024-10-21 Thread via GitHub
benwtrent opened a new issue, #13940: URL: https://github.com/apache/lucene/issues/13940 ### Description Lucene already does OK in filtered kNN search, but it can be better. An interesting paper in this area: https://arxiv.org/abs/2403.04871 Weaviate has done an implemen

Re: [PR] Reduce the compiled size of the collect() method on `TopScoreDocCollector`. [lucene]

2024-10-21 Thread via GitHub
jpountz commented on PR #13939: URL: https://github.com/apache/lucene/pull/13939#issuecomment-2426505269 For reference, luceneutil shows no difference.

Re: [PR] Simplify PForUtil construction and cleanup its code gen a little [lucene]

2024-10-21 Thread via GitHub
original-brownbear commented on PR #13932: URL: https://github.com/apache/lucene/pull/13932#issuecomment-2426308076 @jpountz 🤦‍♂️ I noticed this too but kept attributing this to CPU savings helping JIT and the like on my weaker benchmark box (feeling really clever about myself) ... but now

Re: [PR] Check ahead if we can get the count [lucene]

2024-10-21 Thread via GitHub
LuXugang commented on PR #13899: URL: https://github.com/apache/lucene/pull/13899#issuecomment-2426154458 > The logic makes sense to me but it's a bit hard to read, could we avoid touching `getDocIdSetIteratorOrNull` and only have new logic in the `Weight#count` impl? Thank you for y

Re: [PR] Include java21 source folders to gradle source sets [lucene]

2024-10-21 Thread via GitHub
javanna commented on PR #13926: URL: https://github.com/apache/lucene/pull/13926#issuecomment-2426005143 Yes @dweiss indeed it's complicated. I have tried manually and I did not succeed yet.

[PR] Reduce the compiled size of the collect() method on `TopScoreDocCollector`. [lucene]

2024-10-21 Thread via GitHub
jpountz opened a new pull request, #13939: URL: https://github.com/apache/lucene/pull/13939 This comes from observations on https://tantivy-search.github.io/bench/ for exhaustive evaluation like `TOP_100_COUNT`. `collect()` is often inlined, but other methods that we'd like to see inlined l

Re: [PR] Move BooleanScorer to work on top of Scorers rather than BulkScorers. [lucene]

2024-10-21 Thread via GitHub
jpountz commented on PR #13931: URL: https://github.com/apache/lucene/pull/13931#issuecomment-2425831201 I could confirm the speedup on a different machine: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-val

[I] Move vector search from IndexInput to RandomAccessInput [lucene]

2024-10-21 Thread via GitHub
jpountz opened a new issue, #13938: URL: https://github.com/apache/lucene/issues/13938 ### Description Vector search currently loads vectors from disk by issuing a `seek()` followed by a `readFloats()`. We should instead: - Add an absolute `readFloats()` method to `RandomAccessInp
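
To illustrate the distinction the issue above draws between a `seek()` followed by a relative read and an absolute read, here is a minimal sketch that uses `java.nio.ByteBuffer` as a stand-in for a vector file. It is not Lucene's `IndexInput` or `RandomAccessInput`, and it does not show the method the issue proposes; the class `AbsoluteReadSketch` and its methods `encode`, `seekThenRead`, and `readAbsolute` are hypothetical names chosen for illustration, and the little-endian, back-to-back layout is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical illustration only; a ByteBuffer stands in for a vector file. */
public class AbsoluteReadSketch {

  /** Packs equally sized vectors back to back as little-endian floats (assumed layout). */
  public static ByteBuffer encode(float[][] vectors) {
    int dim = vectors[0].length;
    ByteBuffer buf =
        ByteBuffer.allocate(vectors.length * dim * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (float[] vector : vectors) {
      for (float value : vector) {
        buf.putFloat(value);
      }
    }
    buf.flip(); // position back to 0, limit at the end of the written data
    return buf;
  }

  /** Stateful style: move the shared cursor first, then read relative to it. */
  public static float[] seekThenRead(ByteBuffer in, int ord, int dim) {
    float[] dst = new float[dim];
    in.position(ord * dim * Float.BYTES); // plays the role of seek()
    for (int i = 0; i < dim; i++) {
      dst[i] = in.getFloat(); // relative read, advances the cursor
    }
    return dst;
  }

  /** Absolute style: the offset is an argument and the cursor is left untouched. */
  public static float[] readAbsolute(ByteBuffer in, int ord, int dim) {
    float[] dst = new float[dim];
    int base = ord * dim * Float.BYTES;
    for (int i = 0; i < dim; i++) {
      dst[i] = in.getFloat(base + i * Float.BYTES); // absolute read, no cursor change
    }
    return dst;
  }

  public static void main(String[] args) {
    ByteBuffer file = encode(new float[][] {{1f, 2f}, {3f, 4f}, {5f, 6f}});
    System.out.println(Arrays.toString(readAbsolute(file, 2, 2)));
    System.out.println("cursor after absolute read: " + file.position());
    System.out.println(Arrays.toString(seekThenRead(file, 1, 2)));
    System.out.println("cursor after seek + read: " + file.position());
  }
}
```

In this sketch the absolute variant never mutates the buffer's position, so a caller does not need to track or restore a cursor between reads; whether that property matters for Lucene's vector readers is for the issue itself to argue.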