Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-12-04 Thread via GitHub
github-actions[bot] commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2518829338 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Revert "Ensure Panama float vector distance impls inlinable " [lucene]

2024-12-04 Thread via GitHub
rmuir merged PR #14041: URL: https://github.com/apache/lucene/pull/14041 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apach

Re: [PR] Revert "Ensure Panama float vector distance impls inlinable " [lucene]

2024-12-04 Thread via GitHub
ChrisHegarty commented on PR #14041: URL: https://github.com/apache/lucene/pull/14041#issuecomment-2518520684 Thanks @rmuir

Re: [PR] Reduce allocation rate in HNSW concurrent merge (backport of #14011) [lucene]

2024-12-04 Thread via GitHub
msokolov merged PR #14037: URL: https://github.com/apache/lucene/pull/14037

Re: [PR] Reduce specialization in TopScoreDocCollector. [lucene]

2024-12-04 Thread via GitHub
javanna commented on PR #14038: URL: https://github.com/apache/lucene/pull/14038#issuecomment-2518446747 ++ thanks for simplifying this!

[PR] Revert "Ensure Panama float vector distance impls inlinable " [lucene]

2024-12-04 Thread via GitHub
rmuir opened a new pull request, #14041: URL: https://github.com/apache/lucene/pull/14041 Reverts apache/lucene#14031 Reverting because of query results in nightly benchmark. Let's figure out separately what is happening. We may need to test on AMD CPU (I have some, and so does AWS).

Re: [PR] Ensure Panama float vector distance impls inlinable [lucene]

2024-12-04 Thread via GitHub
rmuir commented on PR #14031: URL: https://github.com/apache/lucene/pull/14031#issuecomment-2518404072 @jpountz let's just revert it and figure it out separately?

Re: [PR] Support multi-tenant RAM buffers for IndexWriter [lucene]

2024-12-04 Thread via GitHub
mdmarshmallow commented on PR #13951: URL: https://github.com/apache/lucene/pull/13951#issuecomment-2518367806 Just ran some benchmarks from `luceneutil` and saw a pretty significant slowdown in indexing throughput (21 GB/hour -> 16 GB/hour)... trying to figure out why

Re: [PR] Ensure Panama float vector distance impls inlinable [lucene]

2024-12-04 Thread via GitHub
jpountz commented on PR #14031: URL: https://github.com/apache/lucene/pull/14031#issuecomment-2518122848 FYI nightly benchmarks had a big regression last night, and this is the only change I can find that could have caused this: https://benchmarks.mikemccandless.com/VectorSearch.html.

Re: [PR] Introduce a BulkScorer for DisjunctionMaxQuery. [lucene]

2024-12-04 Thread via GitHub
jpountz commented on PR #14040: URL: https://github.com/apache/lucene/pull/14040#issuecomment-2518113983 This is already covered test-wise by existing tests, and `QueryUtils` checks in particular, which compare hits of the scorer and the bulk scorer. Here are benchmark results on wikibigall

[PR] Introduce a BulkScorer for DisjunctionMaxQuery. [lucene]

2024-12-04 Thread via GitHub
jpountz opened a new pull request, #14040: URL: https://github.com/apache/lucene/pull/14040 This introduces a bulk scorer for `DisjunctionMaxQuery` that delegates to the bulk scorers of the query clauses. This helps make the performance of top-level `DisjunctionMaxQuery` better, especially

Re: [PR] Remove scoreAll() optimization from DefaultBulkScorer. [lucene]

2024-12-04 Thread via GitHub
jpountz commented on PR #14039: URL: https://github.com/apache/lucene/pull/14039#issuecomment-2517858098 ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value AndStopWords 32.62 (2.

[PR] Remove scoreAll() optimization from DefaultBulkScorer. [lucene]

2024-12-04 Thread via GitHub
jpountz opened a new pull request, #14039: URL: https://github.com/apache/lucene/pull/14039 I cannot see benefits from this optimization anymore when running luceneutil. However, I do see some benefits from specializing cases when the collector produces a competitive iterator or when the sc

Re: [I] Move vector search from IndexInput to RandomAccessInput [lucene]

2024-12-04 Thread via GitHub
jpountz commented on issue #13938: URL: https://github.com/apache/lucene/issues/13938#issuecomment-2517680513 I was thinking of it differently, that `IndexInput` is for sequential reading (possibly with skipping, like we do in postings) while `RandomAccessInput` is for random access like we

Re: [PR] Combine all postings enum impls of the default codec into a single class [lucene]

2024-12-04 Thread via GitHub
jpountz merged PR #14033: URL: https://github.com/apache/lucene/pull/14033

Re: [I] Move vector search from IndexInput to RandomAccessInput [lucene]

2024-12-04 Thread via GitHub
rmuir commented on issue #13938: URL: https://github.com/apache/lucene/issues/13938#issuecomment-2517601394 @jpountz is this really appropriate? RandomAccessInput is to reduce the overhead when doing tiny (not bulk) reads, it was added to help move from fieldcache to docvalues, where you ne

Re: [I] Can we store only quantized vectors to reduce disk footprint? [lucene]

2024-12-04 Thread via GitHub
mikemccand commented on issue #14007: URL: https://github.com/apache/lucene/issues/14007#issuecomment-2517303394 Do we have a separate issue open already for half-floats? It seems like it deserves its own spinoff issue ...

Re: [I] Should we add bfloat16 support for HNSW? [lucene]

2024-12-04 Thread via GitHub
rmuir commented on issue #12403: URL: https://github.com/apache/lucene/issues/12403#issuecomment-2517431221 vector api still doesn't support it yet in openjdk `main`

Re: [PR] Reduce specialization in TopScoreDocCollector. [lucene]

2024-12-04 Thread via GitHub
jpountz merged PR #14038: URL: https://github.com/apache/lucene/pull/14038

Re: [PR] Improve search equivalence tests. [lucene]

2024-12-04 Thread via GitHub
jpountz merged PR #14036: URL: https://github.com/apache/lucene/pull/14036

Re: [I] Should we add bfloat16 support for HNSW? [lucene]

2024-12-04 Thread via GitHub
benwtrent commented on issue #12403: URL: https://github.com/apache/lucene/issues/12403#issuecomment-2517401712 I wonder, now that main requires JDK 21, if it's worth it now? I would have to dig around to see if there are fast intrinsic decoding/encoding now for storing short floats. But w
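
On the encoding question above: for IEEE 754 half-floats (binary16), the JDK has shipped `Float.floatToFloat16` and `Float.float16ToFloat` since Java 20. bfloat16 is a different format: it is the upper 16 bits of a binary32 value, so a conversion can be sketched with plain bit operations. The class and method names below are hypothetical, and this is not how Lucene stores vectors; it is only a minimal illustration of the format under discussion.

```java
// Hypothetical helper, not part of Lucene: converts between float and bfloat16.
final class BFloat16Sketch {

  // Encode a float as bfloat16 by keeping its upper 16 bits,
  // rounding to nearest (ties to even) instead of truncating.
  public static short toBFloat16(float f) {
    if (Float.isNaN(f)) {
      return (short) 0x7FC0; // canonical quiet NaN
    }
    int bits = Float.floatToIntBits(f);
    // 0x7FFF plus the lowest kept bit implements round-half-to-even.
    int rounding = 0x7FFF + ((bits >>> 16) & 1);
    return (short) ((bits + rounding) >>> 16);
  }

  // Decode bfloat16 by placing its 16 bits in the upper half of a float.
  public static float fromBFloat16(short b) {
    return Float.intBitsToFloat((b & 0xFFFF) << 16);
  }
}
```

Decoding is a single shift, which is the main appeal of bfloat16 over binary16 when no hardware intrinsic is available; the cost is precision, since only 7 explicit mantissa bits survive.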

Re: [I] Can we store only quantized vectors to reduce disk footprint? [lucene]

2024-12-04 Thread via GitHub
benwtrent commented on issue #14007: URL: https://github.com/apache/lucene/issues/14007#issuecomment-2517370159 @mikemccand https://github.com/apache/lucene/issues/12403

Re: [I] Can we store only quantized vectors to reduce disk footprint? [lucene]

2024-12-04 Thread via GitHub
mikemccand commented on issue #14007: URL: https://github.com/apache/lucene/issues/14007#issuecomment-2517267677 > I do think we should consider adding support for half-floats. +1

Re: [PR] Add support for storing term vectors in FeatureField [lucene]

2024-12-04 Thread via GitHub
jimczi merged PR #14034: URL: https://github.com/apache/lucene/pull/14034

Re: [PR] Better encapsulate locking logic in HnswGraphBuilder [lucene]

2024-12-04 Thread via GitHub
viliam-durina commented on PR #14016: URL: https://github.com/apache/lucene/pull/14016#issuecomment-2516969696 >Won't it change the execution plan used during indexing? As far as I'm aware, it doesn't. >Or -- maybe it would actually be a no-op because the IndexWriter creates a

Re: [PR] Fix changelog for GITHUB#14011 [lucene]

2024-12-04 Thread via GitHub
viliam-durina commented on PR #14018: URL: https://github.com/apache/lucene/pull/14018#issuecomment-2516877489 Backport ready in #14037

[PR] Improve search equivalence tests. [lucene]

2024-12-04 Thread via GitHub
jpountz opened a new pull request, #14036: URL: https://github.com/apache/lucene/pull/14036 This addresses an existing TODO about giving terms a zipfian distribution, and disables query caching to make sure that two-phase iterators are properly tested.
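
As a rough illustration of what a zipfian term distribution means here (a sketch, not the code in the PR; `ZipfTermSampler` and its parameters are hypothetical): the term at rank r is drawn with probability proportional to 1/r^s, so a few terms show up very often while the long tail stays rare.

```java
import java.util.Random;

// Hypothetical sampler, not taken from the Lucene test framework.
final class ZipfTermSampler {
  // cdf[i] is the probability of drawing a term index <= i.
  private final double[] cdf;

  public ZipfTermSampler(int numTerms, double exponent) {
    cdf = new double[numTerms];
    double sum = 0;
    for (int rank = 1; rank <= numTerms; rank++) {
      sum += 1.0 / Math.pow(rank, exponent); // unnormalized weight 1/rank^s
      cdf[rank - 1] = sum;
    }
    for (int i = 0; i < numTerms; i++) {
      cdf[i] /= sum; // normalize so the last entry is exactly 1.0
    }
  }

  // Returns a term index in [0, numTerms); index 0 is the most frequent term.
  public int sample(Random random) {
    double u = random.nextDouble(); // uniform in [0, 1)
    int lo = 0;
    int hi = cdf.length - 1;
    // Binary search for the first index whose cumulative probability is >= u.
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (cdf[mid] < u) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

With 10 terms and an exponent of 1.0, the top-ranked term is expected in roughly a third of the draws and the lowest-ranked one in about 3%, which gives query clauses very different costs.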