Re: [I] Instrument IndexOrDocValuesQuery to report on its decisions [lucene]
mikemccand commented on issue #13442: URL: https://github.com/apache/lucene/issues/13442#issuecomment-2143387731 +1 to keeping `Query` classes lean. A general framework on `IndexSearcher` sounds nice, but it's hard to generalize with just this one use case? Can we think of other queries/collectors that might also benefit from this? Maybe the exotic rewrite choices that `MultiTermQuery` subclasses make (rewrite as filter, rewrite to boolean disjunction of `TermQuery`, ...)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Sparse index [lucene]
mikemccand commented on PR #13441: URL: https://github.com/apache/lucene/pull/13441#issuecomment-2143390482 > Wops sorry clicked a wrong butten in the UI. This is not ready yet. :) Nevertheless, very exciting to get a sudden sneak preview!!
Re: [PR] Add new dynamic confidence interval configuration to scalar quantized format [lucene]
benwtrent merged PR #13445: URL: https://github.com/apache/lucene/pull/13445
Re: [I] What does the Lucene community think about dimensionality reduction for vectors, and should it be something the library does internally (at merge time perhaps)? [lucene]
benwtrent commented on issue #13403: URL: https://github.com/apache/lucene/issues/13403#issuecomment-2143529235 I agree @mikemccand, we should not do any dim-reduction stuff until some threshold of vectors is reached. I am not 100% convinced this scales nicely, though it would scale better than running PQ on tiny segments all the time :). We already have some expensive issues with HNSW merging (we are slowly working through fixing those). When segments that have different PQ codebooks are merged, we have to re-calculate and re-quantize everything unless we can do something clever. Maybe it won't be too expensive, and the cost will be worth it if we can build the HNSW graph with the PQ, or maybe use the PQ information to bootstrap the HNSW graph build to make it cheaper. I agree this is a worthy place for experimentation. As an aside, this "wait to build the index" thing could also be done for HNSW. Tiny segments with quick flushes probably shouldn't even build HNSW graphs. Instead, they should probably store the float vectors flat (or the scalar quantized vectors flat, as scalar quantizing is effectively linear in runtime). Then, when a threshold is reached (it could be small, something like 1k or 10k?), we create the HNSW graphs.
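The "wait to build the index" idea in the comment above can be sketched roughly as follows. This is only an illustration of the threshold decision, not Lucene code: the class `SegmentVectorIndexChooser`, the `Strategy` enum, and the 10k threshold used in the usage note are all hypothetical names and values chosen for the example, and the comment itself leaves the actual threshold open ("1k or 10k?").

```java
/**
 * Hypothetical sketch: decide whether a segment stores its vectors flat
 * or builds an HNSW graph, based on a vector-count threshold.
 * None of these names are Lucene APIs.
 */
public final class SegmentVectorIndexChooser {

    /** Illustrative storage strategies for a segment's vectors. */
    public enum Strategy { FLAT, HNSW }

    // Assumed tunable; the original comment only suggests "1k or 10k?".
    private final int hnswThreshold;

    public SegmentVectorIndexChooser(int hnswThreshold) {
        if (hnswThreshold < 0) {
            throw new IllegalArgumentException("hnswThreshold must be >= 0");
        }
        this.hnswThreshold = hnswThreshold;
    }

    /** Small, quickly flushed segments stay flat; larger ones get a graph. */
    public Strategy choose(int vectorCount) {
        return vectorCount < hnswThreshold ? Strategy.FLAT : Strategy.HNSW;
    }

    /** At merge time the decision is made on the combined vector count. */
    public Strategy chooseForMerge(int... segmentVectorCounts) {
        long total = 0;
        for (int count : segmentVectorCounts) {
            total += count;
        }
        return total < hnswThreshold ? Strategy.FLAT : Strategy.HNSW;
    }
}
```

With an assumed threshold of 10,000, `new SegmentVectorIndexChooser(10_000).choose(500)` selects `FLAT`, while merging three 4,000-vector segments crosses the threshold and selects `HNSW`. The same shape of check could in principle gate PQ codebook training, which is the case the comment starts from.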
Re: [PR] Implement Weight#count for vector values in the FieldExistsQuery [lucene]
github-actions[bot] commented on PR #13322: URL: https://github.com/apache/lucene/pull/13322#issuecomment-2143642763 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!