Re: [I] Instrument IndexOrDocValuesQuery to report on its decisions [lucene]

2024-06-01 Thread via GitHub


mikemccand commented on issue #13442:
URL: https://github.com/apache/lucene/issues/13442#issuecomment-2143387731

   +1 to keeping `Query` classes lean.
   
   A general framework on `IndexSearcher` sounds nice, but it's hard to 
generalize with just this one use case?  Can we think of other 
queries/collectors that might also benefit from this?  Maybe the exotic rewrite 
choices that `MultiTermQuery` subclasses make (rewrite as filter, rewrite to 
boolean disjunction of `TermQuery`, ...)?
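
   A minimal sketch of what such a general hook might look like, so that 
`Query` classes only report a choice and keep no counters themselves. 
Everything here (`DecisionListener`, `CountingListener`, the source and choice 
strings) is a hypothetical name for illustration, not an existing Lucene API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only: none of these types exist in Lucene. */
final class DecisionRecorderSketch {

  /** Callback a searcher could pass down so queries stay free of bookkeeping. */
  interface DecisionListener {
    void onDecision(String source, String choice);
  }

  /** Counts how often each (source, choice) pair was reported. */
  static final class CountingListener implements DecisionListener {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    @Override
    public void onDecision(String source, String choice) {
      counts.merge(source + ":" + choice, 1, Integer::sum);
    }

    int count(String source, String choice) {
      return counts.getOrDefault(source + ":" + choice, 0);
    }
  }

  public static void main(String[] args) {
    CountingListener listener = new CountingListener();
    // Two different kinds of decision flow through the same listener.
    listener.onDecision("IndexOrDocValuesQuery", "index");
    listener.onDecision("IndexOrDocValuesQuery", "doc-values");
    listener.onDecision("MultiTermQuery", "rewrite-as-filter");
    System.out.println(listener.count("IndexOrDocValuesQuery", "index"));
  }
}
```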


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Sparse index [lucene]

2024-06-01 Thread via GitHub


mikemccand commented on PR #13441:
URL: https://github.com/apache/lucene/pull/13441#issuecomment-2143390482

   > Whoops, sorry, clicked the wrong button in the UI. This is not ready yet. :)
   
   Nevertheless, very exciting to get a sudden sneak preview!!





Re: [PR] Add new dynamic confidence interval configuration to scalar quantized format [lucene]

2024-06-01 Thread via GitHub


benwtrent merged PR #13445:
URL: https://github.com/apache/lucene/pull/13445





Re: [I] What does the Lucene community think about dimensionality reduction for vectors, and should it be something the library does internally (at merge time perhaps)? [lucene]

2024-06-01 Thread via GitHub


benwtrent commented on issue #13403:
URL: https://github.com/apache/lucene/issues/13403#issuecomment-2143529235

   I agree @mikemccand, we should not do any dim-reduction stuff until some 
threshold of vectors is reached.
   
   I am not 100% convinced this scales nicely, though it would scale better than 
running PQ on tiny segments all the time :). We already have some expensive 
issues with HNSW merging (we are slowly working through fixing those). 
   
   When segments are merged that have different PQ code-books, we have to 
re-calculate and re-quantize everything unless we can do something clever. 
   
   Maybe it won't be too expensive, and the cost will be worth it if we can 
build the HNSW graph with the PQ, or maybe use the PQ information to bootstrap 
the HNSW graph build to make it cheaper.
   
   I agree this is a worthy place for experimentation.
   
   
   As an aside, this "wait to build the index" thing could also be done for 
HNSW. Tiny segments with quick flushes probably shouldn't even build HNSW 
graphs. Instead, they should probably store the float vectors flat (or the 
scalar quantized vectors flat as scalar quantizing is effectively linear in 
runtime). Then when a threshold is reached (it could be small, something like 
1k, 10k?), we create the HNSW graphs. 
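
   A minimal sketch of that threshold rule, assuming a hypothetical 
`chooseLayout` helper and a 10k cutoff picked from the range mentioned above. 
None of these names are Lucene APIs, and deleted documents are ignored for 
simplicity:

```java
import java.util.List;

/** Hypothetical sketch only: none of these names exist in Lucene. */
final class DeferredGraphSketch {

  /** How a segment's vectors would be stored. */
  enum VectorLayout { FLAT, HNSW_GRAPH }

  /** Assumed cutoff; the comment only floats "something like 1k, 10k?". */
  static final int GRAPH_BUILD_THRESHOLD = 10_000;

  /** Small segments stay flat; a graph is built only at or above the threshold. */
  static VectorLayout chooseLayout(int vectorCount, int threshold) {
    return vectorCount >= threshold ? VectorLayout.HNSW_GRAPH : VectorLayout.FLAT;
  }

  /** A merged segment holds the vectors of all its inputs, so sum the sizes. */
  static VectorLayout layoutAfterMerge(List<Integer> segmentSizes, int threshold) {
    int total = 0;
    for (int size : segmentSizes) {
      total += size;
    }
    return chooseLayout(total, threshold);
  }

  public static void main(String[] args) {
    // A quick flush of 500 vectors is below the cutoff and stays flat.
    System.out.println(chooseLayout(500, GRAPH_BUILD_THRESHOLD));
    // Merging several small flat segments can cross the cutoff.
    System.out.println(layoutAfterMerge(List.of(4_000, 3_500, 3_000), GRAPH_BUILD_THRESHOLD));
  }
}
```

   With this rule the graph build cost is paid at merge time, once the combined 
segment is large enough for the graph to matter.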





Re: [PR] Implement Weight#count for vector values in the FieldExistsQuery [lucene]

2024-06-01 Thread via GitHub


github-actions[bot] commented on PR #13322:
URL: https://github.com/apache/lucene/pull/13322#issuecomment-2143642763

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!

