Re: [I] [DISCUSS] Could we have a different ANN algorithm for Learned Sparse Vectors? [lucene]

via GitHub Wed, 21 Aug 2024 13:57:20 -0700


jpountz commented on issue #13675:
URL: https://github.com/apache/lucene/issues/13675#issuecomment-2303000074


   I found this recent paper by well-known people in the IR efficiency space 
quite interesting: https://arxiv.org/pdf/2405.01117. It builds on inverted 
indexes and simple/intuitive ideas:
    - BP reordering, which Ben alluded to and which Lucene already supports; it 
naturally clusters documents with similar terms together,
    - Block-max WAND, which Lucene supports,
    - Anytime ranking on document-ordered indexes 
(https://arxiv.org/pdf/2104.08976), i.e. ranking ranges of doc IDs that have the 
best impact scores first in order to optimize pruning. Lucene doesn't support 
this at the moment, but it looks doable and generally useful.
    - Unsafe top-k search via termination conditions and skipping blocks that 
are barely more competitive than the current top-k-th hit.
    - Query term pruning, which sounds like a good idea in general for learned 
sparse retrieval when the model generates many terms.
    - Scoring high-frequency / low-scoring terms via a forward index instead of 
an inverted index.
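
   To make a few of these ideas concrete, here is a toy Python sketch (not 
Lucene code and not the paper's implementation) that combines query term 
pruning, visiting doc ID ranges in order of their score upper bound, and an 
optionally unsafe termination check. The names `prune_query`, `search`, 
`block_size` and `alpha` are made up for illustration, documents are plain 
dicts rather than postings, and weights are assumed to be non-negative so that 
the per-range bound is valid.

```python
import heapq


def prune_query(query, keep):
    """Query term pruning: keep only the `keep` highest-weighted terms."""
    top = sorted(query.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    return dict(top)


def search(docs, query, k, block_size=2, alpha=1.0):
    """Toy range-at-a-time top-k search.

    docs: list of {term: weight}, the doc ID is the list index.
    alpha = 1.0 is safe; alpha > 1.0 also skips ranges that are only
    barely more competitive than the current k-th hit (unsafe).
    Returns (hits, scored): hits as (score, doc_id) pairs, best first,
    and the number of documents that were actually scored.
    """
    def score(doc):
        return sum(w * doc.get(t, 0.0) for t, w in query.items())

    # Split the doc ID space into ranges and compute an upper bound on the
    # score of any document in each range (sum of per-term maxima).
    blocks = []
    for start in range(0, len(docs), block_size):
        ids = range(start, min(start + block_size, len(docs)))
        bound = sum(
            w * max(docs[i].get(t, 0.0) for i in ids)
            for t, w in query.items()
        )
        blocks.append((bound, ids))

    # Visit the most promising ranges first.
    blocks.sort(key=lambda b: b[0], reverse=True)

    heap = []  # min-heap of (score, doc_id); heap[0] is the k-th best hit
    scored = 0
    for bound, ids in blocks:
        # Every remaining range has a bound <= this one, so we can stop.
        if len(heap) == k and bound <= alpha * heap[0][0]:
            break
        for i in ids:
            s = score(docs[i])
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (s, i))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, i))
    return sorted(heap, reverse=True), scored


docs = [
    {"a": 3.0, "b": 1.0},
    {"a": 2.5},
    {"b": 0.5},
    {"c": 0.2},
    {"a": 0.1, "c": 0.1},
    {"b": 0.2},
]
query = {"a": 1.0, "b": 0.5, "c": 0.1}
hits, scored = search(docs, query, k=2)
```

   In this example only the first range needs to be scored, since the bounds of 
the remaining ranges are already below the second-best hit. Raising `alpha` 
above 1.0 trades recall for fewer scored documents, which is the "unsafe" part.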


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

