benwtrent commented on PR #12434:
URL: https://github.com/apache/lucene/pull/12434#issuecomment-1651725953

   @msokolov && @alessandrobenedetti pinging y'all as you will probably be the 
most interested in this change.
   
   @alessandrobenedetti the original design did take some inspiration from your 
multi-value vector work. However, benchmarking & testing required significant 
changes. To deduplicate parent docIds during search, the hash map is now part 
of the queue itself, instead of iterating a separate cache outside the heap. 
This improved performance significantly.
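
   For readers following along, here is a rough sketch of what "the hash map 
is part of the queue" can look like. It is not the code in this PR: the class 
`ParentDedupQueue` and its methods `offer` and `topParents` are hypothetical 
names, and it uses a `TreeSet` in place of a real binary heap to keep the 
example short. It only assumes that each child vector hit arrives with its 
parent docId and a score where higher is better.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: a top-k queue that keeps at most one entry per parent docId. */
class ParentDedupQueue {
  /** One retained hit: a parent docId and the best child score seen for it. */
  static final class Hit {
    final int parentDoc;
    final float score;

    Hit(int parentDoc, float score) {
      this.parentDoc = parentDoc;
      this.score = score;
    }
  }

  private final int k;
  // parent docId -> the entry currently held in the ordered structure
  private final Map<Integer, Hit> byParent = new HashMap<>();
  // Ordered worst-first; ties on score are broken by parent docId.
  private final TreeSet<Hit> ordered =
      new TreeSet<>(
          Comparator.<Hit>comparingDouble(h -> h.score).thenComparingInt(h -> h.parentDoc));

  ParentDedupQueue(int k) {
    this.k = k;
  }

  /** Offers one child hit; returns true if the queue contents changed. */
  boolean offer(int parentDoc, float score) {
    Hit existing = byParent.get(parentDoc);
    if (existing != null) {
      // Parent already present: keep only its best-scoring child.
      if (score <= existing.score) {
        return false;
      }
      ordered.remove(existing);
    } else if (ordered.size() >= k) {
      // Queue is full: a new parent must beat the current worst entry.
      Hit worst = ordered.first();
      if (score <= worst.score) {
        return false;
      }
      ordered.pollFirst();
      byParent.remove(worst.parentDoc);
    }
    Hit hit = new Hit(parentDoc, score);
    ordered.add(hit);
    byParent.put(parentDoc, hit);
    return true;
  }

  /** Returns the retained parent docIds, best score first. */
  List<Integer> topParents() {
    List<Integer> parents = new ArrayList<>();
    for (Hit hit : ordered.descendingSet()) {
      parents.add(hit.parentDoc);
    }
    return parents;
  }
}
```

   The point of the design is that the duplicate check happens on every 
insert, so a second child of an already-collected parent either updates that 
parent's entry in place or is dropped, and never takes up a second slot in the 
top-k.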
   
   I would say this is how folks should represent multi-valued vectors when 
they require access to the matching passage or additional metadata. Otherwise, 
deep changes are required in the codec to attach arbitrary metadata to the 
vectors themselves, which seems like overkill to me when we already have `join`.
   
   This does not obviate the need for "true" multi-value vector support (e.g. 
for late-interaction models, or multi-value vectors that don't require 
metadata). It does, however, lay some nice groundwork that can improve that 
implementation (a custom collector that can deduplicate vectors down to a 
single docId while searching).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

