krickert commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2489546204

   > Chunk-Based Highlighting – Interesting. With getAllVectorValues(), we can 
find all vector values with similarity above a separate sim-threshold for 
highlights?
   
   Not sure. But it is frustrating for me: we only calculate K chunks, not 
N documents. I want to always return N documents, and keep widening K 
until N is reached. Since the search runs K over the chunks, I'd rather it return all 
the chunks it can until it has covered N documents. Then we can 
return the matching chunks, which can be used for highlighting.
   
   >  I think this one plays better with a separate child doc per vector value. 
We can store these tags and access related data as separate fields in child 
docs and filter on them during search.
   
   Indexing child docs means creating more documents. We only care about the 
resulting embeddings, so why not treat them like a tensor instead of an entire 
document? It's frustrating to always create a child doc for multiple vectors 
when I could just use a keyword-value style instead. Also, there are definitely some 
limitations on how you can use it with scoring, and the query ends up looking 
like a mess. If we can simplify the query syntax, that would help a lot.
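To make the keyword-value idea concrete, here is a hypothetical sketch in plain Java (not a Lucene API, and `KeyedVectorDoc` is an illustrative name). One document holds a map of named vectors rather than one child doc per vector. The key doubles as a filterable tag, and the document scores as its best cosine match. Max-similarity is only one possible aggregation, assumed here for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch, not a Lucene API: one document holds several named
// vectors instead of one child document per vector.
public class KeyedVectorDoc {

    final int docId;
    // Key (acting as a tag, e.g. "title" or "para-2") -> embedding.
    final Map<String, float[]> vectors = new LinkedHashMap<>();

    KeyedVectorDoc(int docId) {
        this.docId = docId;
    }

    KeyedVectorDoc put(String key, float[] vector) {
        vectors.put(key, vector);
        return this;
    }

    // Cosine similarity; assumes both vectors are non-zero and equal length.
    static float cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (float) (dot / (Math.sqrt(na) * Math.sqrt(nb)));
    }

    // Document score = best similarity among vectors whose key passes the
    // filter; the winning key is returned too, e.g. for highlighting.
    Optional<Map.Entry<String, Float>> bestMatch(float[] query, Predicate<String> keyFilter) {
        Map.Entry<String, Float> best = null;
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            if (!keyFilter.test(e.getKey())) continue;
            float s = cosine(e.getValue(), query);
            if (best == null || s > best.getValue()) best = Map.entry(e.getKey(), s);
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        KeyedVectorDoc doc = new KeyedVectorDoc(7)
            .put("title", new float[] {1f, 0f})
            .put("para-1", new float[] {0f, 1f})
            .put("para-2", new float[] {0.6f, 0.8f});
        // Only paragraph vectors are considered for this query.
        doc.bestMatch(new float[] {1f, 0f}, key -> key.startsWith("para"))
            .ifPresent(m -> System.out.println(
                "doc " + doc.docId + " best=" + m.getKey() + " score=" + m.getValue()));
    }
}
```

Filtering on the key stands in for what a separate field on a child doc would otherwise do.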
   
   If you can get a unit test going for your PR, I'd be glad to expand on it 
and play with it a bit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

