krickert commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2489546204
> Chunk-Based Highlighting – Interesting. With getAllVectorValues(), we can find all vector values with similarity above a separate sim-threshold for highlights? Not sure.

But this is frustrating for me: we only calculate K chunks, not N documents. I want to always return N documents and keep running K until N is reached. Since K is applied to the chunks, I'd rather it return all the chunks it can until it reaches N documents. Then we can return the matching chunks, which can be used for highlighting.

> I think this one plays better with a separate child doc per vector value. We can store these tags and access related data as separate fields in child docs and filter on them during search.

Indexing child docs means creating more docs. We only care about the resulting embeddings, so why not treat them like a tensor instead of an entire document? It's frustrating to always create a child doc for multiple vectors when I could just use a keyword-value style instead. There are also definitely some limitations in how child docs can be used with scoring, and the query ends up looking like a mess. Simplifying the query syntax would help a lot.

If you can get a unit test going for your PR, I'd be glad to expand on it and play with it a bit.
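The "keep running K until N documents are reached" behavior can be sketched roughly as follows. This is a hypothetical illustration, not Lucene's API: `ChunkHit`, `topKChunks` and `searchNDocs` are made-up names, the brute-force `topKChunks` stands in for a real k-NN search over chunk vectors, and doubling K on each retry is just one possible growth strategy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkExpansionSketch {

    /** Hypothetical hit: one chunk (vector) belonging to a parent document. */
    public record ChunkHit(int docId, int chunkIndex, float score) {}

    /** Stand-in for a k-NN search over chunks: brute-force top-k by descending score. */
    public static List<ChunkHit> topKChunks(List<ChunkHit> allChunks, int k) {
        List<ChunkHit> sorted = new ArrayList<>(allChunks);
        sorted.sort((a, b) -> Float.compare(b.score(), a.score()));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    /**
     * Re-runs the chunk search with a growing K until the hits cover at least
     * n distinct documents (or every chunk has been returned), then groups the
     * matching chunks per document so they can be used for highlighting.
     * Documents are ordered by their best-scoring chunk.
     */
    public static Map<Integer, List<ChunkHit>> searchNDocs(List<ChunkHit> allChunks, int n, int initialK) {
        int k = Math.max(1, initialK);
        while (true) {
            List<ChunkHit> hits = topKChunks(allChunks, k);
            // LinkedHashMap keeps documents in order of their first (best) chunk hit.
            Map<Integer, List<ChunkHit>> byDoc = new LinkedHashMap<>();
            for (ChunkHit hit : hits) {
                byDoc.computeIfAbsent(hit.docId(), d -> new ArrayList<>()).add(hit);
            }
            boolean exhausted = hits.size() == allChunks.size();
            if (byDoc.size() >= n || exhausted) {
                // Keep only the first n documents, each with all of its matching chunks.
                Map<Integer, List<ChunkHit>> result = new LinkedHashMap<>();
                for (Map.Entry<Integer, List<ChunkHit>> e : byDoc.entrySet()) {
                    if (result.size() == n) {
                        break;
                    }
                    result.put(e.getKey(), e.getValue());
                }
                return result;
            }
            k *= 2; // not enough distinct documents yet: widen the chunk search
        }
    }
}
```

With chunk scores 0.9, 0.8, 0.7 for doc 1, 0.6 for doc 2 and 0.5 for doc 3, asking for N = 2 starting at K = 2 first finds only doc 1, then retries with K = 4 and returns docs 1 and 2 together with all three of doc 1's matching chunks. Doubling keeps the number of retries logarithmic in the number of chunks, at the cost of re-running the search each time.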