krickert commented on issue #12313: URL: https://github.com/apache/lucene/issues/12313#issuecomment-2105745361
I was thinking about this and thought it would be cool to support a few different use cases for a multi-valued vector:

1. The multi-values are treated the same as single values, except that once a doc is found to be among the K nearest, it won't repeat.

For example: Doc1 has vectors A1, A2, and A3. Doc2 has vectors B1 and B2. Then we have a Doc3 with C1. A vector search is performed, and the K nearest vectors come back in this order:

A1 A2 C1 B2 B1 A3

In one scenario, the search results would be the same as above, and the docs would repeat. In another scenario, the results would return each doc only once, under its best-matching vector, and not repeat it. So a KNN result would be:

Doc1 (A1 won)
Doc3 (C1 won)
Doc2 (B2 won)
...

2. As another option, we can look into reducing a doc's vectors to a single vector by taking the average, min, or max of each dimension, and indexing just that avg, min, or max vector. I think this might be a bit weird, though, since you can already do these calculations yourself at index time. But just a thought...

Are any of the existing suggestions similar to what I'm suggesting?
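To make the scenarios above concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: `VectorHit`, `topKVectors`, `topKDocs`, and `averagePool` are hypothetical names, the distances are made up for illustration, and the hits are assumed to arrive already sorted nearest-first. It only shows the difference between returning every vector hit, collapsing to the best hit per doc, and pooling a doc's vectors by per-dimension average.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MultiVectorSketch {
    /** Hypothetical hit type: one matched vector, its owning doc, and its distance to the query. */
    record VectorHit(String docId, String vectorId, double distance) {}

    /** Scenario 1: take the k nearest vectors as-is, so a doc may appear more than once. */
    static List<VectorHit> topKVectors(List<VectorHit> rankedHits, int k) {
        return new ArrayList<>(rankedHits.subList(0, Math.min(k, rankedHits.size())));
    }

    /** Scenario 2: walk hits nearest-first and keep only the first (best) hit per doc. */
    static List<VectorHit> topKDocs(List<VectorHit> rankedHits, int k) {
        // LinkedHashMap preserves insertion order, i.e. the order in which docs first matched.
        Map<String, VectorHit> bestPerDoc = new LinkedHashMap<>();
        for (VectorHit hit : rankedHits) {
            if (bestPerDoc.size() >= k) {
                break;
            }
            bestPerDoc.putIfAbsent(hit.docId(), hit);
        }
        return new ArrayList<>(bestPerDoc.values());
    }

    /** Pooling option: collapse a doc's vectors into one by averaging each dimension. */
    static float[] averagePool(List<float[]> vectors) {
        int dims = vectors.get(0).length;
        float[] pooled = new float[dims];
        for (float[] vector : vectors) {
            for (int d = 0; d < dims; d++) {
                pooled[d] += vector[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            pooled[d] /= vectors.size();
        }
        return pooled;
    }

    public static void main(String[] args) {
        // Order taken from the example: A1 A2 C1 B2 B1 A3 (distances are illustrative only).
        List<VectorHit> hits = List.of(
            new VectorHit("Doc1", "A1", 0.10),
            new VectorHit("Doc1", "A2", 0.15),
            new VectorHit("Doc3", "C1", 0.20),
            new VectorHit("Doc2", "B2", 0.30),
            new VectorHit("Doc2", "B1", 0.35),
            new VectorHit("Doc1", "A3", 0.50));

        // Prints: Doc1 (A1 won), Doc3 (C1 won), Doc2 (B2 won), one per line.
        for (VectorHit hit : topKDocs(hits, 3)) {
            System.out.println(hit.docId() + " (" + hit.vectorId() + " won)");
        }
    }
}
```

Min or max pooling would follow the same loop shape as `averagePool`, replacing the sum with `Math.min` or `Math.max` per dimension. Note that in the collapsed scenario a real search would need to visit more than K vectors to fill K distinct docs, which this sketch sidesteps by taking the full ranked list as input.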