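msokolov commented on issue #14758: URL: https://github.com/apache/lucene/issues/14758#issuecomment-3319284619

I agree that use case (3) above is the target (the user knowingly passes a duplicate vector). What bothers me, then, is that we don't use the information that one of the fields is strictly a subset of the other (since it is filtered). Instead, we require callers to supply the same vectors twice while indexing. Then, having ignored that information, we try to recreate it through the complexity of sorting vector data. That in turn requires codec changes to store vector data globally and share it in ways we didn't really anticipate, i.e., unrelated vector fields will now be stored together.

I guess I'm still stuck on the "view" idea. What would be the problem with having one field reference another? When creating a reader for the "view" field, it would have to delegate to a flat vectors reader from the other field. It would also need an ordinal mapping (so that its graph can have dense ordinals while still referring to the other field's ordinals for value lookups), or else we would have to support graphs with non-dense ordinals.

Aside from all that, does Lucene somehow guarantee that there are no inter-field dependencies? I can't think of any way it does. Its indexing is entirely document-centric, and it enforces that field definitions are immutable (so if field B depended on A, a change to A's definition could conceivably be a problem, but that situation never arises).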
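To make the "view" idea concrete, here is a minimal sketch in plain Java of the ordinal mapping described above. It is not Lucene code. All names (`ViewFieldSketch`, `ParentVectors`, `ViewVectors`, `buildOrdinalMap`) are hypothetical. A plain `float[][]` stands in for the other field's flat vector storage, and a `boolean[]` stands in for whatever filter defines the subset. The view field stores no vectors of its own. It keeps dense ordinals `0..n-1` for its graph and maps each one to a parent ordinal when a value is looked up.

```java
import java.util.Arrays;

/** Hypothetical sketch (not a Lucene API): a "view" vector field that stores no vector data itself. */
public class ViewFieldSketch {

    /** Hypothetical stand-in for the parent field's flat vector storage, indexed by parent ordinal. */
    static final class ParentVectors {
        private final float[][] vectors;

        ParentVectors(float[][] vectors) { this.vectors = vectors; }

        int size() { return vectors.length; }

        float[] vectorValue(int parentOrd) { return vectors[parentOrd]; }
    }

    /** Hypothetical view field: dense view ordinals, each mapped to a parent ordinal for value lookup. */
    static final class ViewVectors {
        private final ParentVectors parent;
        private final int[] viewToParent; // strictly increasing parent ordinals

        ViewVectors(ParentVectors parent, int[] viewToParent) {
            this.parent = parent;
            this.viewToParent = viewToParent;
        }

        /** Number of vectors in the view; a graph over the view would use ordinals 0..size()-1. */
        int size() { return viewToParent.length; }

        /** Delegates the value lookup to the parent's storage through the ordinal map. */
        float[] vectorValue(int viewOrd) { return parent.vectorValue(viewToParent[viewOrd]); }
    }

    /** Builds the view-to-parent ordinal map from a filter over parent ordinals (view is a subset). */
    static int[] buildOrdinalMap(boolean[] inView) {
        int count = 0;
        for (boolean b : inView) {
            if (b) count++;
        }
        int[] map = new int[count];
        int next = 0;
        for (int parentOrd = 0; parentOrd < inView.length; parentOrd++) {
            if (inView[parentOrd]) map[next++] = parentOrd;
        }
        return map;
    }

    public static void main(String[] args) {
        ParentVectors parent = new ParentVectors(new float[][] {
            {0f, 0f}, {1f, 0f}, {0f, 1f}, {1f, 1f}
        });
        // Hypothetical filter: only parent ordinals 1 and 3 belong to the view field.
        int[] map = buildOrdinalMap(new boolean[] {false, true, false, true});
        ViewVectors view = new ViewVectors(parent, map);
        System.out.println(view.size());                          // 2
        System.out.println(Arrays.toString(view.vectorValue(0))); // [1.0, 0.0]
        System.out.println(Arrays.toString(view.vectorValue(1))); // [1.0, 1.0]
    }
}
```

Keeping the view's ordinals dense means graph-building and search code could, in principle, treat the view like any other field, at the cost of one extra `int` per vector for the map. The alternative mentioned above, graphs with non-dense ordinals, would avoid the map but would push that complexity into the graph code instead.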
I agree that use case (3) above is the target (user knowingly passes duplicate vector). Then it bothers me that we don't make use of the information that one of the fields is strictly a subset of the other (as it is filtered), and that we require callers to supply the same vectors twice while indexing, and then, ignoring the information, try to recreate it through the complexity of sorting vector data (and requiring codec changes to store vector data globally, sharing it in ways we didn't really anticipate, ie unrelated vector fields will now be stored together). I guess I'm still stuck on the "view" idea. What would be the problem with having one field reference another? I guess when creating a reader for the "view" field it would have to delegate to a flat vectors reader from the other field, and it would require an ordinal mapping (so that its graph can have dense ordinals while still referring to ordinals from the other field for value lookups), or else support graphs with non-dense ordinals. Aside from all that, does Lucene somehow have a guarantee that there are no inter-field dependencies? I can't think of any way it does. Its indexing is entirely document-centric; it enforces that the field definitions are immutable (so if field B depends on A and A's definition changes that could conceivably be a problem, but it doesn't arise). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org