uschindler commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1612506692

   I would still prefer to have multiple values per document. From an 
implementation point of view this does not look crazy to me, whereas using 
block joins adds too many limitations, and people often don't want to use 
them for other reasons.
   
   The implementation suggested by @alessandrobenedetti looks great to me 
and is in line with other multivalued fields in Lucene. Here are my comments 
after watching his talk and skimming through the PR:
   - The general storage of vectors is basically similar to 
SortedSetDocValues (see also @msokolov's initial implementation, which used 
only DocValues). The change here is analogous to going from SortedDocValues 
to SortedSetDocValues. We may keep a separate single-valued implementation 
and offer a wrapper (like for DocValues).
   - The index used to find nearest neighbours (HNSW) does not need any 
change, because the graph entries just point to ordinals. We just need to 
take care that the number of ordinals may exceed Integer.MAX_VALUE.
   - Result collection is different, because we need to apply the 
min/max/avg functions. To me this is the most complicated change, but it 
would be similarly complex with block join.
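The first and third points can be sketched together in plain Java. This is a hypothetical illustration, not Lucene's API: `MultiVectorSketch`, `VectorHit` and `Aggregation` are invented names. It assumes each document owns a contiguous range of vector ordinals (the way SortedSetDocValues hands out ords), and it assumes "avg" averages only the vectors that came back from the graph search, not every vector of the document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not Lucene's API: multi-valued vectors laid out like
 * SortedSetDocValues ords (each doc owns a contiguous range of vector
 * ordinals), plus folding per-vector kNN hits into one score per document.
 */
public class MultiVectorSketch {

  /** Function applied to the scores of a document's matched vectors. */
  enum Aggregation { MAX, MIN, AVG }

  /** One graph-search hit: a vector ordinal and its similarity score. */
  record VectorHit(long ord, float score) {}

  // docStart[d] is the first ordinal of doc d; docStart[d + 1] is one past its last.
  // Ordinals are long because the total vector count may exceed Integer.MAX_VALUE.
  private final long[] docStart;

  MultiVectorSketch(int[] vectorsPerDoc) {
    docStart = new long[vectorsPerDoc.length + 1];
    for (int d = 0; d < vectorsPerDoc.length; d++) {
      docStart[d + 1] = docStart[d] + vectorsPerDoc[d];
    }
  }

  /** Maps an ordinal to its doc: the last doc whose range starts at or before ord. */
  int ordToDoc(long ord) {
    int lo = 0;
    int hi = docStart.length - 2;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (docStart[mid] <= ord) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return lo;
  }

  /** Folds per-vector hits into one score per doc; AVG covers matched vectors only. */
  Map<Integer, Float> aggregate(List<VectorHit> hits, Aggregation agg) {
    Map<Integer, float[]> acc = new HashMap<>(); // doc -> {running value, hit count}
    for (VectorHit hit : hits) {
      int doc = ordToDoc(hit.ord());
      float[] a = acc.get(doc);
      if (a == null) {
        acc.put(doc, new float[] {hit.score(), 1f});
        continue;
      }
      switch (agg) {
        case MAX -> a[0] = Math.max(a[0], hit.score());
        case MIN -> a[0] = Math.min(a[0], hit.score());
        case AVG -> a[0] += hit.score(); // running sum, divided below
      }
      a[1]++;
    }
    Map<Integer, Float> result = new TreeMap<>(); // sorted by doc id for stable output
    for (Map.Entry<Integer, float[]> e : acc.entrySet()) {
      float[] a = e.getValue();
      result.put(e.getKey(), agg == Aggregation.AVG ? a[0] / a[1] : a[0]);
    }
    return result;
  }

  public static void main(String[] args) {
    // Doc 0 has ords 0-1, doc 1 has no vectors, doc 2 has ords 2-4.
    MultiVectorSketch s = new MultiVectorSketch(new int[] {2, 0, 3});
    List<VectorHit> hits = List.of(
        new VectorHit(0, 0.9f), new VectorHit(1, 0.5f),
        new VectorHit(3, 0.8f), new VectorHit(4, 0.6f));
    System.out.println(s.aggregate(hits, Aggregation.MAX)); // {0=0.9, 2=0.8}
  }
}
```

Binary search over the `docStart` offsets is just one way to map an ordinal back to its document; an explicit ordinal-to-doc array would trade memory for lookup speed. Note that the graph itself only ever sees the `long` ordinals, which matches the point that HNSW needs no structural change.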
   
   I think the biggest problem with the current PR is that ordinals need to 
be "long", as the number of vectors may exceed Integer.MAX_VALUE.
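The size concern is easy to underestimate, so here is a small Java calculation with hypothetical numbers (100 million documents with 30 vectors each, e.g. one per text chunk). It is not Lucene code and the class name is invented; it only shows that the product silently wraps in `int` arithmetic, and that a guard such as the JDK's `Math.toIntExact` fails loudly instead.

```java
/**
 * Illustrative arithmetic only (numbers are hypothetical, not from the PR):
 * how a total vector count overflows int once documents carry several vectors.
 */
public class OrdinalOverflow {

  /** Total number of vectors, multiplied in long arithmetic so it cannot wrap here. */
  static long totalVectors(int docs, int vectorsPerDoc) {
    return (long) docs * vectorsPerDoc;
  }

  /** The same product in int arithmetic: silently wraps past Integer.MAX_VALUE. */
  static int totalVectorsAsInt(int docs, int vectorsPerDoc) {
    return docs * vectorsPerDoc;
  }

  /** Narrowing guard: throws ArithmeticException instead of wrapping. */
  static int toIntOrdinal(long ord) {
    return Math.toIntExact(ord);
  }

  public static void main(String[] args) {
    int docs = 100_000_000; // fits comfortably in an int doc ID space
    int perDoc = 30;        // e.g. one vector per text chunk
    System.out.println(totalVectors(docs, perDoc));      // 3000000000
    System.out.println(totalVectorsAsInt(docs, perDoc)); // -1294967296
    try {
      toIntOrdinal(totalVectors(docs, perDoc));
    } catch (ArithmeticException e) {
      System.out.println("ordinal does not fit in an int");
    }
  }
}
```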


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

