[ 
https://issues.apache.org/jira/browse/LUCENE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313940#comment-17313940
 ] 

Michael Sokolov commented on LUCENE-9855:
-----------------------------------------

I think it would help to consider how we would handle a different ANN 
implementation, say LSH. In that case we would no longer store the graph file 
(what is currently in .vex files). We would need to add files to store LSH's 
hash tables, and the metadata would change. Is it the same format? A variant of 
this format? The current conception is that we would make variations of this 
single vector format to handle multiple ANN algorithms. Since we currently have 
only one algorithm, that is not obvious, but given that background a generic 
name like VectorsFormat (or NumericVectorsFormat, to distinguish it from 
DocVectors etc.) makes sense.

On the other hand, if you think we would create a new Format to represent this 
different kind of data that we are storing to disk, with its own 
de/serialization code (even if some of it would be shared), then we should 
pick a name that incorporates the algorithm and, by the way, also get rid of 
the whole concept of {{SearchStrategy}}.

I think this is the fundamental question here: one format, multiple ANN 
strategies, or one format per ANN strategy? I thought it had been sorted out in 
our earlier discussions, but not everybody may have been following that closely.
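
To make the two options concrete, here is a rough sketch of what each might 
look like. Every class, enum, and file description below is hypothetical and 
chosen only for illustration (only the .vex graph file comes from the current 
code); none of this is the actual Lucene API.

```java
import java.util.List;

// Option A: one generic format; the ANN algorithm is a parameter.
// (Hypothetical enum, standing in for something like SearchStrategy.)
enum AnnAlgorithm { GRAPH, LSH }

final class GenericVectorsFormat {
    private final AnnAlgorithm algorithm;

    GenericVectorsFormat(AnnAlgorithm algorithm) {
        this.algorithm = algorithm;
    }

    // The name stays generic because it covers every algorithm.
    String name() {
        return "VectorsFormat";
    }

    // The single format has to branch on the algorithm to decide what it stores.
    List<String> indexFiles() {
        switch (algorithm) {
            case GRAPH:
                return List.of("vectors", "metadata", "graph (.vex)");
            case LSH:
                return List.of("vectors", "metadata", "hash tables");
            default:
                throw new IllegalStateException("unknown algorithm: " + algorithm);
        }
    }
}

// Option B: one format per algorithm. The name carries the algorithm,
// so no strategy parameter is needed.
interface AnnFormat {
    String name();
    List<String> indexFiles();
}

final class GraphVectorsFormat implements AnnFormat {
    public String name() {
        return "GraphVectorsFormat";
    }

    public List<String> indexFiles() {
        return List.of("vectors", "metadata", "graph (.vex)");
    }
}

final class LshVectorsFormat implements AnnFormat {
    public String name() {
        return "LshVectorsFormat";
    }

    public List<String> indexFiles() {
        return List.of("vectors", "metadata", "hash tables");
    }
}
```

In option A the file set varies inside a single named format, which is why a 
generic name fits. In option B each class owns its own file set, which is why 
the name would need to mention the algorithm.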

> Reconsider codec name VectorFormat
> ----------------------------------
>
>                 Key: LUCENE-9855
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9855
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Tomoko Uchida
>            Priority: Blocker
>
> There is some discussion about the codec name for ANN search.
> https://lists.apache.org/thread.html/r3a6fa29810a1e85779de72562169e72d927d5a5dd2f9ea97705b8b2e%40%3Cdev.lucene.apache.org%3E
> The main points are 1) use the plural form for consistency, and 2) use a more 
> specific name for ANN search (the second point could be optional).
> A few alternatives were proposed:
> - VectorsFormat
> - VectorValuesFormat
> - NeighborsFormat
> - DenseVectorsFormat



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
