[ https://issues.apache.org/jira/browse/LUCENE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315067#comment-17315067 ]

Michael Sokolov commented on LUCENE-9855:
-----------------------------------------

OK, naming is hard! I think it will help to break down all the classes we are 
(or might be) talking about here. These are the bulk of the classes/packages 
added as part of this vector/knn search effort:

{code:java}
o.a.l.codecs: VectorFormat, VectorReader, VectorWriter
o.a.l.codecs.lucene90: Lucene90VectorFormat, Lucene90VectorReader, Lucene90VectorWriter
o.a.l.index: VectorValues, VectorValuesWriter, RandomAccessVectorValues, RandomAccessVectorValuesProducer
o.a.l.search:
o.a.l.util.hnsw: HnswGraph, HnswGraphBuilder, NeighborQueue, NeighborArray, BoundsChecker
{code}

I think the scope of this issue is basically this: consider a more specific
name for these vector APIs (one that isn't so easily confused with
TermVectors), and use the plural form.

Then we got into a discussion of whether this format is hnsw-only, but 
[~julietibs] points out that (a) we already decided it would handle multiple 
ANN algos, and (b) we can have algorithm-specific names in the implementation 
classes (the ones in o.a.l.codecs.lucene90 + any associated utility classes) 
without needing to make that change anywhere else (at the interface level).
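
A minimal, self-contained Java sketch of point (b), under stated assumptions:
{{VectorsFormatSketch}}, {{ExactScanVectorsFormatSketch}} and
{{FormatNamingSketch}} are hypothetical stand-ins, not Lucene classes, and the
single {{search}} method is a big simplification of what a real codec format
exposes. Callers are written against the algorithm-neutral type, so only the
implementation class carries an algorithm-specific name. The exact scan is used
only because it fits in a few lines; an HNSW-backed implementation would be
another subclass with the same method.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical interface-level type: its name says nothing about the algorithm.
abstract class VectorsFormatSketch {
  // Returns the ordinals of the k stored vectors closest to the query.
  abstract int[] search(float[][] stored, float[] query, int k);
}

// Hypothetical implementation class: only here does the algorithm appear in the
// name. It scans every vector and ranks by squared Euclidean distance.
final class ExactScanVectorsFormatSketch extends VectorsFormatSketch {
  @Override
  int[] search(float[][] stored, float[] query, int k) {
    List<Integer> ords = new ArrayList<>();
    for (int i = 0; i < stored.length; i++) {
      ords.add(i);
    }
    // Sort ordinals by distance to the query (List.sort is stable).
    ords.sort(Comparator.comparingDouble(i -> squaredDistance(stored[i], query)));
    int n = Math.min(k, ords.size());
    int[] result = new int[n];
    for (int i = 0; i < n; i++) {
      result[i] = ords.get(i);
    }
    return result;
  }

  private static double squaredDistance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }
}

public class FormatNamingSketch {
  // Caller code depends only on the generic type, so renaming or swapping the
  // implementation class does not touch it.
  static int[] nearest(VectorsFormatSketch format, float[][] stored, float[] query, int k) {
    return format.search(stored, query, k);
  }

  public static void main(String[] args) {
    float[][] stored = {{0f, 0f}, {1f, 1f}, {5f, 5f}};
    int[] top = nearest(new ExactScanVectorsFormatSketch(), stored, new float[] {0.9f, 0.8f}, 2);
    System.out.println(Arrays.toString(top));
  }
}
{code}

Running {{main}} prints {{[1, 0]}}: vector 1 is nearest to the query, then
vector 0.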

[~rcmuir] also raised some other issues: one performance-related, and another
about whether we should have this strategy pattern at all. I might have missed
something else? I think those are separate issues, though: Robert, please feel
free to open another JIRA if you think we ought to pursue them further.

Given that, I think we are talking here about the names of:

{code:java}
o.a.l.codecs: VectorFormat, VectorReader, VectorWriter
o.a.l.index: VectorValues, VectorValuesWriter, RandomAccessVectorValues, RandomAccessVectorValuesProducer
{code}

We seem to be evolving some consensus around {{NumericVectors}}. I think if we
are going to have a plural root like that, it makes no sense to add {{Values}}
after it (NumericVectorsValues?), and the "values" name was really just copied
from DocValues; I don't think it adds anything. I'd like to just change
"VectorValues" to "NumericVectors" and "Vector" to "NumericVectors", but this
leaves two {{NumericVectorsWriter}} classes in different packages (the codecs
{{VectorWriter}} and the index {{VectorValuesWriter}} would both get that name).
Maybe we could adopt the DocValues Producer/Consumer naming in the codecs
package, with this result:
{code:java}
o.a.l.codecs: NumericVectorsFormat, NumericVectorsProducer, NumericVectorsConsumer
o.a.l.index: NumericVectors, NumericVectorsWriter, RandomAccessNumericVectors, RandomAccessNumericVectorsSupplier
{code}
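
A minimal, self-contained Java sketch of how the proposed codecs-level names
might divide the work, modeled loosely on how {{DocValuesFormat}} hands out a
{{DocValuesConsumer}} ({{fieldsConsumer}}) on the write path and a
{{DocValuesProducer}} ({{fieldsProducer}}) on the read path. Everything here is
a hypothetical, heavily simplified assumption: {{SegmentStore}} stands in for
segment files and read/write state, and the method signatures are invented for
illustration. The point is only that the codecs-level writer becomes a
consumer, so it no longer collides with an index-level {{NumericVectorsWriter}}.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a segment on disk: field name -> one vector per doc.
final class SegmentStore {
  final Map<String, float[][]> fields = new HashMap<>();
}

// Hypothetical write side, analogous in role to DocValuesConsumer.
abstract class NumericVectorsConsumer {
  abstract void writeField(String field, float[][] vectors);
}

// Hypothetical read side, analogous in role to DocValuesProducer.
abstract class NumericVectorsProducer {
  abstract float[] vector(String field, int doc);
}

// Hypothetical codec-level entry point that hands out the consumer and producer.
abstract class NumericVectorsFormat {
  abstract NumericVectorsConsumer fieldsConsumer(SegmentStore store);

  abstract NumericVectorsProducer fieldsProducer(SegmentStore store);
}

// Toy implementation that keeps everything in memory.
final class InMemoryNumericVectorsFormat extends NumericVectorsFormat {
  @Override
  NumericVectorsConsumer fieldsConsumer(SegmentStore store) {
    return new NumericVectorsConsumer() {
      @Override
      void writeField(String field, float[][] vectors) {
        store.fields.put(field, vectors);
      }
    };
  }

  @Override
  NumericVectorsProducer fieldsProducer(SegmentStore store) {
    return new NumericVectorsProducer() {
      @Override
      float[] vector(String field, int doc) {
        return store.fields.get(field)[doc];
      }
    };
  }
}

public class ProducerConsumerSketch {
  public static void main(String[] args) {
    SegmentStore store = new SegmentStore();
    NumericVectorsFormat format = new InMemoryNumericVectorsFormat();
    // Write path: in this sketch, an index-level writer would flush through the consumer.
    format.fieldsConsumer(store).writeField("embedding", new float[][] {{1f, 2f}, {3f, 4f}});
    // Read path: readers fetch vectors through the producer.
    float[] v = format.fieldsProducer(store).vector("embedding", 1);
    System.out.println(v[0] + ", " + v[1]);
  }
}
{code}

Running {{main}} prints {{3.0, 4.0}}.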
 

> Reconsider codec name VectorFormat
> ----------------------------------
>
>                 Key: LUCENE-9855
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9855
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: Tomoko Uchida
>            Assignee: Tomoko Uchida
>            Priority: Blocker
>
> There is some discussion about the codec name for ann search.
> https://lists.apache.org/thread.html/r3a6fa29810a1e85779de72562169e72d927d5a5dd2f9ea97705b8b2e%40%3Cdev.lucene.apache.org%3E
> Main points here are 1) use plural form for consistency, and 2) use more 
> specific name for ann search (second point could be optional).
> A few alternatives were proposed:
> - VectorsFormat
> - VectorValuesFormat
> - NeighborsFormat
> - DenseVectorsFormat



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
