[
https://issues.apache.org/jira/browse/LUCENE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315067#comment-17315067
]
Michael Sokolov commented on LUCENE-9855:
-----------------------------------------
OK, naming is hard! I think it will help to break down all the classes we are
(or might be) talking about here. These are the bulk of the classes/packages
added as part of this vector/knn search effort:
{code:java}
o.a.l.codecs: VectorFormat, VectorReader, VectorWriter
o.a.l.codecs.lucene90: Lucene90VectorFormat, Lucene90VectorReader,
Lucene90VectorWriter
o.a.l.index: VectorValues, VectorValuesWriter, RandomAccessVectorValues,
RandomAccessVectorValuesProducer
o.a.l.search:
o.a.l.util.hnsw: HnswGraph, HnswGraphBuilder, NeighborQueue, NeighborArray,
BoundsChecker
{code}
I think the scope of this issue is basically – consider a more specific name
for these vector apis (that isn't so easily confused with TermVectors), and use
plural form.
Then we got into a discussion of whether this format is hnsw-only, but
[~julietibs] points out that (a) we already decided it would handle multiple
ANN algos, and (b) we can have algorithm-specific names in the implementation
classes (the ones in o.a.l.codecs.lucene90 + any associated utility classes)
without needing to make that change anywhere else (at the interface level).
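To make point (b) concrete, here is a minimal sketch of the idea. The class names GenericVectorFormat, HnswVectorFormat and FormatDemo are made up for illustration and are not the actual Lucene classes or signatures; the only point is that callers see the generic type while the algorithm name stays in the implementation class.

```java
// Hypothetical sketch only: these are NOT the real Lucene classes or APIs.

/** Interface-level type with an algorithm-neutral name (hypothetical). */
abstract class GenericVectorFormat {
  private final String name;

  protected GenericVectorFormat(String name) {
    this.name = name;
  }

  /** Name under which this format would be registered. */
  String getName() {
    return name;
  }

  /** Short label for the ANN algorithm the implementation uses. */
  abstract String algorithm();
}

/** The algorithm-specific name appears only in the implementation class. */
final class HnswVectorFormat extends GenericVectorFormat {
  HnswVectorFormat() {
    super("HnswVectorFormat");
  }

  @Override
  String algorithm() {
    return "hnsw";
  }
}

final class FormatDemo {
  /** Callers depend only on the generic type, never on the HNSW name. */
  static String describe(GenericVectorFormat format) {
    return format.getName() + " uses " + format.algorithm();
  }

  public static void main(String[] args) {
    System.out.println(describe(new HnswVectorFormat()));
  }
}
```

Under this arrangement, adding a second ANN algorithm would mean adding another subclass; nothing that refers to the generic type would need to be renamed.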
[~rcmuir] also raised some other issues: one performance-related, another about
whether we should have this strategy pattern at all. I might have missed
something else? I think those are separate issues though: Robert, please feel
free to open another JIRA if you think we ought to pursue them further.
Given that, I think we are talking here about the names of:
{code:java}
o.a.l.codecs: VectorFormat, VectorReader, VectorWriter
o.a.l.index: VectorValues, VectorValuesWriter, RandomAccessVectorValues,
RandomAccessVectorValuesProducer
{code}
We seem to be evolving some consensus around {{NumericVectors}}. I think if we
are going to have a plural root like that, it makes no sense to add {{Values}}
after it (NumericVectorsValues?), and the "values" name was really just copied
from DocValues - it's not adding anything I think. I'd like to just change
"VectorValues" to "NumericVectors" and "Vector" to "NumericVectors" but this
leaves two {{NumericVectorsWriter}} classes in different packages. Maybe we
could adopt the DocValues Producer/Consumer naming in the codecs package with
this result:
{code:java}
o.a.l.codecs: NumericVectorsFormat, NumericVectorsProducer,
NumericVectorsConsumer
o.a.l.index: NumericVectors, NumericVectorsWriter,
RandomAccessNumericVectors, RandomAccessNumericVectorsSupplier
{code}
> Reconsider codec name VectorFormat
> ----------------------------------
>
> Key: LUCENE-9855
> URL: https://issues.apache.org/jira/browse/LUCENE-9855
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: main (9.0)
> Reporter: Tomoko Uchida
> Assignee: Tomoko Uchida
> Priority: Blocker
>
> There is some discussion about the codec name for ann search.
> https://lists.apache.org/thread.html/r3a6fa29810a1e85779de72562169e72d927d5a5dd2f9ea97705b8b2e%40%3Cdev.lucene.apache.org%3E
> Main points here are 1) use plural form for consistency, and 2) use more
> specific name for ann search (second point could be optional).
> A few alternatives were proposed:
> - VectorsFormat
> - VectorValuesFormat
> - NeighborsFormat
> - DenseVectorsFormat