[ https://issues.apache.org/jira/browse/LUCENE-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315067#comment-17315067 ]
Michael Sokolov commented on LUCENE-9855:
-----------------------------------------

OK, naming is hard! I think it will help to break down all the classes we are (or might be) talking about here. These are the bulk of the classes/packages added as part of this vector/knn search effort:

{code:java}
o.a.l.codecs: VectorFormat, VectorReader, VectorWriter
o.a.l.codecs.lucene90: Lucene90VectorFormat, Lucene90VectorReader, Lucene90VectorWriter
o.a.l.index: VectorValues, VectorValuesWriter, RandomAccessVectorValues, RandomAccessVectorValuesProducer
o.a.l.search:
o.a.l.util.hnsw: HnswGraph, HnswGraphBuilder, NeighborQueue, NeighborArray, BoundsChecker
{code}

I think the scope of this issue is basically: consider a more specific name for these vector APIs (one that isn't so easily confused with TermVectors), and use the plural form.

Then we got into a discussion of whether this format is hnsw-only, but [~julietibs] points out that (a) we already decided it would handle multiple ANN algos, and (b) we can have algorithm-specific names in the implementation classes (the ones in o.a.l.codecs.lucene90 + any associated utility classes) without needing to make that change anywhere else (at the interface level).

[~rcmuir] also raised some other issues: one performance-related, another about whether we should have this strategy pattern at all. I might have missed something else? I think those are separate issues though. Robert, please feel free to open another JIRA if you think we ought to pursue them further.

Given that, I think we are talking here about the names of:

{code:java}
o.a.l.codecs: VectorFormat, VectorReader, VectorWriter
o.a.l.index: VectorValues, VectorValuesWriter, RandomAccessVectorValues, RandomAccessVectorValuesProducer
{code}

We seem to be evolving some consensus around {{NumericVectors}}.
I think if we are going to have a plural root like that, it makes no sense to add {{Values}} after it (NumericVectorsValues?). The "values" name was really just copied from DocValues; I don't think it adds anything. I'd like to just change "VectorValues" to "NumericVectors" and "Vector" to "NumericVectors", but this leaves two {{NumericVectorsWriter}} classes in different packages. Maybe we could adopt the DocValues Producer/Consumer naming in the codecs package, with this result:

{code:java}
o.a.l.codecs: NumericVectorsFormat, NumericVectorsProducer, NumericVectorsConsumer
o.a.l.index: NumericVectors, NumericVectorsWriter, RandomAccessNumericVectors, RandomAccessNumericVectorsSupplier
{code}

> Reconsider codec name VectorFormat
> ----------------------------------
>
> Key: LUCENE-9855
> URL: https://issues.apache.org/jira/browse/LUCENE-9855
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: main (9.0)
> Reporter: Tomoko Uchida
> Assignee: Tomoko Uchida
> Priority: Blocker
>
> There is some discussion about the codec name for ann search.
> https://lists.apache.org/thread.html/r3a6fa29810a1e85779de72562169e72d927d5a5dd2f9ea97705b8b2e%40%3Cdev.lucene.apache.org%3E
> Main points here are 1) use plural form for consistency, and 2) use more
> specific name for ann search (second point could be optional).
> A few alternatives were proposed:
> - VectorsFormat
> - VectorValuesFormat
> - NeighborsFormat
> - DenseVectorsFormat