[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

Michael Sokolov (Jira) Mon, 26 Oct 2020 08:25:45 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220770#comment-17220770
 ]


Michael Sokolov commented on LUCENE-9583:
-----------------------------------------

We could move {{VectorValues.RandomAccess}} and the 
{{VectorValues.randomAccess()}} method to a standalone interface, 
{{RandomAccessVector}} or so (we might need two interfaces: one for the RA 
interface itself and another for producers of it). This standalone interface 
could even live in codecs to make it seem more internal/expert, although it 
might be weird to put it there; I'm not totally clear on the split between 
index and codecs. If we did this, it would at least no longer jump out at you 
as part of VectorValues, although it would have to be public (unless we also 
moved stuff from VectorValuesWriter to codecs, in which case we could make it 
package-private). Then we could have the existing implementations in 
codecs/lucene90 implement this interface and use typecasts to get access to it.
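
To make the idea concrete, here is a rough sketch of the two-interface split 
and the typecast. Every name in it ({{RandomAccessVectorValues}}, 
{{RandomAccessVectorValuesProducer}}, {{ForwardVectorValues}}, 
{{InMemoryVectorValues}}, {{randomAccessOrNull}}) is a hypothetical stand-in 
chosen for illustration, not an existing Lucene class, and the in-memory 
implementation only stands in for what a codecs/lucene90 reader might do.

```java
// Sketch only: all types here are hypothetical stand-ins, not Lucene's real API.
final class RandomAccessSketch {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical standalone interface: random access to vectors by internal ordinal. */
  interface RandomAccessVectorValues {
    int size();
    int dimension();
    float[] vectorValue(int ord);
  }

  /** Hypothetical producer interface, implemented by readers that can offer random access. */
  interface RandomAccessVectorValuesProducer {
    RandomAccessVectorValues randomAccess();
  }

  /** Stand-in for the public, forward-only API; it exposes no ordinals. */
  abstract static class ForwardVectorValues {
    abstract int nextDoc();
    abstract float[] vectorValue();
  }

  /** Stand-in for a codec implementation that supports both access styles. */
  static final class InMemoryVectorValues extends ForwardVectorValues
      implements RandomAccessVectorValuesProducer {
    private final float[][] vectors;
    private int doc = -1;

    InMemoryVectorValues(float[][] vectors) {
      this.vectors = vectors;
    }

    @Override
    int nextDoc() {
      // Advance until exhausted, then stay at NO_MORE_DOCS.
      if (doc != NO_MORE_DOCS) {
        doc = (doc + 1 < vectors.length) ? doc + 1 : NO_MORE_DOCS;
      }
      return doc;
    }

    @Override
    float[] vectorValue() {
      return vectors[doc];
    }

    @Override
    public RandomAccessVectorValues randomAccess() {
      return new RandomAccessVectorValues() {
        @Override
        public int size() {
          return vectors.length;
        }

        @Override
        public int dimension() {
          return vectors.length == 0 ? 0 : vectors[0].length;
        }

        @Override
        public float[] vectorValue(int ord) {
          return vectors[ord];
        }
      };
    }
  }

  /** Expert callers (e.g. HNSW or sorted merge) typecast to reach random access. */
  static RandomAccessVectorValues randomAccessOrNull(ForwardVectorValues values) {
    if (values instanceof RandomAccessVectorValuesProducer) {
      return ((RandomAccessVectorValuesProducer) values).randomAccess();
    }
    return null; // this implementation is forward-only
  }
}
```

With something like this, ordinary consumers would only ever see the 
forward-iterating type, while a caller that needs ordinals would go through 
{{randomAccessOrNull(values)}} and handle the null case for forward-only 
implementations.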

> How should we expose VectorValues.RandomAccess?
> -----------------------------------------------
>
>                 Key: LUCENE-9583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>
> In the newly-added {{VectorValues}} API, we have a {{RandomAccess}} 
> sub-interface. [~jtibshirani] pointed out this is not needed by some 
> vector-indexing strategies which can operate solely using a forward-iterator 
> (it is needed by HNSW), and so in the interest of simplifying the public API 
> we should not expose this internal detail (which by the way surfaces internal 
> ordinals that are somewhat uninteresting outside the random access API).
> I looked into how to move this inside the HNSW-specific code and remembered 
> that we do also currently make use of the RA API when merging vector fields 
> over sorted indexes. Without it, we would need to load all vectors into RAM  
> while flushing/merging, as we currently do in 
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost 
> for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN 
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} 
> to {{VectorValues.RandomAccess}}. This I think we could move back, and 
> handle the HNSW requirements for search elsewhere. I wonder if that would 
> alleviate the major concern here? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
