[ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Sokolov updated LUCENE-9583:
------------------------------------
    Description: 
In the newly added {{VectorValues}} API, we have a {{RandomAccess}} 
sub-interface. [~jtibshirani] pointed out that some vector-indexing 
strategies do not need it, since they can operate with a forward iterator 
alone (HNSW does need it). In the interest of simplifying the public API, we 
should therefore not expose this internal detail, which also surfaces 
internal ordinals that are of little interest outside the random-access API.

I looked into how to move this inside the HNSW-specific code and remembered 
that we also currently use the random-access (RA) API when merging vector 
fields over sorted indexes. Without it, we would need to load all vectors 
into RAM while flushing/merging, as we currently do in 
{{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost 
for the simpler API.
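
To make the trade-off above concrete, here is a minimal sketch of reordering vectors for a sorted index. It is an illustration only: {{RandomAccessVectors}}, {{reorderWithIterator}}, {{reorderWithRandomAccess}} and the {{newToOld}} sort map are hypothetical names, not Lucene's actual interfaces, and the returned list merely stands in for the output written during flush/merge.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SortedVectorMergeSketch {

  /** Hypothetical random-access view: look up a vector by ordinal. Not Lucene's interface. */
  interface RandomAccessVectors {
    int size();

    float[] vectorValue(int ord);
  }

  /**
   * Forward-only source: every vector has to be buffered on heap before any
   * of them can be emitted in the new (sorted) order.
   */
  static List<float[]> reorderWithIterator(Iterator<float[]> vectors, int[] newToOld) {
    List<float[]> buffered = new ArrayList<>();
    while (vectors.hasNext()) {
      buffered.add(vectors.next()); // the whole field ends up in RAM
    }
    List<float[]> out = new ArrayList<>(); // stands in for the output writer
    for (int oldOrd : newToOld) {
      out.add(buffered.get(oldOrd));
    }
    return out;
  }

  /**
   * Random-access source: each vector is fetched on demand in sorted order,
   * so no intermediate buffer of all vectors is needed.
   */
  static List<float[]> reorderWithRandomAccess(RandomAccessVectors vectors, int[] newToOld) {
    List<float[]> out = new ArrayList<>(); // stands in for the output writer
    for (int oldOrd : newToOld) {
      out.add(vectors.vectorValue(oldOrd));
    }
    return out;
  }
}
```

Both methods produce the same ordering; the difference is only that the iterator variant must hold every vector in memory at once before it can emit the first one.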

Another thing I noticed while reviewing this is that I moved the KNN 
{{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} 
to {{VectorValues.RandomAccess}}. I think we could move it back and handle 
the HNSW requirements for search elsewhere. I wonder if that would alleviate 
the major concern here?

  was:
In the newly-added VectorValues API, we have a RandomAccess sub-interface. 
[[~jtibshirani] pointed out this is not needed by some vector-indexing 
strategies which can operate solely using a forward-iterator (it is needed by 
HNSW), and so in the interest of simplifying the public API we should not 
expose this internal detail (which by the way surfaces internal ordinals that 
are somewhat uninteresting outside the random access API).

I looked into how to move this inside the HNSW-specific code and remembered 
that we do also currently make use of the RA API when merging vector fields 
over sorted indexes. Without it, we would need to load all vectors into RAM  
while flushing/merging, as we currently do in BinaryDocValuesWriter.BinaryDVs. 
I wonder if it's worth paying this cost for the simpler API.

Another thing I noticed while reviewing this is that I moved the KNN 
`search(float[] target, int topK, int fanout)` method from `VectorValues` to 
`VectorValues.RandomAccess`. This I think we could move back, and handle the 
HNSW requirements for search elsewhere. I wonder if that would alleviate the 
major concern here? 


> How should we expose VectorValues.RandomAccess?
> -----------------------------------------------
>
>                 Key: LUCENE-9583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9583
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael Sokolov
>            Priority: Major
>
> In the newly added {{VectorValues}} API, we have a {{RandomAccess}} 
> sub-interface. [~jtibshirani] pointed out that some vector-indexing 
> strategies do not need it, since they can operate with a forward iterator 
> alone (HNSW does need it). In the interest of simplifying the public API, we 
> should therefore not expose this internal detail, which also surfaces 
> internal ordinals that are of little interest outside the random-access API.
>
> I looked into how to move this inside the HNSW-specific code and remembered 
> that we also currently use the random-access (RA) API when merging vector 
> fields over sorted indexes. Without it, we would need to load all vectors 
> into RAM while flushing/merging, as we currently do in 
> {{BinaryDocValuesWriter.BinaryDVs}}. I wonder if it's worth paying this cost 
> for the simpler API.
>
> Another thing I noticed while reviewing this is that I moved the KNN 
> {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} 
> to {{VectorValues.RandomAccess}}. I think we could move it back and handle 
> the HNSW requirements for search elsewhere. I wonder if that would alleviate 
> the major concern here?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
