[jira] [Comment Edited] (LUCENE-9674) Faster advance on Vector Values

Michael Sokolov (Jira) Sat, 23 Jan 2021 12:25:04 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270750#comment-17270750
 ]


Michael Sokolov edited comment on LUCENE-9674 at 1/23/21, 8:24 PM:
-------------------------------------------------------------------

Checking luceneutil, that does not surprise me. At some point I had added 
support for retrieval of vectors (as we had done previously for stored fields), 
which would in theory exercise this API. KNN search does not exercise it - it 
uses only the random access API, not the forward iterator (nextDoc) that was 
optimized here.

I think we had disabled this previously since turning it on would impact *all* 
tasks, not just the vector tasks. At the same time I see that the way this was 
implemented *also* uses the random access API, although it should not. My past 
self was not thinking clearly. In SearchTask we added vector retrieval for 
tasks 
https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTask.java#L312
but the way it is done will retrieve the vector for the wrong doc as it is 
using the docid to retrieve from the ordinal API.
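
To illustrate the docid/ordinal mix-up described above, here is a minimal sketch. It is not the luceneutil or Lucene code: the class, the arrays, and the method names are all hypothetical. It assumes a segment where only some documents have a vector, so a vector's ordinal (its position among the documents that have vectors) differs from its doc id.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names come from Lucene or luceneutil. */
final class OrdinalVsDocId {
    // Assumed segment: only docs 3, 7 and 8 have a vector, so their ordinals are 0, 1 and 2.
    static final int[] ORD_TO_DOC = {3, 7, 8};
    static final float[][] VECTORS_BY_ORD = {{0.3f}, {0.7f}, {0.8f}};

    /** Random access is keyed by ordinal, not by doc id. */
    static float[] vectorValue(int ord) {
        return VECTORS_BY_ORD[ord];
    }

    /** Translates the doc id to an ordinal first; returns null if the doc has no vector. */
    static float[] vectorForDoc(int docId) {
        int ord = Arrays.binarySearch(ORD_TO_DOC, docId);
        return ord >= 0 ? vectorValue(ord) : null;
    }
}
```

In this sketch, passing doc id 1 straight to vectorValue returns the vector that belongs to doc 7, which is the kind of silent mismatch described above, while vectorForDoc(1) reports that doc 1 has no vector.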

So -- we should fix that last issue and see if we can measure the impact of 
this change, and then maybe we can find a way to control this per-task?


was (Author: sokolov):
Checking luceneutil, that does not surprise me. At some point I had added 
support for retrieval of vectors (as we had done previously for stored fields), 
which would in theory exercise this API. KNN search does not exercise it - it 
uses only the random access API, not the forward iterator (nextDoc) that was 
optimized here.

It would be nice to measure the impact, but I think we had disabled this 
previously since turning it on would impact *all* tasks, not just the vector 
tasks. At the same time I see that the way this was implemented *also* uses the 
random access API, although it should not. My past self was not thinking 
clearly. In SearchTask we added vector retrieval for tasks 
https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTask.java#L312
but the way it is done will retrieve the vector for the wrong doc as it is 
using the docid to retrieve from the ordinal API.

So -- we should fix that last issue and see if we can measure the impact of 
this change, and then maybe we can find a way to control this per-task?

> Faster advance on Vector Values
> -------------------------------
>
>                 Key: LUCENE-9674
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9674
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: master (9.0)
>         Environment:  
>            Reporter: Anand Kotriwal
>            Priority: Major
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> The advance() function in the class Lucene90VectorReader does a linear search 
> for the target document.
> To make it faster we can do a binary search over the "ordToDoc" array, which 
> will make the advance operation take logarithmic time. This will 
> make retrieving vectors for a sparse set of documents efficient.
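
The proposal in the description can be sketched roughly as follows. This is not the actual Lucene90VectorReader code or the patch attached to the issue: the class name and fields are hypothetical, and it assumes ordToDoc is a sorted int array mapping each vector ordinal to its doc id. NO_MORE_DOCS uses the same value as Lucene's DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE).

```java
import java.util.Arrays;

/** Hypothetical stand-in for a sparse vector iterator; not the Lucene implementation. */
final class SparseVectorIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] ordToDoc; // sorted ascending: ordToDoc[ord] is the doc id of vector number ord
    private int ord = -1;
    private int doc = -1;

    SparseVectorIterator(int[] ordToDoc) {
        this.ordToDoc = ordToDoc;
    }

    int docID() { return doc; }

    int ord() { return ord; }

    /** Steps to the next document that has a vector. */
    int nextDoc() {
        if (ord + 1 >= ordToDoc.length) {
            ord = ordToDoc.length;
            return doc = NO_MORE_DOCS;
        }
        return doc = ordToDoc[++ord];
    }

    /** Moves to the first doc >= target after the current position, in O(log n) rather than O(n). */
    int advance(int target) {
        int from = ord + 1;
        if (from >= ordToDoc.length) {
            ord = ordToDoc.length;
            return doc = NO_MORE_DOCS;
        }
        int idx = Arrays.binarySearch(ordToDoc, from, ordToDoc.length, target);
        if (idx < 0) {
            idx = -idx - 1; // not found: binarySearch encodes the insertion point
        }
        ord = idx;
        return doc = (ord < ordToDoc.length) ? ordToDoc[ord] : NO_MORE_DOCS;
    }
}
```

Searching only the range after the current ordinal keeps the iterator forward-only, which matches the usual contract that advance is called with a target beyond the current document.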



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
