[jira] [Updated] (LUCENE-10592) Should we build HNSW graph on the fly during indexing

Mayya Sharipova (Jira) Wed, 25 May 2022 13:43:03 -0700


     [ 
https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mayya Sharipova updated LUCENE-10592:
-------------------------------------
    Description: 
Currently, when we index vectors for KnnVectorField, we buffer those vectors in 
memory and build the HNSW graph on flush, during segment construction. Because 
building an HNSW graph is very expensive, flush operations take a long time. 
This also makes overall indexing performance quite unpredictable (the number of 
flushes is determined by memory usage and by the presence of concurrent 
searches): some indexing operations return almost instantly, while others that 
trigger a flush take a long time. 

Building the HNSW graph on the fly as we index vectors would avoid this 
problem and spread the cost of graph construction evenly across indexing.
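
The latency difference between the two strategies can be illustrated with a 
toy cost model. This is only a sketch, not Lucene code: the class 
FlushCostSketch, its methods, and the assumption that inserting one vector into 
the graph costs a constant unit of work are all hypothetical (real HNSW 
insertion cost depends on graph size and parameters). It records the work done 
inside each "add document" call under both strategies.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names and the constant per-insert cost are assumptions,
// not part of Lucene's API.
final class FlushCostSketch {
    // Assumed cost of inserting a single vector into the graph.
    static final int INSERT_COST = 1;

    // Buffer vectors and build the whole graph when a flush is triggered.
    // Returns the work done inside each add call; a flush happens on every
    // flushEvery-th call. Vectors still buffered at the end are not flushed.
    static List<Integer> buildOnFlush(int numDocs, int flushEvery) {
        List<Integer> costs = new ArrayList<>();
        int buffered = 0;
        for (int i = 0; i < numDocs; i++) {
            buffered++;
            int cost = 0; // buffering itself is treated as free
            if (buffered == flushEvery) {
                cost = buffered * INSERT_COST; // graph built for all buffered vectors
                buffered = 0;
            }
            costs.add(cost);
        }
        return costs;
    }

    // Insert each vector into the graph as it is indexed; flush has no graph work left.
    static List<Integer> buildOnTheFly(int numDocs) {
        List<Integer> costs = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) {
            costs.add(INSERT_COST);
        }
        return costs;
    }

    // Largest amount of work done by any single add call.
    static int max(List<Integer> costs) {
        int m = 0;
        for (int c : costs) {
            m = Math.max(m, c);
        }
        return m;
    }

    // Total work across all add calls.
    static int sum(List<Integer> costs) {
        int s = 0;
        for (int c : costs) {
            s += c;
        }
        return s;
    }
}
```

Under this model, buildOnFlush(10, 5) and buildOnTheFly(10) do the same total 
work (10 units), but the worst single call costs 5 units in the first case and 
1 unit in the second, which is the unevenness described above.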

This will also supersede LUCENE-10194.

  was:
Currently, when we index vectors for KnnVectorField, we buffer those vectors in 
memory and build the HNSW graph on flush, during segment construction. Because 
building an HNSW graph is very expensive, flush operations take a long time. 
This also makes overall indexing performance quite unpredictable (the number of 
flushes is determined by memory usage and by the presence of concurrent 
searches): some indexing operations return almost instantly, while others that 
trigger a flush take a long time. 

Building the HNSW graph on the fly as we index vectors would avoid this 
problem and spread the cost of graph construction evenly across indexing.


> Should we build HNSW graph on the fly during indexing
> -----------------------------------------------------
>
>                 Key: LUCENE-10592
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10592
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mayya Sharipova
>            Priority: Minor
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors 
> in memory and build the HNSW graph on flush, during segment construction. 
> Because building an HNSW graph is very expensive, flush operations take a 
> long time. This also makes overall indexing performance quite unpredictable 
> (the number of flushes is determined by memory usage and by the presence of 
> concurrent searches): some indexing operations return almost instantly, 
> while others that trigger a flush take a long time. 
> Building the HNSW graph on the fly as we index vectors would avoid this 
> problem and spread the cost of graph construction evenly across indexing.
> This will also supersede LUCENE-10194.



