[ https://issues.apache.org/jira/browse/LUCENE-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17570976#comment-17570976 ]
Julie Tibshirani edited comment on LUCENE-10592 at 7/25/22 4:10 PM:
--------------------------------------------------------------------

It looks like this commit gave a nice boost to indexing. From your benchmark results, we expected a small improvement, but this looks even larger:

!Screen Shot 2022-07-25 at 9.04.11 AM.png|width=582,height=238!

> Should we build HNSW graph on the fly during indexing
> -----------------------------------------------------
>
>                 Key: LUCENE-10592
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10592
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Mayya Sharipova
>            Assignee: Mayya Sharipova
>            Priority: Minor
>             Fix For: 9.4
>
>         Attachments: Screen Shot 2022-07-25 at 9.04.11 AM.png
>
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently, when we index vectors for KnnVectorField, we buffer those vectors
> in memory and, on flush, build an HNSW graph during segment construction.
> Because building an HNSW graph is very expensive, this makes the flush
> operation take a long time. It also makes overall indexing performance quite
> unpredictable (the number of flushes is determined by the memory used and by
> the presence of concurrent searches): some indexing operations return almost
> instantly, while others that trigger a flush take a long time.
> Building the HNSW graph on the fly as we index vectors avoids this problem
> and spreads the load of HNSW graph construction evenly across indexing.
> This will also supersede LUCENE-10194.
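
To make the latency argument in the issue description concrete, here is a minimal toy cost model in Java. It is only a sketch under stated assumptions, not Lucene code: it uses no Lucene APIs, the class and method names are hypothetical, and the logarithmic per-insert cost and the fixed flush interval are made-up stand-ins for real HNSW construction cost and real flush triggers. It compares the per-operation cost of buffering vectors and building the graph at flush against inserting each vector into the graph as it arrives.

```java
/**
 * Toy cost model only; this is not Lucene code and uses no Lucene APIs.
 * All names and the cost function are hypothetical.
 */
final class FlushCostSketch {

    // Assumed cost (abstract units) of inserting one vector into a graph
    // that already holds graphSize nodes: grows roughly logarithmically.
    static long insertCost(int graphSize) {
        return 1 + (32 - Integer.numberOfLeadingZeros(graphSize));
    }

    // Strategy A: buffer each vector cheaply, then build the whole graph
    // inside the indexing operation that triggers the flush.
    static long[] bufferThenBuild(int numDocs, int flushEvery) {
        long[] costs = new long[numDocs];
        int buffered = 0;
        for (int i = 0; i < numDocs; i++) {
            costs[i] = 1; // copy the vector into the in-memory buffer
            buffered++;
            if (buffered == flushEvery) {
                for (int n = 0; n < buffered; n++) {
                    costs[i] += insertCost(n); // full graph build at flush
                }
                buffered = 0;
            }
        }
        return costs;
    }

    // Strategy B: insert each vector into the graph as it is indexed, so a
    // flush only has to write out an already-built graph (modeled as free).
    static long[] buildOnTheFly(int numDocs, int flushEvery) {
        long[] costs = new long[numDocs];
        int graphSize = 0;
        for (int i = 0; i < numDocs; i++) {
            costs[i] = 1 + insertCost(graphSize);
            graphSize++;
            if (graphSize == flushEvery) {
                graphSize = 0; // a new segment starts with an empty graph
            }
        }
        return costs;
    }

    static long sum(long[] costs) {
        long total = 0;
        for (long c : costs) total += c;
        return total;
    }

    static long max(long[] costs) {
        long worst = 0;
        for (long c : costs) worst = Math.max(worst, c);
        return worst;
    }

    public static void main(String[] args) {
        long[] buffered = bufferThenBuild(1000, 100);
        long[] onTheFly = buildOnTheFly(1000, 100);
        System.out.println("buffer-then-build: total=" + sum(buffered)
                + " worst op=" + max(buffered));
        System.out.println("build-on-the-fly:  total=" + sum(onTheFly)
                + " worst op=" + max(onTheFly));
    }
}
```

In this model both strategies do the same total work; the difference is that the on-the-fly variant has no single operation that pays for an entire graph build. The larger-than-expected indexing gain reported in the comment above is a measured result that this toy model does not attempt to explain.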