[jira] [Created] (LUCENE-10655) can we optimize visited bitset usage in HNSW graph search/indexing?

Michael Sokolov (Jira) Fri, 15 Jul 2022 08:30:18 -0700

Michael Sokolov created LUCENE-10655:
----------------------------------------


             Summary: can we optimize visited bitset usage in HNSW graph 
search/indexing?
                 Key: LUCENE-10655
                 URL: https://issues.apache.org/jira/browse/LUCENE-10655
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/hnsw
            Reporter: Michael Sokolov


When running {{luceneutil}}, I noticed that {{FixedBitSet.clear()}} dominates 
the CPU profiler output. I had a few ideas:
 # In upper graph layers, the occupied nodes are very sparse; maybe 
{{SparseFixedBitSet}} would be a better fit for those.
 # We are caching these bitsets, but each one is only used for a single search 
(a single document insert, during indexing). Should we cache across searches? We 
would need to pool them though, and they would vary by field, since fields can 
have different numbers of vector nodes. This starts to get complex.
 # Are we sure that clearing a bitset is more efficient than allocating a new 
one? Maybe the JDK maintains a pool of already-zeroed memory for us.
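
To make the first idea concrete, here is a rough sketch of picking a visited-set implementation by graph level. It is plain Java and not Lucene code: the {{VisitedSets}} class, the {{Visited}} interface, and the "dense on level 0, sparse above" rule are all hypothetical, and the {{HashSet}} only stands in for a real sparse bitset such as {{SparseFixedBitSet}}.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch, not Lucene code: choose a visited-set type per graph level. */
final class VisitedSets {

  /** Tracks which node ordinals were visited during one graph search. */
  interface Visited {
    /** Marks the node as visited; returns true if it had not been visited before. */
    boolean markVisited(int node);

    /** Forgets all visited nodes so the instance can be reused. */
    void reset();
  }

  /** Dense variant: one bit per node, so reset() touches numNodes / 64 words. */
  static final class Dense implements Visited {
    private final long[] bits;

    Dense(int numNodes) {
      bits = new long[(numNodes + 63) >>> 6];
    }

    @Override
    public boolean markVisited(int node) {
      int word = node >>> 6;
      long mask = 1L << node; // long shifts use only the low 6 bits of the distance
      boolean fresh = (bits[word] & mask) == 0;
      bits[word] |= mask;
      return fresh;
    }

    @Override
    public void reset() {
      Arrays.fill(bits, 0L);
    }
  }

  /** Sparse stand-in: memory grows with the number of visited nodes, not with numNodes. */
  static final class Sparse implements Visited {
    private final Set<Integer> seen = new HashSet<>();

    @Override
    public boolean markVisited(int node) {
      return seen.add(node);
    }

    @Override
    public void reset() {
      seen.clear();
    }
  }

  /** Assumed rule for illustration: dense on the bottom level, sparse on upper levels. */
  static Visited forLevel(int level, int numNodes) {
    return level == 0 ? new Dense(numNodes) : new Sparse();
  }
}
```

Whether the sparse variant actually wins on upper levels would still need to be measured; the boxing in {{HashSet}} alone could hide any benefit.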

I think we could try specializing the bitset type by graph level, and then I 
think we ought to measure the performance of allocation vs the limited reuse 
that we currently have.
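
As a starting point for that measurement, a naive timing loop might look like the sketch below. It uses a bare {{long[]}} in place of {{FixedBitSet}}, and the {{ClearVsAllocate}} class and its method names are made up for illustration. A trustworthy comparison would need a proper harness such as JMH plus realistic sizes from {{luceneutil}}, since JIT warm-up and GC behavior can easily skew a loop like this.

```java
import java.util.Arrays;

/** Hypothetical sketch: compare zeroing a reused long[] against allocating a fresh one. */
final class ClearVsAllocate {

  /** Written by each run so the JIT cannot discard the loop body as dead code. */
  static volatile long checksum;

  /** Reuses one array and zeroes it before every iteration; returns elapsed nanoseconds. */
  static long timeReuse(int numBits, int iterations) {
    long[] bits = new long[(numBits + 63) >>> 6];
    long sum = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      Arrays.fill(bits, 0L);           // the "clear" cost under test
      int word = i % bits.length;
      bits[word] |= 1L;                // simulate marking one node as visited
      sum += bits[word];
    }
    long elapsed = System.nanoTime() - start;
    checksum = sum;
    return elapsed;
  }

  /** Allocates a fresh (already zeroed) array on every iteration; returns elapsed nanoseconds. */
  static long timeAllocate(int numBits, int iterations) {
    int numWords = (numBits + 63) >>> 6;
    long sum = 0;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      long[] bits = new long[numWords]; // the allocation cost under test
      int word = i % numWords;
      bits[word] |= 1L;
      sum += bits[word];
    }
    long elapsed = System.nanoTime() - start;
    checksum = sum;
    return elapsed;
  }
}
```

Calling {{ClearVsAllocate.timeReuse(n, iters)}} and {{ClearVsAllocate.timeAllocate(n, iters)}} with {{n}} set to the number of vector nodes in a field gives two rough numbers to compare; they are only meaningful after several warm-up rounds.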



--
This message was sent by Atlassian Jira
(v8.20.10#820010)