Michael Sokolov created LUCENE-10655: ----------------------------------------
Summary: can we optimize visited bitset usage in HNSW graph search/indexing? Key: LUCENE-10655 URL: https://issues.apache.org/jira/browse/LUCENE-10655 Project: Lucene - Core Issue Type: Improvement Components: core/hnsw Reporter: Michael Sokolov When running {{luceneutil}} I noticed that {{FixedBitSet.clear()}} dominates the CPU profiler output. I had a few ideas: # In upper graph layers, the occupied nodes are very sparse - maybe {{SparseFixedBitSet}} would be a better fit for those # We are caching these bitsets, but they are only used for a single search (single document insert, during indexing). Should we cache across searches? We would need to pool them though, and they would vary by field since fields can have different numbers of vector nodes. This starts to get complex # Are we sure that clearing a bitset is more efficient than allocating a new one? Maybe the JDK maintains a pool of already-zeroed memory for us I think we could try specializing the bitset type by graph level, and then I think we ought to measure the performance of allocation vs the limited reuse that we currently have. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org