[ https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567358#comment-17567358 ]
Michael Sokolov commented on LUCENE-10655: ------------------------------------------ OK I was confused, and in fact we already do use SparseFixedBitSet for every layer. And I tried allocating afresh rather than clear-ing, and it was a bit slower. FixedBitSet.clear() is a hot-spot but it's not really clear what's to be done about it. > can we optimize visited bitset usage in HNSW graph search/indexing? > ------------------------------------------------------------------- > > Key: LUCENE-10655 > URL: https://issues.apache.org/jira/browse/LUCENE-10655 > Project: Lucene - Core > Issue Type: Improvement > Components: core/hnsw > Reporter: Michael Sokolov > Priority: Major > > When running {{luceneutil}} I noticed that {{FixedBitSet.clear()}} dominates > the CPU profiler output. I had a few ideas: > # In upper graph layers, the occupied nodes are very sparse - maybe > {{SparseFixedBitSet}} would be a better fit for those > # We are caching these bitsets, but they are only used for a single search > (single document insert, during indexing). Should we cache across searches? > We would need to pool them though, and they would vary by field since fields > can have different numbers of vector nodes. This starts to get complex > # Are we sure that clearing a bitset is more efficient than allocating a new > one? Maybe the JDK maintains a pool of already-zeroed memory for us > I think we could try specializing the bitset type by graph level, and then I > think we ought to measure the performance of allocation vs the limited reuse > that we currently have. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org