[
https://issues.apache.org/jira/browse/LUCENE-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567358#comment-17567358
]
Michael Sokolov commented on LUCENE-10655:
------------------------------------------
OK, I was confused: in fact we already use SparseFixedBitSet for every
layer. I also tried allocating a fresh bitset rather than clearing the
existing one, and that was a bit slower. FixedBitSet.clear() is a hot spot,
but it's not really clear what can be done about it.
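
For anyone who wants to repeat the clear-vs-allocate comparison outside Lucene, here is a rough sketch of the shape of that experiment. It is not the luceneutil run described above. It assumes a plain long[] as a stand-in for the bitset's backing words, and the class name, method names, and sizes are all made up for illustration.

```java
import java.util.Arrays;

// Hypothetical stand-in for the experiment: a raw long[] plays the role of
// the bitset's backing array. None of these names come from Lucene.
class ClearVsAllocate {
  // Number of 64-bit words needed to hold numBits bits.
  static int wordsFor(int numBits) {
    return (numBits + 63) >>> 6;
  }

  // Reuse strategy: zero the existing array in place.
  static long[] clearInPlace(long[] bits) {
    Arrays.fill(bits, 0L);
    return bits;
  }

  // Allocation strategy: Java guarantees a new array is zero-initialized.
  static long[] allocateFresh(int numBits) {
    return new long[wordsFor(numBits)];
  }

  static void markVisited(long[] bits, int node) {
    bits[node >>> 6] |= 1L << node; // long shifts use only the low 6 bits
  }

  static boolean isVisited(long[] bits, int node) {
    return (bits[node >>> 6] & (1L << node)) != 0;
  }

  public static void main(String[] args) {
    int numNodes = 1_000_000; // assumed graph size
    int searches = 2_000;     // assumed number of searches
    long sink = 0;            // consumed below so the loops are not optimized away

    long[] reused = allocateFresh(numNodes);
    long t0 = System.nanoTime();
    for (int i = 0; i < searches; i++) {
      clearInPlace(reused);
      markVisited(reused, i);
      sink += reused[0];
    }
    long t1 = System.nanoTime();
    for (int i = 0; i < searches; i++) {
      long[] fresh = allocateFresh(numNodes);
      markVisited(fresh, i);
      sink += fresh[0];
    }
    long t2 = System.nanoTime();

    System.out.printf("clear: %d ms, allocate: %d ms (sink=%d)%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sink);
  }
}
```

A hand-rolled timing loop like this is only indicative: JIT warm-up, GC pressure, and heap settings can all change the result, so a harness such as JMH would be needed before drawing conclusions.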
> can we optimize visited bitset usage in HNSW graph search/indexing?
> -------------------------------------------------------------------
>
> Key: LUCENE-10655
> URL: https://issues.apache.org/jira/browse/LUCENE-10655
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/hnsw
> Reporter: Michael Sokolov
> Priority: Major
>
> When running {{luceneutil}} I noticed that {{FixedBitSet.clear()}} dominates
> the CPU profiler output. I had a few ideas:
> # In upper graph layers, the occupied nodes are very sparse - maybe
> {{SparseFixedBitSet}} would be a better fit for those
> # We are caching these bitsets, but they are only used for a single search
> (single document insert, during indexing). Should we cache across searches?
> We would need to pool them though, and they would vary by field since fields
> can have different numbers of vector nodes. This starts to get complex
> # Are we sure that clearing a bitset is more efficient than allocating a new
> one? Maybe the JDK maintains a pool of already-zeroed memory for us
> I think we could try specializing the bitset type by graph level, and then I
> think we ought to measure the performance of allocation vs the limited reuse
> that we currently have.
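
To make the "specialize the bitset type by graph level" idea concrete, here is a minimal sketch. It assumes level 0 holds every node and the upper levels are sparse, and it uses java.util.BitSet and a HashSet as stand-ins for a dense and a sparse bitset. The Visited interface and all names here are hypothetical and are not Lucene APIs.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: choose a visited-set implementation per graph level.
class LevelVisitedSets {
  // Minimal contract for this sketch only (not a Lucene interface).
  interface Visited {
    // Marks the node as visited; returns true if it had been visited before.
    boolean getAndSet(int node);

    // Forgets all visited nodes so the set can be reused.
    void reset();
  }

  // Dense variant: one bit per node, so clearing touches the words in use.
  static final class Dense implements Visited {
    private final BitSet bits;

    Dense(int numNodes) {
      bits = new BitSet(numNodes);
    }

    public boolean getAndSet(int node) {
      boolean was = bits.get(node);
      bits.set(node);
      return was;
    }

    public void reset() {
      bits.clear();
    }
  }

  // Sparse variant: memory and reset cost track the number of visited nodes.
  static final class Sparse implements Visited {
    private final Set<Integer> nodes = new HashSet<>();

    public boolean getAndSet(int node) {
      return !nodes.add(node);
    }

    public void reset() {
      nodes.clear();
    }
  }

  // Assumption: only level 0 is dense enough to justify one bit per node.
  static Visited forLevel(int level, int numNodes) {
    return level == 0 ? new Dense(numNodes) : new Sparse();
  }
}
```

Whether such a split pays off would still have to be measured, since a hash-based set has a much higher per-entry cost than a bitset.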