benwtrent commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2622749459
@kaivalnp the force-merge time indicates that, during the merge to a single segment, the index is being rebuilt from the various segments. I would think that the `force-merge` time is itself more indicative of the cost of indexing than the initial index phase alone.

For this benchmark there are a couple of options:

- Increase the KnnIndexer buffer size to allow 2x of the float vectors in memory (thus keeping a flush from occurring until the graph is ready to be built) and remove the "force-merge" option completely. This will also allow a single segment to be created. Just make sure you have enough heap allocated.
- Simply sum the two numbers together.
- Don't force-merge at all and just accept multiple segments.

One other concern: there are two types of "multi-threading" in the indexing. There is the number of threads doing the indexing (e.g. writing to an indexer and creating a segment) and the number of threads used when building a graph. For simplicity, I would reduce the number of indexing threads to 1, faiss threads to 1, and merge workers to 1. Once we have numbers for the cost of running on a single thread, we can see how adjusting these allows one to pull ahead of the other.
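The first two options above come down to simple arithmetic, sketched here in Java. Everything in the block is an assumption made for illustration: the class and method names are hypothetical, the vector count, dimension, and timings are placeholders rather than benchmark results, and the estimate covers only the raw float32 vectors (4 bytes per component), not graph or other indexing overhead. How the resulting value is handed to KnnIndexer is not shown.

```java
public final class KnnBenchmarkSizing {
    private KnnBenchmarkSizing() {}

    /** Bytes needed for the raw float32 vectors: numVectors * dims * 4. */
    public static long rawVectorBytes(long numVectors, int dims) {
        // multiplyExact throws on overflow instead of silently wrapping.
        return Math.multiplyExact(Math.multiplyExact(numVectors, (long) dims), (long) Float.BYTES);
    }

    /** Buffer size in MiB that holds 2x the raw vectors, rounded up to a whole MiB. */
    public static double bufferSizeMB(long numVectors, int dims) {
        long bytes = Math.multiplyExact(2L, rawVectorBytes(numVectors, dims));
        return Math.ceil(bytes / (1024.0 * 1024.0));
    }

    /** Option two: report the initial index time plus the force-merge time as one number. */
    public static double totalIndexingSeconds(double indexSeconds, double forceMergeSeconds) {
        return indexSeconds + forceMergeSeconds;
    }

    public static void main(String[] args) {
        long numVectors = 1_000_000L; // hypothetical corpus size
        int dims = 768;               // hypothetical vector dimension

        System.out.println("buffer size (MiB) for 2x raw vectors: " + bufferSizeMB(numVectors, dims));

        // Placeholder timings, not measurements from this PR.
        System.out.println("combined indexing cost (s): " + totalIndexingSeconds(120.0, 45.5));
    }
}
```

The heap given to the JVM would need to sit comfortably above the computed buffer size, as the first option notes.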