msokolov commented on PR #12254: URL: https://github.com/apache/lucene/pull/12254#issuecomment-1528050905
Very exciting! After a very quick glance through the code, a few high-level comments/questions: To use this when building indexes in Lucene, I think we'd need a way to control the threads used, so e.g. I don't think the parallel streams would allow for that. Have you considered accepting an Executor and using that to run the updates? TBH I am not an expert on Lucene's IndexWriter threading model, but here is my stab at some background info: We need to maintain free threads that enable indexing to go on concurrent with flush / merge that would invoke this. ConcurrentMergeScheduler, for one, enables callers to specify the number of threads used for merging. Typically these are allocated per-merged-segment, but in theory it can be free to allow a single merge to use multiple threads. As for flush (initial creation of a new segment), I believe this is always single-threaded today. Perhaps we could use concurrency there too somehow, but it's a secondary target I think. I don't see the need for these classes to conform to the existing HnswGraph API. If we decide we want to use this implementation when writing the index, we can switch LucenexxHnswWriter to do so. Given that, could we drop the use of ThreadLocal? I wonder what the single-threaded performance of these new classes is as compared to the existing impl. Have you tested that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org