mikemccand commented on PR #14443: URL: https://github.com/apache/lucene/pull/14443#issuecomment-2844952688
> Sorry I'm still a bit confused: how is this approach better than just committing more frequently, replicating commits as soon as they are created, and refreshing searchers as soon as commits are replicated?

In Amazon's usage, and I would expect in other high-rate NRT segment replication systems, it's helpful to strongly decouple the production of new commit points (triggered by time, or by X GB of new segments) from the replication of those commits.

During peace time (happy path), what you're suggesting works well: commit and replication can always match each other 1 for 1. But under duress (unhappy path), commit points can be produced faster (usually, hopefully, temporarily) than replication can keep up, maybe because of a crazy high rate of document updates, or a slow pipe for replication, or bit errors needing lots of retries, etc. (distributed systems seem to have all sorts of fun ways to become problematic!). For those situations, it's really nice to have this decoupling easily accessible.

It's also delightful because Lucene makes it quite simple to keep more than one commit point alive at once (it's just a custom `IndexDeletionPolicy`), so building this decoupling on top of that is "relatively" easy heh.
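To make the unhappy path concrete, here is a minimal sketch of the decoupling as a toy model. It deliberately does not use the Lucene API: the class `DecoupledCommitModel`, its method names, and the retention rule (keep the newest commit, the commit currently being copied, and the last commit replicas received) are all assumptions for illustration, not what the PR implements. In a real index, that retention decision would presumably live in a custom `IndexDeletionPolicy`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model (hypothetical, no Lucene dependency) of commits decoupled from replication. */
final class DecoupledCommitModel {
    // Generations of the commit points still retained on the indexing node.
    private final TreeSet<Long> liveCommits = new TreeSet<>();
    private long nextGeneration = 1;
    // Generation currently being copied to replicas, or -1 when the replicator is idle.
    private long inFlight = -1;
    // Newest generation that replicas have fully received, or -1 if none yet.
    private long lastReplicated = -1;

    /** Indexer side: create a commit point without waiting for replication. */
    long commit() {
        long generation = nextGeneration++;
        liveCommits.add(generation);
        prune();
        return generation;
    }

    /** Replicator side: start copying the newest commit, skipping any it fell behind on. */
    long startReplication() {
        if (inFlight != -1) {
            throw new IllegalStateException("a replication is already in flight");
        }
        if (liveCommits.isEmpty()) {
            throw new IllegalStateException("nothing has been committed yet");
        }
        inFlight = liveCommits.last();
        prune();
        return inFlight;
    }

    /** Replicator side: the in-flight commit has fully arrived on the replicas. */
    void finishReplication() {
        if (inFlight == -1) {
            throw new IllegalStateException("no replication is in flight");
        }
        lastReplicated = inFlight;
        inFlight = -1;
        prune();
    }

    /** Assumed retention rule: keep the newest, in-flight, and last-replicated commits. */
    private void prune() {
        if (liveCommits.isEmpty()) {
            return;
        }
        long newest = liveCommits.last();
        liveCommits.removeIf(g -> g != newest && g != inFlight && g != lastReplicated);
    }

    List<Long> liveCommits() {
        return new ArrayList<>(liveCommits);
    }

    public static void main(String[] args) {
        DecoupledCommitModel model = new DecoupledCommitModel();
        model.commit();            // generation 1
        model.startReplication();  // slow copy of generation 1 begins
        model.commit();            // generation 2 arrives while the copy is running
        model.commit();            // generation 3 supersedes 2 before it was ever copied
        System.out.println("retained while copying: " + model.liveCommits());
    }
}
```

In this model the indexer never blocks on the replicator. When commits outpace replication, intermediate commit points are simply dropped, and the next replication jumps straight to the newest one, while the commit being copied stays alive until the copy finishes.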