GitHub user Denovo1998 added a comment to the discussion: Async Geo-Replication and Cluster Down Scenarios
`replicationBacklog` is not a separately configurable buffer size in Pulsar. It is effectively the backlog of the replicator cursor, i.e. the number of source-topic entries that have not yet been acknowledged by the remote-cluster replication path.

The `QueueSize` setting you found only controls the internal replication producer pending queue and how aggressively the replicator reads from the ledger. It does not cap the durable backlog stored on disk.

So in your A-down / B-still-serving scenario, yes: the B->A replication backlog can keep growing even if the local consumer backlog is close to zero. That happens because local consumer acknowledgements only advance subscription cursors. The replicator cursor advances only after the message is successfully published to the remote cluster. Until that happens, the entries remain retained on cluster B. This is by design, because otherwise Pulsar would be silently discarding data that has not yet been replicated.

---

For TTL, the answer is also yes, with one important nuance: message expiry applies to replicator cursors too, so expired messages can be removed from the replication backlog and therefore will not be replicated once cluster A comes back. However, this is not guaranteed to happen exactly at T+5 seconds. TTL means the message becomes eligible for expiry after 5 seconds, and the actual removal depends on the broker’s periodic expiry check.

So there is no dedicated `maxReplicationBacklog` knob. If you want to bound this risk during a remote-cluster outage, the practical controls are backlog quota, TTL, or an explicit operational decision to clear the replicator backlog and accept data loss for the remote cluster.

GitHub link: https://github.com/apache/pulsar/discussions/25519#discussioncomment-16655252
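
To watch the two backlogs diverge during an outage, one option is to compare them in the topic stats. The following is a minimal sketch, not an official tool: it assumes the stats JSON has the shape I would expect from `pulsar-admin topics stats`, with a `subscriptions` map carrying `msgBacklog` and a `replication` map keyed by remote cluster carrying `replicationBacklog`. The helper names, the sample document and all of its numbers are made up for illustration.

```python
import json


def backlog_summary(stats: dict) -> dict:
    """Split a topic-stats document into local and replication backlogs."""
    subs = stats.get("subscriptions", {})
    repl = stats.get("replication", {})
    return {
        # Backlog of ordinary subscription cursors (advanced by consumer acks).
        "subscriptions": {name: s.get("msgBacklog", 0) for name, s in subs.items()},
        # Backlog of replicator cursors (advanced only after remote publish).
        "replication": {c: r.get("replicationBacklog", 0) for c, r in repl.items()},
    }


def clusters_over_limit(stats: dict, limit: int) -> list:
    """Return remote clusters whose replication backlog exceeds `limit` entries."""
    replication = backlog_summary(stats)["replication"]
    return sorted(c for c, n in replication.items() if n > limit)


# Illustrative sample only: cluster A is down, local consumers are caught up.
SAMPLE_STATS = json.loads("""
{
  "subscriptions": {"my-sub": {"msgBacklog": 0}},
  "replication": {"cluster-a": {"replicationBacklog": 125000, "connected": false}}
}
""")

if __name__ == "__main__":
    print(backlog_summary(SAMPLE_STATS))
    print(clusters_over_limit(SAMPLE_STATS, 100000))
```

In the sample, the local subscription backlog is zero while the replication backlog towards `cluster-a` keeps its own, much larger count, which is the situation described above. A check like `clusters_over_limit` could feed an alert that prompts the operational decision about quota, TTL, or clearing the replicator backlog.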
