graham-macdonald-simplisafe opened a new issue, #571: URL: https://github.com/apache/pulsar-client-cpp/issues/571
## Motivation

The Java client introduced `SameAuthParamsLookupAutoClusterFailover` in apache/pulsar#23129 (merged August 2024, released in Pulsar 4.0.0 and backported to 3.0.7 / 3.3.2). This `ServiceUrlProvider` implementation addresses a well-known reliability gap in `AutoClusterFailover` that is particularly relevant to geo-replication deployments sitting behind a Pulsar Proxy.

**The problem with `AutoClusterFailover`**: its health probe is a raw TCP connection. In a typical deployment where a Pulsar Proxy fronts the brokers, the TCP probe succeeds as soon as the proxy accepts the connection, even if all brokers behind the proxy have crashed. This means `AutoClusterFailover` cannot detect broker-layer failure and may reconnect clients to a cluster that is not actually serving requests.

**What `SameAuthParamsLookupAutoClusterFailover` does differently**:

- Probes cluster health via a **topic lookup** (`getBroker()` on a configurable test topic) rather than a raw TCP connection. A broker that can respond to a lookup is demonstrably processing requests, so the proxy cannot mask broker failure here.
- Introduces a **hysteresis state machine** with separate `failoverThreshold` and `recoverThreshold` counters (default 5 each), requiring consecutive failures before cutting over and consecutive successes before switching back. This prevents flapping without requiring a coarse `switchBackDelay` timer.
- Targets geo-replication topologies where all clusters share the same authentication credentials, which is the common case.

## Request

Port `SameAuthParamsLookupAutoClusterFailover` to the C++ client. The `ServiceInfoProvider` interface is already part of the C++ public API (`include/pulsar/ServiceInfoProvider.h`), and `AutoClusterFailover` is already implemented against it, so the interface contract is defined and the pattern is established.
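
To make the requested behaviour concrete, the hysteresis counters described under Motivation could be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the Java implementation's actual logic and not an existing pulsar-client-cpp API: the class name `HysteresisFailoverState`, the method `onProbeResult`, and the simplification to a single primary/secondary pair are hypothetical. Only the `failoverThreshold` and `recoverThreshold` names and their default of 5 come from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of pulsar-client-cpp): counts consecutive
// probe results for the primary cluster and decides when to switch.
class HysteresisFailoverState {
   public:
    enum class Target { Primary, Secondary };

    explicit HysteresisFailoverState(std::size_t failoverThreshold = 5,
                                     std::size_t recoverThreshold = 5)
        : failoverThreshold_(failoverThreshold), recoverThreshold_(recoverThreshold) {
        assert(failoverThreshold_ > 0 && recoverThreshold_ > 0);
    }

    // Feed the outcome of one health probe of the primary cluster (for
    // example, whether a topic lookup succeeded). Returns the cluster that
    // clients should use after this probe.
    Target onProbeResult(bool primaryHealthy) {
        if (current_ == Target::Primary) {
            // Any success resets the failure streak.
            failures_ = primaryHealthy ? 0 : failures_ + 1;
            if (failures_ >= failoverThreshold_) {
                current_ = Target::Secondary;
                failures_ = 0;
                successes_ = 0;
            }
        } else {
            // Any failure resets the recovery streak.
            successes_ = primaryHealthy ? successes_ + 1 : 0;
            if (successes_ >= recoverThreshold_) {
                current_ = Target::Primary;
                failures_ = 0;
                successes_ = 0;
            }
        }
        return current_;
    }

    Target current() const { return current_; }

   private:
    std::size_t failoverThreshold_;
    std::size_t recoverThreshold_;
    std::size_t failures_ = 0;
    std::size_t successes_ = 0;
    Target current_ = Target::Primary;
};
```

With thresholds of 3 and 2, for instance, two failed probes followed by one success leave clients on the primary, and only three consecutive failures trigger the switch. A real port would presumably drive `onProbeResult` from a periodic lookup against the test topic and would need to handle more than one secondary cluster.
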
The Java implementation ([`SameAuthParamsLookupAutoClusterFailover.java`](https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/SameAuthParamsLookupAutoClusterFailover.java)) serves as a direct reference.

## Impact

The C++ client is the foundation for the Node.js client binding. Once `SameAuthParamsLookupAutoClusterFailover` is available in C++, it can be surfaced to Node.js consumers as well, a client language that currently has no automatic failover support at all. This would bring C++ and Node.js deployments to parity with Java on the most important `AutoClusterFailover` reliability fix for proxy-fronted geo-replication clusters.

## References

- apache/pulsar#23129: original Java implementation and motivation
- `include/pulsar/ServiceInfoProvider.h`: existing C++ interface
- `lib/AutoClusterFailover.cc`: existing C++ `AutoClusterFailover` implementation (reference for structure)
