graham-macdonald-simplisafe opened a new issue, #571:
URL: https://github.com/apache/pulsar-client-cpp/issues/571

   ## Motivation
   
   The Java client introduced `SameAuthParamsLookupAutoClusterFailover` in 
apache/pulsar#23129 (merged August 2024, released in Pulsar 4.0.0 and 
backported to 3.0.7 / 3.3.2). This `ServiceUrlProvider` implementation 
addresses a well-known reliability gap in `AutoClusterFailover` that is 
particularly relevant to geo-replication deployments sitting behind a Pulsar 
Proxy.
   
   **The problem with `AutoClusterFailover`**: its health probe is a raw TCP 
connection. In a typical deployment where a Pulsar Proxy fronts the brokers, 
the TCP probe succeeds as soon as the proxy accepts the connection — even if 
all brokers behind the proxy have crashed. This means `AutoClusterFailover` 
cannot detect broker-layer failure and may reconnect clients to a cluster that 
is not actually serving requests.
   
   **What `SameAuthParamsLookupAutoClusterFailover` does differently**:
   - Probes cluster health via a **topic lookup** (`getBroker()` on a 
configurable test topic) rather than a raw TCP connection. A broker that can 
respond to a lookup is demonstrably processing requests — the proxy cannot mask 
broker failure here.
   - Introduces a **hysteresis state machine** with separate 
`failoverThreshold` and `recoverThreshold` settings (default 5 each), requiring 
that many consecutive probe failures before cutting over and that many 
consecutive probe successes before switching back. This prevents flapping 
without requiring a coarse `switchBackDelay` timer.
   - Targets geo-replication topologies where all clusters share the same 
authentication credentials, which is the common case.
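   
   As a rough illustration of the hysteresis described above, here is a minimal C++ sketch of the threshold logic. All names (`HysteresisFailover`, `onPrimaryProbe`, `ActiveCluster`) are hypothetical and are not taken from the Java implementation or the existing C++ client. The sketch assumes only two clusters and leaves out the lookup probe itself, taking its result as a boolean.

```cpp
#include <cassert>

// Hypothetical sketch: none of these names exist in the Pulsar clients.
enum class ActiveCluster { Primary, Secondary };

class HysteresisFailover {
public:
    // Defaults mirror the "5 each" thresholds mentioned in this issue.
    explicit HysteresisFailover(int failoverThreshold = 5, int recoverThreshold = 5)
        : failoverThreshold_(failoverThreshold), recoverThreshold_(recoverThreshold) {
        assert(failoverThreshold > 0 && recoverThreshold > 0);
    }

    // Feed the result of one health probe against the primary cluster
    // (assumed to be a topic lookup performed elsewhere).
    // Returns the cluster that should be used after this probe.
    ActiveCluster onPrimaryProbe(bool healthy) {
        if (active_ == ActiveCluster::Primary) {
            // Any success resets the failure streak.
            consecutiveFailures_ = healthy ? 0 : consecutiveFailures_ + 1;
            if (consecutiveFailures_ >= failoverThreshold_) {
                active_ = ActiveCluster::Secondary;
                resetCounters();
            }
        } else {
            // Any failure resets the success streak.
            consecutiveSuccesses_ = healthy ? consecutiveSuccesses_ + 1 : 0;
            if (consecutiveSuccesses_ >= recoverThreshold_) {
                active_ = ActiveCluster::Primary;
                resetCounters();
            }
        }
        return active_;
    }

    ActiveCluster active() const { return active_; }

private:
    void resetCounters() {
        consecutiveFailures_ = 0;
        consecutiveSuccesses_ = 0;
    }

    int failoverThreshold_;
    int recoverThreshold_;
    int consecutiveFailures_ = 0;
    int consecutiveSuccesses_ = 0;
    ActiveCluster active_ = ActiveCluster::Primary;
};
```

   Because a single successful probe resets the failure streak (and a single failed probe resets the success streak), an intermittently failing primary does not trigger a switch in either direction. The real implementation would also have to handle more than one fallback cluster and run the probes on a timer.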
   
   ## Request
   
   Port `SameAuthParamsLookupAutoClusterFailover` to the C++ client.
   
   The `ServiceInfoProvider` interface is already part of the C++ public API 
(`include/pulsar/ServiceInfoProvider.h`), and `AutoClusterFailover` is already 
implemented against it — so the interface contract is defined and the pattern 
is established. The Java implementation 
([`SameAuthParamsLookupAutoClusterFailover.java`](https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/SameAuthParamsLookupAutoClusterFailover.java))
 serves as a direct reference.
   
   ## Impact
   
   The C++ client is the foundation for the Node.js client binding. Once 
`SameAuthParamsLookupAutoClusterFailover` is available in C++, it can be 
surfaced to Node.js users as well. The Node.js client currently has no 
automatic failover support at all.
   
   This would bring C++ and Node.js deployments to parity with Java on the most 
important `AutoClusterFailover` reliability fix for proxy-fronted 
geo-replication clusters.
   
   ## References
   
   - apache/pulsar#23129 — original Java implementation and motivation
   - `include/pulsar/ServiceInfoProvider.h` — existing C++ interface
   - `lib/AutoClusterFailover.cc` — existing C++ `AutoClusterFailover` 
implementation (reference for structure)

