merlimat opened a new pull request, #25929: URL: https://github.com/apache/pulsar/pull/25929
## Motivation

The scalable-topics transaction coordinator (PIP-473) so far reused the `transaction_coordinator_assign` partitioned topic for **two** jobs: per-partition leader election *and* client discovery. Both route through the topic / namespace / bundle / load-balancer machinery, so TC coordination liveness is entangled with that machinery, even though the TC keeps **no state** in the assign-topic (its ledgers are empty; it's purely an ownership token).

This PR makes the scalable-topics TC's leader election depend on the **metadata store only**, and replaces the topic-lookup-based discovery with a dedicated assignment watch. The coordinator now rests directly on the metadata store it already hard-depends on for every header read/write, instead of on a layer above it.

> Note: sharding is unchanged: placement is still client-side round-robin and the TC partition is encoded in `TxnID.mostSigBits`. Only *how a broker becomes leader for partition N* and *how the client finds that broker* change.

## Modifications

**Broker election**

- Per-partition `LeaderElection<TcLeader>` over `/txn/tc/leader/<N>` via the existing `CoordinationService`.
- New config `transactionCoordinatorScalableTopicsParallelism` (default 16): the degree of coordinator parallelism; replaces the assign-topic partition count for the v5 path.
- `TransactionCoordinatorV5.isLeaderFor(int)` now gates both client-connect and the timeout/GC sweeps (was the assign-topic ownership check).

**Discovery (new wire commands)**

- `CommandWatchTcAssignments` / `CommandWatchTcAssignmentsUpdate` / `CommandWatchTcAssignmentsClose` + `TcAssignment` / `TcAssignmentsSnapshot`.
- The client opens **one** watch; the broker replies with the full `partition → leader` map and re-pushes the **full snapshot** on every leadership change. No point lookup, no diff/hash: the map is bounded (parallelism, ~16) and changes rarely, so always sending the whole snapshot is simpler and removes a class of apply-ordering/drift bugs.
- New `FeatureFlags.supports_tc_metadata_discovery`.

**Client**

- `TcDiscovery` strategy: `WatchTcAssignmentsDiscovery` (new path) when the broker advertises the feature flag, else `AssignTopicTcDiscovery` (the existing flow, still the v4 path).
- `TransactionMetaStoreHandler` can connect directly to a coordinator's elected leader broker and retarget when a snapshot moves leadership.

## Backward compatibility

Purely additive on the wire. The assign-topic remains the v4 / fallback discovery surface during the deprecation window:

- *new client + new broker* → assignment watch;
- *old client + new broker* → assign-topic lookup (broker still owns the bundle; `TC_CLIENT_CONNECT` still works);
- *new client + old broker* → feature flag absent → falls back to assign-topic.

Defaults are unchanged: the scalable-topics TC stays off; the default flip lands in P5.4.

## Tests

- [x] `TransactionCoordinatorV5Test`: election + leadership coverage (21 cases).
- [x] `CommandsTcAssignmentsTest`: wire round-trip (5 cases).
- [x] `TcMetadataDiscoveryTest`: multi-broker docker integration covering the transaction lifecycle over the assignment watch across all coordinator partitions, and survival of a leader-broker failure (re-election → watch refresh → handler retarget). Wired into the `TRANSACTION` CI group.
- [x] Checkstyle clean across broker / client / common / integration.

**Scope note:** the integration tests exercise the transaction **lifecycle** (newTransaction / commit / abort) over the discovered connections, which is the full surface the new client discovery path drives. A data-in-transaction e2e (produce/ack inside a txn) additionally needs the scalable-topic buffer + pending-ack providers and `segment://` topics, which land with P5.4 / P6.
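
For reviewers, the client-side behaviour described above (pick a discovery strategy from the feature flag, replace the whole `partition → leader` map on each snapshot, retarget only the handlers whose leader moved) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `TcAssignmentSketch`, `AssignmentView`, `applySnapshot`, `leaderFor` and `chooseDiscovery` are hypothetical names, brokers are represented as plain strings, and the partition is assumed to be the numeric value of `TxnID.mostSigBits`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Illustrative sketch only: every name here is hypothetical, not the PR's code.
class TcAssignmentSketch {

    // Stand-ins for the two TcDiscovery strategies named in the description.
    enum Discovery { WATCH_TC_ASSIGNMENTS, ASSIGN_TOPIC }

    // New path only when the broker advertises supports_tc_metadata_discovery;
    // otherwise fall back to the assign-topic lookup.
    static Discovery chooseDiscovery(boolean brokerAdvertisesFlag) {
        return brokerAdvertisesFlag ? Discovery.WATCH_TC_ASSIGNMENTS : Discovery.ASSIGN_TOPIC;
    }

    // Client-side copy of the partition -> leader map.
    static final class AssignmentView {
        private Map<Integer, String> leaders = new HashMap<>();

        // Swap in the full snapshot and return, in ascending order, the partitions
        // whose leader changed or disappeared (the handlers that must retarget).
        List<Integer> applySnapshot(Map<Integer, String> snapshot) {
            TreeSet<Integer> partitions = new TreeSet<>(leaders.keySet());
            partitions.addAll(snapshot.keySet());
            List<Integer> changed = new ArrayList<>();
            for (Integer p : partitions) {
                if (!Objects.equals(leaders.get(p), snapshot.get(p))) {
                    changed.add(p);
                }
            }
            // Whole-map replacement: no diff is applied, so no drift can accumulate.
            leaders = new HashMap<>(snapshot);
            return changed;
        }

        // Assumption: the partition index is the numeric value of TxnID.mostSigBits.
        String leaderFor(long txnIdMostSigBits) {
            return leaders.get((int) txnIdMostSigBits);
        }
    }
}
```

Because each update carries the complete map, the view never has to reason about missed or reordered deltas; the comparison against the previous map exists only to decide which connections to move.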
