merlimat opened a new pull request, #25929:
URL: https://github.com/apache/pulsar/pull/25929

   ## Motivation
   
   The scalable-topics transaction coordinator (PIP-473) has so far reused the
   `transaction_coordinator_assign` partitioned topic for **two** jobs:
   per-partition leader election *and* client discovery. Both route through the
   topic / namespace / bundle / load-balancer machinery, so TC coordination
   liveness is entangled with that machinery, even though the TC keeps **no
   state** in the assign-topic (its ledgers are empty; it is purely an ownership
   token).
   
   This PR makes the scalable-topics TC's leader election depend on the 
**metadata store only**, and replaces the topic-lookup-based discovery with a 
dedicated assignment watch. The coordinator now rests directly on the metadata 
store it already hard-depends on for every header read/write, instead of on a 
layer above it.
   
   > Note: sharding is unchanged — placement is still client-side round-robin 
and the TC partition is encoded in `TxnID.mostSigBits`. Only *how a broker 
becomes leader for partition N* and *how the client finds that broker* change.
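
To make the unchanged sharding rule concrete, here is a minimal sketch of client-side round-robin placement and of recovering the partition from a transaction id. `TcShardingSketch` and its methods are hypothetical names, not classes from this PR, and the sketch assumes the partition index is stored directly as the `mostSigBits` value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: these names are illustrative, not Pulsar classes.
final class TcShardingSketch {
    private final int parallelism;
    private final AtomicLong counter = new AtomicLong();

    TcShardingSketch(int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        this.parallelism = parallelism;
    }

    // Client-side round-robin: each new transaction goes to the next partition.
    int nextPartition() {
        return (int) Math.floorMod(counter.getAndIncrement(), (long) parallelism);
    }

    // Assumption: the partition index is used as-is for the most-significant half.
    static long mostSigBitsFor(int partition) {
        return partition;
    }

    // Recovers the coordinator partition from an existing transaction id.
    static int partitionOf(long mostSigBits) {
        return Math.toIntExact(mostSigBits);
    }
}
```

Because the partition travels inside the id, any later request for that transaction can be routed without a lookup; only the mapping from partition to broker needs discovery.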
   
   ## Modifications
   
   **Broker election**
   - Per-partition `LeaderElection<TcLeader>` over `/txn/tc/leader/<N>` via the 
existing `CoordinationService`.
   - New config `transactionCoordinatorScalableTopicsParallelism` (default 16) 
— the degree of coordinator parallelism; replaces the assign-topic partition 
count for the v5 path.
   - `TransactionCoordinatorV5.isLeaderFor(int)` now gates both client-connect
     and the timeout/GC sweeps (previously gated by the assign-topic ownership check).
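
The election bullets above can be illustrated with a self-contained sketch. It does not use the real `CoordinationService` or `LeaderElection` API: a `ConcurrentMap` stands in for the metadata store, and `TcElectionSketch`, `tryElect`, `resign`, and `runSweepIfLeader` are hypothetical names. Only the `/txn/tc/leader/<N>` path layout and the `isLeaderFor(int)` gate come from this PR.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: an in-memory map stands in for the metadata store.
final class TcElectionSketch {
    private final ConcurrentMap<String, String> store;
    private final String brokerId;

    TcElectionSketch(ConcurrentMap<String, String> store, String brokerId) {
        this.store = store;
        this.brokerId = brokerId;
    }

    // One election path per coordinator partition.
    static String leaderPath(int partition) {
        return "/txn/tc/leader/" + partition;
    }

    // Atomic create-if-absent: the first broker to write the path wins.
    boolean tryElect(int partition) {
        String current = store.putIfAbsent(leaderPath(partition), brokerId);
        return current == null || current.equals(brokerId);
    }

    // The single ownership check used by both connect and sweep paths.
    boolean isLeaderFor(int partition) {
        return brokerId.equals(store.get(leaderPath(partition)));
    }

    // Stand-in for losing leadership (e.g. session expiry or shutdown).
    void resign(int partition) {
        store.remove(leaderPath(partition), brokerId);
    }

    // Timeout/GC sweeps run only on the current leader of the partition.
    boolean runSweepIfLeader(int partition, Runnable sweep) {
        if (!isLeaderFor(partition)) {
            return false;
        }
        sweep.run();
        return true;
    }
}
```

Two instances sharing one map behave like two brokers competing for the same partition: only one `tryElect` succeeds until the winner resigns.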
   
   **Discovery (new wire commands)**
   - `CommandWatchTcAssignments` / `CommandWatchTcAssignmentsUpdate` / 
`CommandWatchTcAssignmentsClose` + `TcAssignment` / `TcAssignmentsSnapshot`.
   - The client opens **one** watch; the broker replies with the full
     `partition → leader` map and re-pushes the **full snapshot** on every
     leadership change. There is no point lookup and no diff/hash: the map is
     bounded by the coordinator parallelism (16 by default) and changes rarely,
     so always sending the whole snapshot is simpler and removes a class of
     apply-ordering/drift bugs.
   - New `FeatureFlags.supports_tc_metadata_discovery`.
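
A rough sketch of why full snapshots are easy to apply on the client: each update replaces the whole map, and the set of partitions whose leader moved falls out of a simple comparison. `TcAssignmentsViewSketch` is a hypothetical class, and broker addresses are modeled as plain strings rather than the actual `TcAssignment` message.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a client-side view replaced wholesale by each snapshot.
final class TcAssignmentsViewSketch {
    private volatile Map<Integer, String> current = Collections.emptyMap();

    // Replaces the view and returns the partitions whose leader changed,
    // i.e. the ones a connection handler would need to retarget.
    synchronized Set<Integer> applySnapshot(Map<Integer, String> snapshot) {
        Map<Integer, String> next = Collections.unmodifiableMap(new HashMap<>(snapshot));
        Set<Integer> moved = new TreeSet<>();
        for (Map.Entry<Integer, String> e : next.entrySet()) {
            if (!e.getValue().equals(current.get(e.getKey()))) {
                moved.add(e.getKey());
            }
        }
        // Partitions missing from the new snapshot have lost their leader.
        for (Integer partition : current.keySet()) {
            if (!next.containsKey(partition)) {
                moved.add(partition);
            }
        }
        current = next;
        return moved;
    }

    // Returns the known leader for a partition, or null if none is known.
    String leaderOf(int partition) {
        return current.get(partition);
    }
}
```

Since no update depends on the previous one, a dropped or reordered update cannot leave the view permanently out of sync; the next snapshot overwrites it.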
   
   **Client**
   - `TcDiscovery` strategy: `WatchTcAssignmentsDiscovery` (new path) when the 
broker advertises the feature flag, else `AssignTopicTcDiscovery` (the existing 
flow — still the v4 path).
   - `TransactionMetaStoreHandler` can connect directly to a coordinator's 
elected leader broker and retarget when a snapshot moves leadership.
   
   ## Backward compatibility
   
   Purely additive on the wire. The assign-topic remains the v4 / fallback 
discovery surface during the deprecation window:
   - *new client + new broker* → assignment watch;
   - *old client + new broker* → assign-topic lookup (broker still owns the 
bundle; `TC_CLIENT_CONNECT` still works);
   - *new client + old broker* → feature flag absent → falls back to 
assign-topic.
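
The compatibility matrix reduces to a single rule, sketched below: the assignment watch is used only when both sides support it. `TcDiscoveryChoiceSketch` and its enum are hypothetical; in the PR the two strategies are `WatchTcAssignmentsDiscovery` and `AssignTopicTcDiscovery`, selected from the `supports_tc_metadata_discovery` feature flag.

```java
// Hypothetical sketch of the discovery choice; names are illustrative.
final class TcDiscoveryChoiceSketch {
    enum Strategy { WATCH_TC_ASSIGNMENTS, ASSIGN_TOPIC }

    // An old client has no watch support; an old broker never advertises the flag.
    static Strategy choose(boolean clientHasWatchSupport, boolean brokerAdvertisesFlag) {
        return (clientHasWatchSupport && brokerAdvertisesFlag)
                ? Strategy.WATCH_TC_ASSIGNMENTS
                : Strategy.ASSIGN_TOPIC;
    }
}
```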
   
   Defaults are unchanged — the scalable-topics TC stays off; the default flip 
lands in P5.4.
   
   ## Tests
   
   - [x] `TransactionCoordinatorV5Test` — election + leadership coverage (21 
cases).
   - [x] `CommandsTcAssignmentsTest` — wire round-trip (5 cases).
   - [x] `TcMetadataDiscoveryTest` — multi-broker docker integration: 
transaction lifecycle over the assignment watch across all coordinator 
partitions, and survival of a leader-broker failure (re-election → watch 
refresh → handler retarget). Wired into the `TRANSACTION` CI group.
   - [x] Checkstyle clean across broker / client / common / integration.
   
   **Scope note:** the integration tests exercise the transaction **lifecycle** 
(newTransaction / commit / abort) over the discovered connections — the full 
surface the new client discovery path drives. A data-in-transaction e2e 
(produce/ack inside a txn) additionally needs the scalable-topic buffer + 
pending-ack providers and `segment://` topics, which land with P5.4 / P6.

