lhotari commented on PR #26008:
URL: https://github.com/apache/pulsar/pull/26008#issuecomment-4692622752

   I performed a local review with Claude Code Fable 5 and it found these 
findings. Please check them.
   
   ### Summary
   
   Clean, well-structured follow-up to #25980. The layering design 
(`AutoScaleConfig.resolve` = broker defaults → namespace → topic, 
most-specific-wins per field), the `SegmentLayout.toMetadata(original)` fix so 
split/merge CAS rewrites don't drop the per-topic override, and the lazy 
per-evaluation resolution in the controller all match the description, and the 
test coverage is good (including the override-survives-split case). There is 
one real gap: the set-time validation doesn't cover the namespace + topic 
*combination*, and when that combination is invalid the controller's evaluation 
fails on every tick, silently disabling auto split/merge for the topic.
   
   ### Findings
   
   1. **[BUG]** Individually-valid namespace and topic overrides can combine 
into an invalid policy, which then permanently kills auto split/merge for the 
topic at runtime — `ScalableTopics.java` (`internalSetAutoScalePolicy`), 
`NamespacesBase.java` (`internalSetScalableTopicAutoScalePolicyAsync`), 
`ScalableTopicController.resolveAutoScaleConfig`.
   
      The namespace endpoint validates `resolve(conf, override, null)` and the 
topic endpoint validates `resolve(conf, null, override)` — each layer is only 
ever checked against the broker defaults, never against the other layer. 
Example: broker defaults `splitMsgRateIn=1000`, `mergeMsgRateIn=100`. A 
namespace override of `mergeMsgRateInThreshold=500` passes (1000 > 500). A 
topic override of `splitMsgRateInThreshold=200` passes (200 > 100). Combined, 
`split=200 ≤ merge=500` violates the hysteresis invariant. The same composition 
problem exists for `minSegments`/`maxSegments` across layers, and for stored 
overrides that become invalid after a broker restart with changed 
`scalableTopic*` defaults (validation only ever ran against the config of the 
broker that handled the admin call).
   
      At runtime, `resolveAutoScaleConfig` then throws 
`IllegalArgumentException` inside `thenCombine` on **every** evaluation; 
`runAutoScaleSafely` catches it and logs a WARN, so auto split/merge is 
silently dead for that topic with no admin-visible signal. (The in-flight flag 
is correctly released via `whenComplete`, so there's no leak — it just never 
works again until someone fixes the override.)
   
      Suggested fix, two parts: (a) at topic-level set time, validate against 
the *current* namespace override (`resolve(conf, currentNsOverride, override)`) 
— it's a cheap cached read and closes the common case; (b) since (a) is still 
racy (the namespace override can change afterwards, and broker defaults can 
change across restarts), make the controller resilient: catch 
`IllegalArgumentException` in `resolveAutoScaleConfig` and fall back to a 
defined behavior (e.g. treat as disabled, or drop the most-specific layer) with 
a clear log, rather than failing the whole evaluation chain.
   
   2. **[INTENT MISMATCH]** The PR description and the code comment in 
`ScalableTopics.internalSetAutoScalePolicy` overstate the validation guarantee. 
The description says an override "only invalid in combination … is rejected 
with a 412 at set time", and the comment reasons "(the namespace layer can only 
have been valid the same way)" — that reasoning doesn't hold, per finding 1: 
both layers being valid *against broker defaults* does not make their 
combination valid. Worth rewording even if the behavior gap is fixed.
   
   3. **[QUALITY]** No `pulsar-admin` CLI bindings. `CmdScalableTopics` and 
`CmdNamespaces` exist in `pulsar-client-tools`, and sibling policies like 
`autoTopicCreation` have CLI commands; the new set/get/remove endpoints are 
reachable only via the Java admin client or raw REST. Fine as a stated 
follow-up, but worth noting in the PR if intentional.
   
   4. **[QUALITY]** Minor REST doc inaccuracy: the topic-level GET advertises 
"200 … empty body if no override is set", but `asyncResponse.resume(null)` 
produces a **204** in that case (the namespace GET has the same behavior). The 
Java client handles it (returns `null`), but the OpenAPI description doesn't 
match the wire behavior — `ScalableTopics.java` `getAutoScalePolicy`.
   
   5. **[QUALITY]** Test hygiene in `ScalableTopicAutoScalePolicyTest`: 
`testNamespaceLevelRoundTrip` and `testInvalidOverrideRejected` mutate the 
shared namespace of `SharedPulsarBaseTest`. If an assertion fails between set 
and remove, the `enabled=false` namespace override leaks into other tests 
sharing that broker. A `finally`-style cleanup (or a dedicated namespace) would 
make it robust.
   
   Smaller things checked and found fine: the 404 path for missing topics works 
(`readModifyUpdate` propagates `NotFoundException`, mapped to 404 in both 
endpoints); moving the `enabled` check inside the `autoScaleInFlight` window is 
harmless; the `ScalableTopicMetadata` all-args constructor signature change is 
acceptable since PIP-483 is unreleased; the new `PolicyName` enum constant is 
appended safely.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to