lhotari commented on PR #674:
URL: 
https://github.com/apache/pulsar-helm-chart/pull/674#issuecomment-4298283353

   > _Review assisted by Claude (Opus 4.7). Flagging structural concerns and a 
few smaller issues — feel free to push back on any of them._
   
   Thanks for putting this together — the component-suppression approach is 
clean and the CI coverage is a nice touch. I have one structural concern and a 
few smaller issues worth addressing before merge.
   
   ## 1. Use a `Deployment` instead of a `StatefulSet` for standalone
   
   Pulsar standalone embeds ZooKeeper + BookKeeper + Broker in one JVM. There's 
no sharding, no quorum, no peer gossip — scaling it beyond 1 produces N 
isolated Pulsar "clusters" with the same identity contending for the same 
ClusterIP and (on RWO PVC) the same volume.
   
   StatefulSet gives you three things, none of which help here:
   
   | Feature | Value for standalone |
   |---|---|
   | Stable per-pod DNS (`pod-0.headless…`) | Irrelevant — one pod. Service DNS 
is enough. |
   | `OrderedReady` rolling updates | Meaningless at `replicas: 1`. |
   | `volumeClaimTemplates` | Trivially replaced by a plain PVC. |
   
   Costs it imposes:
   
   - **Footgun.** `standalone.replicaCount` is exposed with no enforcement. A 
user who sets `2` gets a silently broken deployment.
   - **Immutable fields.** `serviceName`, `selector`, `volumeClaimTemplates`, 
`podManagementPolicy` are all immutable — upgrades that touch them require 
deleting the workload.
   - **Extra surface area.** Headless service, `${HOSTNAME}.headless…` 
advertised-address plumbing, `OrderedReady` all exist only because it's a 
StatefulSet.
   
   **Suggested replacement:**
   
   ```yaml
   apiVersion: apps/v1
   kind: Deployment
   spec:
     replicas: 1                 # hardcoded; drop standalone.replicaCount from 
values.yaml
     strategy:
       type: Recreate            # releases RWO PVC before the new pod starts
     template:
       spec:
         hostname: standalone    # together with subdomain, makes hostname -f 
resolvable
         subdomain: "{{ template \"pulsar.standalone.service.headless\" . }}"
         containers:
           - args:
               - |
                 ...
                 # --advertised-address is REQUIRED: Pulsar no longer defaults 
this to
                 # FQDN after apache/pulsar#25523 reverted apache/pulsar#25238. 
Without
                 # it, client lookups return a non-resolvable address.
                 exec bin/pulsar standalone --advertised-address "$(hostname 
-f)" ...
   ---
   apiVersion: v1
   kind: PersistentVolumeClaim
   metadata:
     annotations:
       helm.sh/resource-policy: keep   # preserve data on helm uninstall
   spec:
     accessModes: [ReadWriteOnce]
     resources: { requests: { storage: "{{ .Values.standalone.volumes.data.size 
}}" } }
   ```
   
   Benefits: no `replicaCount` footgun, no immutable-field upgrade pain, PVC 
retention is explicit, and the model matches what standalone actually is — a 
single-pod app.
   
   If keeping StatefulSet is preferred for chart consistency, at minimum: 
hardcode `replicas: 1` (or `fail` on `replicaCount > 1`) and drop 
`podManagementPolicy: OrderedReady`.
   
   ## 2. `--advertised-address` is now mandatory — please comment *why*
   
   apache/pulsar#25238 made standalone default its advertised address to the 
FQDN, but apache/pulsar#25523 reverted that with no replacement. Pulsar 
standalone will no longer derive an advertisable address on its own — omitting 
`--advertised-address` causes client lookups to return a non-resolvable 
hostname/IP and connections fail.
   
   The PR correctly passes it, but the current helper:
   
   ```
   {{- define "pulsar.standalone.hostname" -}}
   ${HOSTNAME}.{{ template "pulsar.standalone.service.headless" . }}.{{ 
template "pulsar.namespace" . }}.svc.{{ .Values.clusterDomain }}
   {{- end -}}
   ```
   
   …is exactly what `hostname -f` returns inside a pod with a headless 
`serviceName`/`subdomain`. Simplify and, critically, leave a comment so a 
future reader doesn't "clean up" the flag:
   
   ```yaml
   # --advertised-address is REQUIRED: Pulsar no longer defaults this to FQDN
   # after apache/pulsar#25523 reverted apache/pulsar#25238. Without it,
   # client lookups return a non-resolvable address and connections fail.
   --advertised-address "$(hostname -f)"
   ```
   
   ## 3. Don't force-disable the proxy when standalone is enabled
   
   Every `proxy-*.yaml` gained `(not .Values.standalone.enabled)`, which 
hard-disables Pulsar Proxy whenever standalone is on. That's overly restrictive 
— Proxy in front of standalone is a valid and useful topology (TLS termination 
at the edge, single stable ingress endpoint, not exposing broker listeners 
directly).
   
   `toolset-configmap.yaml` already demonstrates the right pattern:
   
   ```yaml
   {{- if .Values.standalone.enabled }}
     webServiceUrl: "http://{{ template "pulsar.standalone.service" . }}:..."
     brokerServiceUrl: "pulsar://{{ template "pulsar.standalone.service" . 
}}:..."
   {{- else if ... }}
   ```
   
   Please:
   
   1. Drop `(not .Values.standalone.enabled)` from the proxy templates — keep 
the gate on `.Values.components.proxy` only.
   2. Mirror the toolset pattern in `proxy-configmap.yaml`: when 
`standalone.enabled`, point `brokerServiceURL` / `brokerWebServiceURL` (+ TLS 
variants) at the standalone service.
   3. Re-check proxy RBAC / service-account guards so they come along.
   4. Show the combination in `examples/values-standalone.yaml` with a 
commented `components.proxy: true` block.
   
   Also avoids the silent regression where flipping `standalone.enabled: true` 
makes an existing `components.proxy: true` simply disappear from the rendered 
manifest.
   
   ## Smaller issues
   
   - **`tls.standalone.cert_name` defaults to `tls-proxy`** (`values.yaml`) — 
collides with the proxy cert name and becomes a real clash if #3 above is 
adopted. Change to `tls-standalone`.
   - **Plain HTTP leaks with TLS enabled.** `standalone-service.yaml` always 
publishes port `8080`, and `standalone-configmap.yaml` only sets 
`PULSAR_PREFIX_webServicePortTls` / `PULSAR_PREFIX_brokerServicePortTls` — it 
doesn't null the non-TLS `webServicePort` / `brokerServicePort`. Both listeners 
end up active. Either set `PULSAR_PREFIX_webServicePort: ""` / 
`PULSAR_PREFIX_brokerServicePort: ""` when TLS is on, or gate the Service ports 
on `tls.enabled` (the way the broker templates do).
   - **Functions + standalone has no RBAC.** `broker-rbac.yaml` is gated on 
`not .Values.standalone.enabled`, but the standalone start script enables the 
functions worker when `components.functions: true`. The K8s runtime will hit 
RBAC failures. Either emit the functions RBAC bound to the standalone SA in 
that combination, or `fail` on it.
   - **CI `--timeout` bump applies globally.** `.ci/helm.sh` raises the install 
timeout from 6m → 10m for *every* job, not just standalone. If standalone 
specifically needs more time, scope the bump; otherwise document why the 
blanket increase is needed.
   - **`set -euo pipefail` in `kind-cluster-build.sh`** is a good drive-by fix 
but unrelated — worth calling out in the PR body (already done — thanks).
   
   Happy to look at a follow-up commit once you've had a chance to triage these.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to