lhotari commented on PR #674:
URL:
https://github.com/apache/pulsar-helm-chart/pull/674#issuecomment-4298283353
> _Review assisted by Claude (Opus 4.7). Flagging structural concerns and a
few smaller issues — feel free to push back on any of them._
Thanks for putting this together — the component-suppression approach is
clean and the CI coverage is a nice touch. I have one structural concern and a
few smaller issues worth addressing before merge.
## 1. Use a `Deployment` instead of a `StatefulSet` for standalone
Pulsar standalone embeds ZooKeeper + BookKeeper + Broker in one JVM. There's
no sharding, no quorum, no peer gossip — scaling it beyond 1 produces N
isolated Pulsar "clusters" with the same identity contending for the same
ClusterIP and (on RWO PVC) the same volume.
StatefulSet gives you three things, none of which help here:
| Feature | Value for standalone |
|---|---|
| Stable per-pod DNS (`pod-0.headless…`) | Irrelevant — one pod. Service DNS
is enough. |
| `OrderedReady` rolling updates | Meaningless at `replicas: 1`. |
| `volumeClaimTemplates` | Trivially replaced by a plain PVC. |
Costs it imposes:
- **Footgun.** `standalone.replicaCount` is exposed with no enforcement. A
user who sets `2` gets a silently broken deployment.
- **Immutable fields.** `serviceName`, `selector`, `volumeClaimTemplates`,
`podManagementPolicy` are all immutable — upgrades that touch them require
deleting the workload.
- **Extra surface area.** Headless service, `${HOSTNAME}.headless…`
advertised-address plumbing, `OrderedReady` all exist only because it's a
StatefulSet.
**Suggested replacement:**
```yaml
apiVersion: apps/v1
kind: Deployment
spec:
replicas: 1 # hardcoded; drop standalone.replicaCount from
values.yaml
strategy:
type: Recreate # releases RWO PVC before the new pod starts
template:
spec:
hostname: standalone # together with subdomain, makes hostname -f
resolvable
subdomain: "{{ template \"pulsar.standalone.service.headless\" . }}"
containers:
- args:
- |
...
# --advertised-address is REQUIRED: Pulsar no longer defaults
this to
# FQDN after apache/pulsar#25523 reverted apache/pulsar#25238.
Without
# it, client lookups return a non-resolvable address.
exec bin/pulsar standalone --advertised-address "$(hostname
-f)" ...
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
helm.sh/resource-policy: keep # preserve data on helm uninstall
spec:
accessModes: [ReadWriteOnce]
resources: { requests: { storage: "{{ .Values.standalone.volumes.data.size
}}" } }
```
Benefits: no `replicaCount` footgun, no immutable-field upgrade pain, PVC
retention is explicit, and the model matches what standalone actually is — a
single-pod app.
If keeping StatefulSet is preferred for chart consistency, at minimum:
hardcode `replicas: 1` (or `fail` on `replicaCount > 1`) and drop
`podManagementPolicy: OrderedReady`.
## 2. `--advertised-address` is now mandatory — please comment *why*
apache/pulsar#25238 made standalone default its advertised address to the
FQDN, but apache/pulsar#25523 reverted that with no replacement. Pulsar
standalone will no longer derive an advertisable address on its own — omitting
`--advertised-address` causes client lookups to return a non-resolvable
hostname/IP and connections fail.
The PR correctly passes it, but the current helper:
```
{{- define "pulsar.standalone.hostname" -}}
${HOSTNAME}.{{ template "pulsar.standalone.service.headless" . }}.{{
template "pulsar.namespace" . }}.svc.{{ .Values.clusterDomain }}
{{- end -}}
```
…is exactly what `hostname -f` returns inside a pod with a headless
`serviceName`/`subdomain`. Simplify and, critically, leave a comment so a
future reader doesn't "clean up" the flag:
```yaml
# --advertised-address is REQUIRED: Pulsar no longer defaults this to FQDN
# after apache/pulsar#25523 reverted apache/pulsar#25238. Without it,
# client lookups return a non-resolvable address and connections fail.
--advertised-address "$(hostname -f)"
```
## 3. Don't force-disable the proxy when standalone is enabled
Every `proxy-*.yaml` gained `(not .Values.standalone.enabled)`, which
hard-disables Pulsar Proxy whenever standalone is on. That's overly restrictive
— Proxy in front of standalone is a valid and useful topology (TLS termination
at the edge, single stable ingress endpoint, not exposing broker listeners
directly).
`toolset-configmap.yaml` already demonstrates the right pattern:
```yaml
{{- if .Values.standalone.enabled }}
webServiceUrl: "http://{{ template "pulsar.standalone.service" . }}:..."
brokerServiceUrl: "pulsar://{{ template "pulsar.standalone.service" .
}}:..."
{{- else if ... }}
```
Please:
1. Drop `(not .Values.standalone.enabled)` from the proxy templates — keep
the gate on `.Values.components.proxy` only.
2. Mirror the toolset pattern in `proxy-configmap.yaml`: when
`standalone.enabled`, point `brokerServiceURL` / `brokerWebServiceURL` (+ TLS
variants) at the standalone service.
3. Re-check proxy RBAC / service-account guards so they come along.
4. Show the combination in `examples/values-standalone.yaml` with a
commented `components.proxy: true` block.
Also avoids the silent regression where flipping `standalone.enabled: true`
makes an existing `components.proxy: true` simply disappear from the rendered
manifest.
## Smaller issues
- **`tls.standalone.cert_name` defaults to `tls-proxy`** (`values.yaml`) —
collides with the proxy cert name and becomes a real clash if #3 above is
adopted. Change to `tls-standalone`.
- **Plain HTTP leaks with TLS enabled.** `standalone-service.yaml` always
publishes port `8080`, and `standalone-configmap.yaml` only sets
`PULSAR_PREFIX_webServicePortTls` / `PULSAR_PREFIX_brokerServicePortTls` — it
doesn't null the non-TLS `webServicePort` / `brokerServicePort`. Both listeners
end up active. Either set `PULSAR_PREFIX_webServicePort: ""` /
`PULSAR_PREFIX_brokerServicePort: ""` when TLS is on, or gate the Service ports
on `tls.enabled` (the way the broker templates do).
- **Functions + standalone has no RBAC.** `broker-rbac.yaml` is gated on
`not .Values.standalone.enabled`, but the standalone start script enables the
functions worker when `components.functions: true`. The K8s runtime will hit
RBAC failures. Either emit the functions RBAC bound to the standalone SA in
that combination, or `fail` on it.
- **CI `--timeout` bump applies globally.** `.ci/helm.sh` raises the install
timeout from 6m → 10m for *every* job, not just standalone. If standalone
specifically needs more time, scope the bump; otherwise document why the
blanket increase is needed.
- **`set -euo pipefail` in `kind-cluster-build.sh`** is a good drive-by fix
but unrelated — worth calling out in the PR body (already done — thanks).
Happy to look at a follow-up commit once you've had a chance to triage these.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]