This is an automated email from the ASF dual-hosted git repository.
lhotari pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pulsar-site.git
The following commit(s) were added to refs/heads/main by this push:
new a7cbfbc30f4 Update bare metal and docker deployment config (#1098)
a7cbfbc30f4 is described below
commit a7cbfbc30f4bd0d0f0b33685e28f733d428ad1c1
Author: zhou zhuohan <[email protected]>
AuthorDate: Wed Apr 22 00:46:29 2026 +0800
Update bare metal and docker deployment config (#1098)
---
docs/deploy-bare-metal.md | 183 ++++++++++++++++++++--
docs/deploy-docker.md | 140 ++++++++++++++++-
versioned_docs/version-3.0.x/deploy-bare-metal.md | 174 +++++++++++++++++++-
versioned_docs/version-3.0.x/deploy-docker.md | 146 ++++++++++++++++-
versioned_docs/version-4.0.x/deploy-bare-metal.md | 183 ++++++++++++++++++++--
versioned_docs/version-4.0.x/deploy-docker.md | 140 ++++++++++++++++-
versioned_docs/version-4.2.x/deploy-bare-metal.md | 183 ++++++++++++++++++++--
versioned_docs/version-4.2.x/deploy-docker.md | 140 ++++++++++++++++-
8 files changed, 1217 insertions(+), 72 deletions(-)
diff --git a/docs/deploy-bare-metal.md b/docs/deploy-bare-metal.md
index 95ee9028335..27957a31df6 100644
--- a/docs/deploy-bare-metal.md
+++ b/docs/deploy-bare-metal.md
@@ -121,6 +121,16 @@ Directory | Contains
`lib` | The [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)) files that
Pulsar uses
`logs` | Logs that the installation creates
+The `conf` directory contains configuration files for various Pulsar
components. Below is a brief overview of the main configuration categories:
+
+- **JVM Configuration** (`pulsar_env.sh` / `bkenv.sh`): Controls JVM memory
allocation (`PULSAR_MEM`, `BOOKIE_MEM`), garbage collection options
(`PULSAR_GC`, `BOOKIE_GC`), and extra JVM options (`PULSAR_EXTRA_OPTS`,
`BOOKIE_EXTRA_OPTS`) for Broker, BookKeeper, and other components.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports,
message retention policies, authentication, and authorization settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, ZooKeeper
connection, compaction, and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, file rolling strategies, and log output
directories.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the service, using the `pulsar-admin` CLI
tool or the Admin REST API. Dynamic configurations are stored in the metadata
store (ZooKeeper) and take effect across all Brokers in the cluster.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
### Install Built-in Connectors (optional)
To use `built-in` connectors, download the connectors tarball release on every broker node in one of the following ways:
@@ -297,19 +307,116 @@ You can obtain the metadata service URI of the existing
BookKeeper cluster by us
[BookKeeper](https://bookkeeper.apache.org) handles all persistent data
storage in Pulsar. You need to deploy a cluster of BookKeeper bookies to use
Pulsar. You can choose to run a **3-bookie BookKeeper cluster**.
+### Configure BookKeeper
+
+BookKeeper configuration is split across two files:
+
+- **`conf/bookkeeper.conf`**: Contains all BookKeeper runtime parameters,
including metadata store connection, storage directories, compaction settings,
and disk usage thresholds.
+- **`conf/bkenv.sh`**: Contains JVM-related parameters for the Bookie process,
including memory allocation (`BOOKIE_MEM`), garbage collection options
(`BOOKIE_GC`), and extra JVM flags (`BOOKIE_EXTRA_OPTS`).
+
+#### Metadata store connection
+
You can configure BookKeeper bookies using the
[`conf/bookkeeper.conf`](reference-configuration.md#bookkeeper) configuration
file. The most important step in configuring bookies for our purposes here is
ensuring that `metadataServiceUri` is set to the URI for the ZooKeeper cluster.
The following is an example:
```properties
metadataServiceUri=zk://zk1.us-west.example.com:2181;zk2.us-west.example.com:2181;zk3.us-west.example.com:2181/ledgers
```
-Which using `;` as separator in `metadataServiceUri`
+:::note
+
+Use `;` as the separator in `metadataServiceUri`.
:::
-Once you appropriately modify the `metadataServiceUri` parameter, you can make
any other configuration changes that you require. You can find a full listing
of the available BookKeeper configuration parameters
[here](reference-configuration.md#bookkeeper). However, consulting the
[BookKeeper
documentation](https://bookkeeper.apache.org/docs/next/reference/config/) for a
more in-depth guide might be a better choice.
+For more information about ZooKeeper and BookKeeper administration, see
[ZooKeeper and BookKeeper
administration](https://pulsar.apache.org/docs/next/administration-zk-bk/).
+
+#### Storage directories
+
+In a production environment, you should configure dedicated disks for journal
and ledger storage. Keeping them on separate disks significantly improves write
performance.
+
+```properties
+# WAL (Write-Ahead Log) directory — use a dedicated SSD for low-latency writes
+journalDirectory=/data/bookkeeper/journal
+
+# Ledger storage directory — use a separate disk from the journal
+ledgerDirectories=/data/bookkeeper/ledgers
+```
+
+- `journalDirectory`: Defaults to `data/bookkeeper/journal`. The journal is a
write-ahead log that records every write before it is applied to the ledger
storage. Using a dedicated high-speed SSD for the journal directory is critical
for write latency.
+- `ledgerDirectories`: Defaults to `data/bookkeeper/ledgers`. This is where
the actual ledger data is stored. Separating it from the journal directory
avoids I/O contention and improves throughput.
+
+#### GC and Compaction
+
+BookKeeper writes entries from multiple ledgers into shared Entry Log files
(default max 1 GB each, controlled by `logSizeLimit`). When ledgers are deleted
— for example, after Pulsar's retention policy trims expired data — the Entry
Log files that contained those ledgers develop unused space. The Bookie's GC
thread periodically scans for deleted ledgers and triggers compaction to
reclaim disk space by rewriting the remaining valid entries into new files.
+
+BookKeeper provides two levels of compaction:
+
+- **Minor Compaction**: Targets Entry Log files where the valid data ratio is
below `minorCompactionThreshold` (default 0.2, i.e., 20%). Runs at
`minorCompactionInterval` (default: every hour). Designed to quickly reclaim
heavily fragmented files.
+- **Major Compaction**: Targets Entry Log files where the valid data ratio is
below `majorCompactionThreshold` (default 0.5, i.e., 50%). Runs at
`majorCompactionInterval` (default: every day). Covers a wider range of files
with moderate fragmentation.
+
+```properties
+# GC scan interval (ms), default: 900000 (15 min)
+gcWaitTime=900000
+
+# Minor Compaction: threshold and interval
+minorCompactionThreshold=0.2
+minorCompactionInterval=3600
+
+# Major Compaction: threshold and interval
+majorCompactionThreshold=0.5
+majorCompactionInterval=86400
+```
+
+:::note
+
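+`minorCompactionInterval` and `majorCompactionInterval` are specified in seconds, whereas `gcWaitTime` is specified in milliseconds. After conversion to the same unit, both intervals must be greater than `gcWaitTime`; otherwise compaction will not run.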
+
+:::
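The constraint in the note above can be checked before a bookie is started. The following is a minimal sketch, not part of Pulsar or BookKeeper: `check_compaction_intervals` is a hypothetical helper, and it assumes the compaction intervals are given in seconds and `gcWaitTime` in milliseconds, as the example values suggest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Pulsar or BookKeeper).
# Assumption: compaction intervals are in seconds, gcWaitTime is in milliseconds.
check_compaction_intervals() {
  local gc_wait_ms=$1 minor_s=$2 major_s=$3
  local name interval_s
  for name in minor major; do
    if [ "$name" = minor ]; then interval_s=$minor_s; else interval_s=$major_s; fi
    # Convert seconds to milliseconds before comparing with gcWaitTime.
    if [ $((interval_s * 1000)) -le "$gc_wait_ms" ]; then
      echo "${name}CompactionInterval=${interval_s}s is not greater than gcWaitTime=${gc_wait_ms}ms"
      return 1
    fi
  done
  echo "ok"
}

# Values from the example configuration above.
check_compaction_intervals 900000 3600 86400
```

With the example values, both intervals are well above the 15-minute GC scan interval, so the helper reports `ok`.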
+
+#### Disk usage thresholds
+
+BookKeeper monitors disk usage and can automatically switch a Bookie to
read-only mode to prevent disk exhaustion.
+
+```properties
+# Bookie enters read-only mode when disk usage exceeds this threshold (default: 0.95)
+diskUsageThreshold=0.95
+
+# Warning threshold: Major Compaction is paused when disk usage exceeds this value (default: 0.90)
+diskUsageWarnThreshold=0.90
+
+# Low water mark: Bookie returns to read-write mode only after disk usage drops below this value
+# Set it lower than diskUsageWarnThreshold to avoid frequent mode switching (recommended: 0.87)
+diskUsageLwmThreshold=0.87
+```
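The interaction between `diskUsageThreshold` and `diskUsageLwmThreshold` is a hysteresis: the switch to read-only and the switch back to read-write use different limits. The sketch below only illustrates that rule as described in the comments above; `next_bookie_mode` is a hypothetical function, not BookKeeper code, and it uses integer percentages (95 and 87) in place of the fractional settings.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the read-only / read-write hysteresis.
# Assumption: thresholds are expressed as integer percentages for simplicity.
next_bookie_mode() {
  local current=$1 usage_pct=$2
  local threshold_pct=95 lwm_pct=87
  if [ "$usage_pct" -gt "$threshold_pct" ]; then
    # Above diskUsageThreshold: always read-only.
    echo read-only
  elif [ "$current" = read-only ] && [ "$usage_pct" -ge "$lwm_pct" ]; then
    # Already read-only and not yet below the low water mark: stay read-only.
    echo read-only
  else
    echo read-write
  fi
}

next_bookie_mode read-write 96   # prints read-only
next_bookie_mode read-only 90    # prints read-only
next_bookie_mode read-only 86    # prints read-write
```

The gap between the two limits is what prevents a bookie hovering around 95% usage from flipping modes on every check.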
+
+#### JVM configuration (bkenv.sh)
+
+The `conf/bkenv.sh` file controls JVM parameters for the Bookie process:
+
+- `BOOKIE_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g`. Adjust
based on your storage workload. Insufficient heap memory leads to frequent GC,
which increases write and read latency — especially under high throughput, GC
pauses can cause write timeouts. Direct memory is primarily used for Netty
ByteBuf allocation — BookKeeper defaults to the `PooledDirect` memory
allocator, which allocates all ByteBuf from direct memory for network I/O and
internal data handling.
+
+ ```bash
+ # Example: increase heap and direct memory for high-throughput workloads
+ BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g"
+ ```
+
+- `BOOKIE_EXTRA_OPTS`: Passes additional JVM flags to the Bookie process.
Examples:
+
+ ```bash
+ # Enable heap dump on OOM. The default script only enables ExitOnOutOfMemoryError;
+ # without a heap dump file you cannot diagnose the root cause.
+ BOOKIE_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/bookie/heapdump.hprof"
+
+ # Temporarily enable Netty leak detection to troubleshoot off-heap memory leaks
+ # (disabled by default; set to the advanced level when investigating)
+ BOOKIE_EXTRA_OPTS="-Dio.netty.leakDetection.level=advanced"
+ ```
+
+For a full listing of the available BookKeeper configuration parameters, see the [configuration reference](reference-configuration.md#bookkeeper). The [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/) provides a more in-depth guide.
-Once you apply the desired configuration in `conf/bookkeeper.conf`, you can
start up a bookie on each of your BookKeeper hosts. You can start up each
bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
+### Start BookKeepers
+
+With the desired configuration applied in `conf/bookkeeper.conf` and
`conf/bkenv.sh`, you can start up a bookie on each of your BookKeeper hosts.
You can start up each bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
To start the bookie in the background, use the
[`pulsar-daemon`](reference-cli-tools.md) CLI tool:
@@ -346,6 +453,13 @@ Pulsar brokers are the last thing you need to deploy in
your Pulsar cluster. Bro
### Configure Brokers
+Broker configuration is split across two files:
+
+- **`conf/broker.conf`**: Contains all Broker runtime parameters, including
metadata store connection, cluster name, ports, replication settings, and
feature toggles.
+- **`conf/pulsar_env.sh`**: Contains JVM-related parameters for the Broker
process, including memory allocation (`PULSAR_MEM`), garbage collection options
(`PULSAR_GC`), and extra JVM flags (`PULSAR_EXTRA_OPTS`).
+
+#### Metadata store and cluster settings
+
You can configure brokers using the `conf/broker.conf` configuration file. The
most important element of broker configuration is ensuring that each broker is
aware of the ZooKeeper cluster that you have deployed. Ensure that the
[`metadataStoreUrl`](reference-configuration.md#broker) and
[`configurationMetadataStoreUrl`](reference-configuration.md#broker) parameters
are correct. In this case, since you only have 1 cluster and no configuration
store setup, the `configurationMetadataStoreU [...]
```properties
@@ -368,18 +482,57 @@ webServicePort=8080
webServicePortTls=8443
```
-> If you deploy Pulsar in a one-node cluster, you should update the
replication settings in `conf/broker.conf` to `1`.
->
-> ```properties
-> # Number of bookies to use when creating a ledger
-> managedLedgerDefaultEnsembleSize=1
->
-> # Number of copies to store for each message
-> managedLedgerDefaultWriteQuorum=1
->
-> # Number of guaranteed copies (acks to wait before write is complete)
-> managedLedgerDefaultAckQuorum=1
-> ```
+#### Managed ledger settings
+
+These parameters control how the Broker creates BookKeeper ledgers for message
storage. They map to the BookKeeper protocol's [Ensemble / Write Quorum / Ack
Quorum](https://bookkeeper.apache.org/docs/getting-started/concepts/#ledgers)
model:
+
+```properties
+# Ensemble size (E): number of bookies to use when creating a ledger (default: 2)
+managedLedgerDefaultEnsembleSize=2
+
+# Write quorum (Qw): number of copies to store for each entry (default: 2)
+managedLedgerDefaultWriteQuorum=2
+
+# Ack quorum (Qa): number of acks to wait before a write is considered complete (default: 2)
+managedLedgerDefaultAckQuorum=2
+```
+
+The invariant **E ≥ Qw ≥ Qa** must hold; otherwise ledger creation will fail.
+
+> If you deploy Pulsar in a one-node cluster, you should set all three values
to `1`.
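The invariant E ≥ Qw ≥ Qa can be verified before the values are written to `conf/broker.conf`. The following is a minimal sketch under that single assumption; `check_quorum` is a hypothetical helper and does not exist in Pulsar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks the invariant E >= Qw >= Qa and nothing else.
check_quorum() {
  local e=$1 qw=$2 qa=$3
  if [ "$e" -ge "$qw" ] && [ "$qw" -ge "$qa" ]; then
    echo "valid"
  else
    echo "invalid"
    return 1
  fi
}

check_quorum 2 2 2   # defaults from the example above
check_quorum 1 1 1   # one-node cluster
```

Both calls satisfy the invariant. A combination such as `check_quorum 2 3 2` would be rejected, because the write quorum exceeds the ensemble size.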
+
+#### JVM configuration (pulsar_env.sh)
+
+The `conf/pulsar_env.sh` file controls JVM parameters for the Broker process:
+
+- `PULSAR_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g`. Adjust
based on your machine's available memory. Insufficient heap memory leads to
frequent GC, and GC pauses increase message publish and consume latency — in
severe cases, Full GC can make the Broker temporarily unavailable. Direct
memory is critical for the Broker's message caching and Netty I/O operations.
+
+ ```bash
+ # Example: increase heap and direct memory for production workloads
+ PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
+ ```
+
+- `PULSAR_EXTRA_OPTS`: Passes additional JVM flags to the
Broker/Proxy/ZooKeeper process. Since `PULSAR_EXTRA_OPTS` is appended after
other JVM options on the command line, it can also be used to **override**
existing JVM parameters defined in `pulsar_env.sh` or the `bin/pulsar` startup
script (later flags take precedence). Examples:
+
+ ```bash
+ # Enable heap dump on OOM. The default script only enables ExitOnOutOfMemoryError;
+ # without a heap dump file you cannot diagnose the root cause after the process exits.
+ PULSAR_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/pulsar/heapdump.hprof"
+
+ # Enable IPv6 support. The default script sets -Djava.net.preferIPv4Stack=true;
+ # override it if your deployment uses IPv6 networking.
+ PULSAR_EXTRA_OPTS="-Djava.net.preferIPv4Stack=false"
+
+ # Tune Netty memory pool parameters. If your messages are large, increase maxOrder and
+ # maxCachedBufferCapacity so that Netty does not bypass the memory pool for allocation.
+ PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576"
+ ```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::
### Enable Pulsar Functions (optional)
diff --git a/docs/deploy-docker.md b/docs/deploy-docker.md
index 9e82f616d80..f03a2732c1f 100644
--- a/docs/deploy-docker.md
+++ b/docs/deploy-docker.md
@@ -34,9 +34,7 @@ Create a ZooKeeper container and start the ZooKeeper service.
```bash
docker run -d -p 2181:2181 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
- -e cluster-name=cluster-a -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
- -e managedLedgerDefaultAckQuorum=1 \
+ -e cluster-name=cluster-a \
-v $(pwd)/data/zookeeper:/pulsar/data/zookeeper \
--name zookeeper --hostname zookeeper \
apachepulsar/pulsar-all:latest \
@@ -81,10 +79,142 @@ docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
-e zookeeperServers=zookeeper:2181 \
-e clusterName=cluster-a \
- -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
+ --name broker --hostname broker \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar broker"
+```
+
+## Step 4: Configuration overview
+
+Pulsar Docker images support the following configuration categories:
+
+- **JVM Configuration**: Controls JVM memory allocation and garbage collection
for Broker and BookKeeper processes. In Docker, JVM parameters are set via
environment variables such as `PULSAR_MEM` and `BOOKIE_MEM`.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports, and
message replication settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, compaction,
and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, and file rolling strategies.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the container, using the `pulsar-admin`
CLI tool or the Admin REST API.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
+### How Docker configuration works
+
+Pulsar Docker images include a Python script called `apply-config-from-env.py`
that runs before the main process starts. This script reads all environment
variables and maps them directly to configuration file properties:
+
+1. If an environment variable name matches a key in the built-in configuration
file shipped with the container (e.g., `broker.conf` or `bookkeeper.conf`), the
script updates that key's value.
+2. Environment variables prefixed with `PULSAR_PREFIX_` are also supported —
the prefix is stripped and the remaining name is used as the configuration key.
This is useful when the configuration key conflicts with existing system
environment variables. Using `PULSAR_PREFIX_` is necessary for configuration
keys that aren't available in the shipped configuration files, but are
supported by the component (for example, keys available in Pulsar's
[ServiceConfiguration](https://github.com/apac [...]
+
+For example, setting `-e managedLedgerDefaultEnsembleSize=2` will update the
`managedLedgerDefaultEnsembleSize` property in the target configuration file.
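The two rules above can be sketched in a few lines of shell. This is a simplified illustration only and does not reproduce `apply-config-from-env.py`: `apply_env_to_conf` and the key `customKey` are hypothetical, and the sketch assumes that values contain no `|` or `&` characters.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified illustration of the two mapping rules; this is
# not the real apply-config-from-env.py.
# Assumption: values contain no '|' or '&' characters (they would confuse sed).
apply_env_to_conf() {
  local conf=$1 name=$2 value=$3 key
  # Rule 2: strip the PULSAR_PREFIX_ prefix, if present.
  key=${name#PULSAR_PREFIX_}
  if grep -q "^${key}=" "$conf"; then
    # Rule 1: the key already exists in the file, so update its value.
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$conf"
    rm -f "${conf}.bak"
  elif [ "$key" != "$name" ]; then
    # Prefixed key that is absent from the shipped file: append it.
    printf '%s=%s\n' "$key" "$value" >> "$conf"
  fi
}

conf=$(mktemp)
printf 'clusterName=\nmanagedLedgerDefaultEnsembleSize=1\n' > "$conf"
apply_env_to_conf "$conf" managedLedgerDefaultEnsembleSize 2
apply_env_to_conf "$conf" PULSAR_PREFIX_customKey abc
cat "$conf"
```

An unprefixed variable that matches no key in the file, such as `HOME`, is ignored by this sketch, which is why unrelated environment variables do not end up in the configuration.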
+
+### Configuration methods
+
+#### Method 1: Using `-e` environment variables
+
+Pass configuration properties directly as environment variables in the `docker
run` command:
+
+```bash
+docker run -d \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
+ -e managedLedgerDefaultAckQuorum=2 \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar broker"
+```
+
+#### Method 2: Using `--env-file` for batch loading
+
+For a large number of configuration properties, use an environment file to
keep your `docker run` command clean:
+
+```bash
+docker run -d --env-file ./broker-config.env \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar broker"
+```
+
+Example `broker-config.env` file:
+
+```properties
+metadataStoreUrl=zk:zookeeper:2181
+clusterName=cluster-a
+managedLedgerDefaultEnsembleSize=2
+managedLedgerDefaultWriteQuorum=2
+managedLedgerDefaultAckQuorum=2
+PULSAR_MEM=-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g
+```
+
+#### Method 3: Using Docker Volume to mount custom configuration files
+
+You can mount a custom configuration file from the host into the container,
bypassing the environment variable mechanism entirely:
+
+```bash
+docker run -d \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -v $(pwd)/my-broker.conf:/pulsar/conf/broker.conf \
+ apachepulsar/pulsar-all:latest \
+ bin/pulsar broker
+```
+
+### Common configuration examples
+
+Below are examples of commonly used configuration properties for BookKeeper
and Broker containers. You can add more `-e` flags to the `docker run` command
shown in [Step 3](#step-3-create-and-start-containers) to customize the
behavior.
+
+#### BookKeeper
+
+```bash
+# Storage directories: journal for write-ahead logs, ledgers for actual message data.
+# Disk usage thresholds: bookie will reject writes when usage exceeds these limits.
+# GC and compaction: reclaim disk space by removing unused ledger data.
+# BOOKIE_EXTRA_OPTS: appended to the JVM flags in the startup script; can override default JVM parameters.
+# Volume mounts: if possible, use separate physical disks for journal and ledgers to improve read/write performance.
+docker run -d -e clusterName=cluster-a \
+ -e zkServers=zookeeper:2181 --net=pulsar \
+ -e metadataServiceUri=metadata-store:zk:zookeeper:2181 \
+ -e journalDirectory=/pulsar/data/bookkeeper/journal \
+ -e ledgerDirectories=/pulsar/data/bookkeeper/ledgers \
+ -e diskUsageThreshold=0.95 \
+ -e diskUsageWarnThreshold=0.90 \
+ -e diskUsageLwmThreshold=0.87 \
+ -e gcWaitTime=900000 \
+ -e minorCompactionThreshold=0.2 \
+ -e minorCompactionInterval=3600 \
+ -e majorCompactionThreshold=0.5 \
+ -e majorCompactionInterval=86400 \
+ -e BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g" \
+ -e BOOKIE_EXTRA_OPTS="-XX:+ExitOnOutOfMemoryError" \
+ -v $(pwd)/data/bookkeeper/journal:/pulsar/data/bookkeeper/journal \
+ -v $(pwd)/data/bookkeeper/ledgers:/pulsar/data/bookkeeper/ledgers \
+ --name bookie --hostname bookie \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/pulsar bookie"
+```
+
+#### Broker
+
+```bash
+# Ensemble settings: control how messages are replicated across bookies (must not exceed the number of deployed bookies).
+# Ports: binary protocol port and HTTP admin port.
+# PULSAR_EXTRA_OPTS: appended to the JVM flags in the startup script; can override default JVM parameters.
+docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e zookeeperServers=zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
-e managedLedgerDefaultAckQuorum=1 \
+ -e brokerServicePort=6650 \
+ -e webServicePort=8080 \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -e PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576" \
--name broker --hostname broker \
apachepulsar/pulsar-all:latest \
bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar broker"
```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::
diff --git a/versioned_docs/version-3.0.x/deploy-bare-metal.md
b/versioned_docs/version-3.0.x/deploy-bare-metal.md
index 89cdec74d78..f4f51df9bca 100644
--- a/versioned_docs/version-3.0.x/deploy-bare-metal.md
+++ b/versioned_docs/version-3.0.x/deploy-bare-metal.md
@@ -125,7 +125,17 @@ Directory | Contains
`lib` | The [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)) files that
Pulsar uses
`logs` | Logs that the installation creates
-## Install Built-in Connectors (optional)
+The `conf` directory contains configuration files for various Pulsar
components. Below is a brief overview of the main configuration categories:
+
+- **JVM Configuration** (`pulsar_env.sh` / `bkenv.sh`): Controls JVM memory
allocation (`PULSAR_MEM`, `BOOKIE_MEM`), garbage collection options
(`PULSAR_GC`, `BOOKIE_GC`), and extra JVM options (`PULSAR_EXTRA_OPTS`,
`BOOKIE_EXTRA_OPTS`) for Broker, BookKeeper, and other components.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports,
message retention policies, authentication, and authorization settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, ZooKeeper
connection, compaction, and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, file rolling strategies, and log output
directories.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the service, using the `pulsar-admin` CLI
tool or the Admin REST API. Dynamic configurations are stored in the metadata
store (ZooKeeper) and take effect across all Brokers in the cluster.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
+### Install Built-in Connectors (optional)
To use `built-in` connectors, download the connectors tarball release on every broker node in one of the following ways:
@@ -301,17 +311,116 @@ You can obtain the metadata service URI of the existing
BookKeeper cluster by us
[BookKeeper](https://bookkeeper.apache.org) handles all persistent data
storage in Pulsar. You need to deploy a cluster of BookKeeper bookies to use
Pulsar. You can choose to run a **3-bookie BookKeeper cluster**.
+### Configure BookKeeper
+
+BookKeeper configuration is split across two files:
+
+- **`conf/bookkeeper.conf`**: Contains all BookKeeper runtime parameters,
including metadata store connection, storage directories, compaction settings,
and disk usage thresholds.
+- **`conf/bkenv.sh`**: Contains JVM-related parameters for the Bookie process,
including memory allocation (`BOOKIE_MEM`), garbage collection options
(`BOOKIE_GC`), and extra JVM flags (`BOOKIE_EXTRA_OPTS`).
+
+#### Metadata store connection
+
You can configure BookKeeper bookies using the
[`conf/bookkeeper.conf`](reference-configuration.md#bookkeeper) configuration
file. The most important step in configuring bookies for our purposes here is
ensuring that `metadataServiceUri` is set to the URI for the ZooKeeper cluster.
The following is an example:
```properties
metadataServiceUri=zk://zk1.us-west.example.com:2181;zk2.us-west.example.com:2181;zk3.us-west.example.com:2181/ledgers
```
-Which using `;` as separator in `metadataServiceUri`
+:::note
+
+Use `;` as the separator in `metadataServiceUri`.
+
+:::
+
+For more information about ZooKeeper and BookKeeper administration, see
[ZooKeeper and BookKeeper
administration](https://pulsar.apache.org/docs/next/administration-zk-bk/).
+
+#### Storage directories
+
+In a production environment, you should configure dedicated disks for journal
and ledger storage. Keeping them on separate disks significantly improves write
performance.
+
+```properties
+# WAL (Write-Ahead Log) directory — use a dedicated SSD for low-latency writes
+journalDirectory=/data/bookkeeper/journal
+
+# Ledger storage directory — use a separate disk from the journal
+ledgerDirectories=/data/bookkeeper/ledgers
+```
+
+- `journalDirectory`: Defaults to `data/bookkeeper/journal`. The journal is a
write-ahead log that records every write before it is applied to the ledger
storage. Using a dedicated high-speed SSD for the journal directory is critical
for write latency.
+- `ledgerDirectories`: Defaults to `data/bookkeeper/ledgers`. This is where
the actual ledger data is stored. Separating it from the journal directory
avoids I/O contention and improves throughput.
+
+#### GC and Compaction
+
+BookKeeper writes entries from multiple ledgers into shared Entry Log files
(default max 1 GB each, controlled by `logSizeLimit`). When ledgers are deleted
— for example, after Pulsar's retention policy trims expired data — the Entry
Log files that contained those ledgers develop unused space. The Bookie's GC
thread periodically scans for deleted ledgers and triggers compaction to
reclaim disk space by rewriting the remaining valid entries into new files.
+
+BookKeeper provides two levels of compaction:
+
+- **Minor Compaction**: Targets Entry Log files where the valid data ratio is
below `minorCompactionThreshold` (default 0.2, i.e., 20%). Runs at
`minorCompactionInterval` (default: every hour). Designed to quickly reclaim
heavily fragmented files.
+- **Major Compaction**: Targets Entry Log files where the valid data ratio is
below `majorCompactionThreshold` (default 0.5, i.e., 50%). Runs at
`majorCompactionInterval` (default: every day). Covers a wider range of files
with moderate fragmentation.
+
+```properties
+# GC scan interval (ms), default: 900000 (15 min)
+gcWaitTime=900000
+
+# Minor Compaction: threshold and interval
+minorCompactionThreshold=0.2
+minorCompactionInterval=3600
+
+# Major Compaction: threshold and interval
+majorCompactionThreshold=0.5
+majorCompactionInterval=86400
+```
+
+:::note
+
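+`minorCompactionInterval` and `majorCompactionInterval` are specified in seconds, whereas `gcWaitTime` is specified in milliseconds. After conversion to the same unit, both intervals must be greater than `gcWaitTime`; otherwise compaction will not run.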
+
+:::
+
+#### Disk usage thresholds
+
+BookKeeper monitors disk usage and can automatically switch a Bookie to
read-only mode to prevent disk exhaustion.
+
+```properties
+# Bookie enters read-only mode when disk usage exceeds this threshold (default: 0.95)
+diskUsageThreshold=0.95
+
+# Warning threshold: Major Compaction is paused when disk usage exceeds this value (default: 0.90)
+diskUsageWarnThreshold=0.90
+
+# Low water mark: Bookie returns to read-write mode only after disk usage drops below this value
+# Set it lower than diskUsageWarnThreshold to avoid frequent mode switching (recommended: 0.87)
+diskUsageLwmThreshold=0.87
+```
+
+#### JVM configuration (bkenv.sh)
+
+The `conf/bkenv.sh` file controls JVM parameters for the Bookie process:
+
+- `BOOKIE_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g`. Adjust
based on your storage workload. Insufficient heap memory leads to frequent GC,
which increases write and read latency — especially under high throughput, GC
pauses can cause write timeouts. Direct memory is primarily used for Netty
ByteBuf allocation — BookKeeper defaults to the `PooledDirect` memory
allocator, which allocates all ByteBuf from direct memory for network I/O and
internal data handling.
-Once you appropriately modify the `metadataServiceUri` parameter, you can make
any other configuration changes that you require. You can find a full listing
of the available BookKeeper configuration parameters
[here](reference-configuration.md#bookkeeper). However, consulting the
[BookKeeper
documentation](https://bookkeeper.apache.org/docs/next/reference/config/) for a
more in-depth guide might be a better choice.
+ ```bash
+ # Example: increase heap and direct memory for high-throughput workloads
+ BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g"
+ ```
+
+- `BOOKIE_EXTRA_OPTS`: Passes additional JVM flags to the Bookie process.
Examples:
+
+ ```bash
+ # Enable heap dump on OOM. The default script only enables ExitOnOutOfMemoryError;
+ # without a heap dump file you cannot diagnose the root cause.
+ BOOKIE_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/bookie/heapdump.hprof"
+
+ # Temporarily enable Netty leak detection to troubleshoot off-heap memory leaks
+ # (disabled by default; set to the advanced level when investigating)
+ BOOKIE_EXTRA_OPTS="-Dio.netty.leakDetection.level=advanced"
+ ```
-Once you apply the desired configuration in `conf/bookkeeper.conf`, you can
start up a bookie on each of your BookKeeper hosts. You can start up each
bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
+For a full listing of the available BookKeeper configuration parameters, see the [configuration reference](reference-configuration.md#bookkeeper). The [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/) provides a more in-depth guide.
+
+### Start BookKeepers
+
+With the desired configuration applied in `conf/bookkeeper.conf` and
`conf/bkenv.sh`, you can start up a bookie on each of your BookKeeper hosts.
You can start up each bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
To start the bookie in the background, use the
[`pulsar-daemon`](reference-cli-tools.md) CLI tool:
@@ -348,6 +457,13 @@ Pulsar brokers are the last thing you need to deploy in
your Pulsar cluster. Bro
### Configure Brokers
+Broker configuration is split across two files:
+
+- **`conf/broker.conf`**: Contains all Broker runtime parameters, including
metadata store connection, cluster name, ports, replication settings, and
feature toggles.
+- **`conf/pulsar_env.sh`**: Contains JVM-related parameters for the Broker
process, including memory allocation (`PULSAR_MEM`), garbage collection options
(`PULSAR_GC`), and extra JVM flags (`PULSAR_EXTRA_OPTS`).
+
+#### Metadata store and cluster settings
+
You can configure brokers using the `conf/broker.conf` configuration file. The
most important element of broker configuration is ensuring that each broker is
aware of the ZooKeeper cluster that you have deployed. Ensure that the
[`metadataStoreUrl`](reference-configuration.md#broker) and
[`configurationMetadataStoreUrl`](reference-configuration.md#broker) parameters
are correct. In this case, since you only have 1 cluster and no configuration
store setup, the `configurationMetadataStoreU [...]
```properties
@@ -370,6 +486,23 @@ webServicePort=8080
webServicePortTls=8443
```
+#### Managed ledger settings
+
+These parameters control how the Broker creates BookKeeper ledgers for message
storage. They map to the BookKeeper protocol's [Ensemble / Write Quorum / Ack
Quorum](https://bookkeeper.apache.org/docs/getting-started/concepts/#ledgers)
model:
+
+```properties
+# Ensemble size (E): number of bookies to use when creating a ledger (default: 2)
+managedLedgerDefaultEnsembleSize=2
+
+# Write quorum (Qw): number of copies to store for each entry (default: 2)
+managedLedgerDefaultWriteQuorum=2
+
+# Ack quorum (Qa): number of acks to wait before a write is considered complete (default: 2)
+managedLedgerDefaultAckQuorum=2
+```
+
+The invariant **E ≥ Qw ≥ Qa** must hold; otherwise ledger creation will fail.
+
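The invariant can be checked before a broker is started. The following is a minimal sketch, not something shipped with Pulsar: `get_prop` and `check_quorum` are hypothetical helpers, and the sketch assumes each of the three keys appears uncommented as a plain `key=value` line in the file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Pulsar): verify E >= Qw >= Qa
# in a broker.conf-style properties file.

# Print the value of an uncommented key=value line (last occurrence wins).
get_prop() {
  awk -F= -v k="$1" '$1 == k { v = $2 } END { print v }' "$2"
}

check_quorum() {
  file="$1"
  e=$(get_prop managedLedgerDefaultEnsembleSize "$file")
  qw=$(get_prop managedLedgerDefaultWriteQuorum "$file")
  qa=$(get_prop managedLedgerDefaultAckQuorum "$file")
  if [ -z "$e" ] || [ -z "$qw" ] || [ -z "$qa" ]; then
    echo "missing quorum setting in $file"
    return 2
  fi
  if [ "$e" -ge "$qw" ] && [ "$qw" -ge "$qa" ]; then
    echo "ok: E=$e Qw=$qw Qa=$qa"
  else
    echo "invalid: E=$e Qw=$qw Qa=$qa (need E >= Qw >= Qa)"
    return 1
  fi
}
```

After sourcing the helpers, `check_quorum conf/broker.conf` reports whether the three settings are consistent with each other. It does not check that the ensemble size fits the number of bookies actually deployed.
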
> If you deploy Pulsar in a one-node cluster, you should update the
> replication settings in `conf/broker.conf` to `1`.
>
> ```properties
@@ -383,6 +516,39 @@ webServicePortTls=8443
> managedLedgerDefaultAckQuorum=1
> ```
+#### JVM configuration (pulsar_env.sh)
+
+The `conf/pulsar_env.sh` file controls JVM parameters for the Broker process:
+
+- `PULSAR_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g`. Adjust
based on your machine's available memory. Insufficient heap memory leads to
frequent GC, and GC pauses increase message publish and consume latency — in
severe cases, Full GC can make the Broker temporarily unavailable. Direct
memory is critical for the Broker's message caching and Netty I/O operations.
+
+ ```bash
+ # Example: increase heap and direct memory for production workloads
+ PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
+ ```
+
+- `PULSAR_EXTRA_OPTS`: Passes additional JVM flags to the
Broker/Proxy/ZooKeeper process. Since `PULSAR_EXTRA_OPTS` is appended after
other JVM options on the command line, it can also be used to **override**
existing JVM parameters defined in `pulsar_env.sh` or the `bin/pulsar` startup
script (later flags take precedence). Examples:
+
+ ```bash
+ # Enable heap dump on OOM (the default script only enables ExitOnOutOfMemoryError;
+ # without a heap dump file you cannot diagnose the root cause after the process exits)
+ PULSAR_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/pulsar/heapdump.hprof"
+
+ # Enable IPv6 support (the default script sets -Djava.net.preferIPv4Stack=true;
+ # override this if your deployment uses IPv6 networking)
+ PULSAR_EXTRA_OPTS="-Djava.net.preferIPv4Stack=false"
+
+ # Tune Netty memory pool parameters (increase maxOrder and maxCachedBufferCapacity
+ # if your messages are large, to avoid Netty bypassing the memory pool for allocation)
+ PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576"
+ ```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::
+
### Enable Pulsar Functions (optional)
diff --git a/versioned_docs/version-3.0.x/deploy-docker.md
b/versioned_docs/version-3.0.x/deploy-docker.md
index 7cccffb74c9..9dde83795c6 100644
--- a/versioned_docs/version-3.0.x/deploy-docker.md
+++ b/versioned_docs/version-3.0.x/deploy-docker.md
@@ -9,7 +9,7 @@ To deploy a Pulsar cluster on Docker using Docker commands, you
need to complete
2. Create a network.
3. Create and start the ZooKeeper, bookie, and broker containers.
-## Pull a Pulsar image
+## Step 1: Pull a Pulsar image
To run Pulsar on Docker, you need to create a container for each Pulsar
component: ZooKeeper, bookie, and the broker. You can pull the images of
ZooKeeper and bookie separately on Docker Hub, and pull the Pulsar image for
the broker. You can also pull only one Pulsar image and create three containers
with this image. This tutorial takes the second option as an example.
@@ -18,7 +18,7 @@ You can pull a Pulsar image from Docker Hub with the
following command. If you d
docker pull apachepulsar/pulsar-all:latest
```
-## Create a network
+## Step 2: Create a network
To deploy a Pulsar cluster on Docker, you need to create a network and connect
the containers of ZooKeeper, bookie, and broker to this network.
Use the following command to create the network `pulsar`:
@@ -27,7 +27,7 @@ Use the following command to create the network `pulsar`:
docker network create pulsar
```
-## Create and start containers
+## Step 3: Create and start containers
### Create a ZooKeeper container
@@ -36,9 +36,7 @@ Create a ZooKeeper container and start the ZooKeeper service.
```bash
docker run -d -p 2181:2181 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
- -e cluster-name=cluster-a -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
- -e managedLedgerDefaultAckQuorum=1 \
+ -e cluster-name=cluster-a \
-v $(pwd)/data/zookeeper:/pulsar/data/zookeeper \
--name zookeeper --hostname zookeeper \
apachepulsar/pulsar-all:latest \
@@ -83,10 +81,142 @@ docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
-e zookeeperServers=zookeeper:2181 \
-e clusterName=cluster-a \
- -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
+ --name broker --hostname broker \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+## Step 4: Configuration overview
+
+Pulsar Docker images support the following configuration categories:
+
+- **JVM Configuration**: Controls JVM memory allocation and garbage collection
for Broker and BookKeeper processes. In Docker, JVM parameters are set via
environment variables such as `PULSAR_MEM` and `BOOKIE_MEM`.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports, and
message replication settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, compaction,
and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, and file rolling strategies.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the container, using the `pulsar-admin`
CLI tool or the Admin REST API.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
+### How Docker configuration works
+
+Pulsar Docker images include a Python script called `apply-config-from-env.py`. The `docker run` commands in this guide invoke it before starting the main process. The script reads all environment variables and maps them directly to configuration file properties:
+
+1. If an environment variable name matches a key in the built-in configuration
file shipped with the container (e.g., `broker.conf` or `bookkeeper.conf`), the
script updates that key's value.
+2. Environment variables prefixed with `PULSAR_PREFIX_` are also supported —
the prefix is stripped and the remaining name is used as the configuration key.
This is useful when the configuration key conflicts with existing system
environment variables. Using `PULSAR_PREFIX_` is necessary for configuration
keys that aren't available in the shipped configuration files, but are
supported by the component (for example, keys available in Pulsar's
[ServiceConfiguration](https://github.com/apac [...]
+
+For example, setting `-e managedLedgerDefaultEnsembleSize=2` will update the
`managedLedgerDefaultEnsembleSize` property in the target configuration file.
+
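The two mapping rules above can be illustrated with a small shell sketch. This is not the real `apply-config-from-env.py` (which is a Python script and may differ in detail); `has_key`, `set_key`, and `apply_env_sketch` are hypothetical names, and the sketch only handles uncommented `key=value` lines and single-line values.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: mimics the two mapping rules described above.

# Succeed if the file has an uncommented line for the key.
has_key() {
  K="$2" awk -F= '$1 == ENVIRON["K"] { found = 1 } END { exit !found }' "$1"
}

# Replace the key's value, or append key=value if the key is absent.
set_key() {
  tmp=$(mktemp)
  K="$2" V="$3" awk -F= '
    $1 == ENVIRON["K"] { print ENVIRON["K"] "=" ENVIRON["V"]; found = 1; next }
    { print }
    END { if (!found) print ENVIRON["K"] "=" ENVIRON["V"] }
  ' "$1" > "$tmp" && mv "$tmp" "$1"
}

apply_env_sketch() {
  conf="$1"
  env | while IFS='=' read -r name value; do
    [ -n "$name" ] || continue
    if has_key "$conf" "$name"; then
      # Rule 1: the variable name matches an existing key.
      set_key "$conf" "$name" "$value"
    elif [ "${name#PULSAR_PREFIX_}" != "$name" ]; then
      # Rule 2: strip the prefix and use the remainder as the key.
      set_key "$conf" "${name#PULSAR_PREFIX_}" "$value"
    fi
  done
}
```

With `clusterName=cluster-a` exported, `apply_env_sketch conf/broker.conf` rewrites the existing `clusterName=` line. A variable such as `PULSAR_PREFIX_someNewKey` (a made-up key) is added under the name `someNewKey` even though no such line existed in the file.
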
+### Configuration methods
+
+#### Method 1: Using `-e` environment variables
+
+Pass configuration properties directly as environment variables in the `docker
run` command:
+
+```bash
+docker run -d \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
+ -e managedLedgerDefaultAckQuorum=2 \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+#### Method 2: Using `--env-file` for batch loading
+
+For a large number of configuration properties, use an environment file to
keep your `docker run` command clean:
+
+```bash
+docker run -d --env-file ./broker-config.env \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+Example `broker-config.env` file:
+
+```properties
+metadataStoreUrl=zk:zookeeper:2181
+clusterName=cluster-a
+managedLedgerDefaultEnsembleSize=2
+managedLedgerDefaultWriteQuorum=2
+managedLedgerDefaultAckQuorum=2
+PULSAR_MEM=-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g
+```
+
+#### Method 3: Using Docker Volume to mount custom configuration files
+
+You can mount a custom configuration file from the host into the container,
bypassing the environment variable mechanism entirely:
+
+```bash
+docker run -d \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -v $(pwd)/my-broker.conf:/pulsar/conf/broker.conf \
+ apachepulsar/pulsar-all:latest \
+ bin/pulsar broker
+```
+
+### Common configuration examples
+
+Below are examples of commonly used configuration properties for BookKeeper
and Broker containers. You can add more `-e` flags to the `docker run` command
shown in [Step 3](#step-3-create-and-start-containers) to customize the
behavior.
+
+#### BookKeeper
+
+```bash
+# Storage directories: journal for write-ahead logs, ledgers for actual message data.
+# Disk usage thresholds: bookie will reject writes when usage exceeds these limits.
+# GC and Compaction: reclaim disk space by removing unused ledger data.
+# BOOKIE_MEM: JVM memory.
+# BOOKIE_EXTRA_OPTS: appended to JVM flags in the startup script, can override default JVM parameters.
+# Volume mounts: if possible, use separate physical disks for journal and ledger to improve read/write performance.
+# Keep comments above the command: a comment line between backslash-continued lines ends the command early.
+docker run -d -e clusterName=cluster-a \
+ -e zkServers=zookeeper:2181 --net=pulsar \
+ -e metadataServiceUri=metadata-store:zk:zookeeper:2181 \
+ -e journalDirectory=/pulsar/data/bookkeeper/journal \
+ -e ledgerDirectories=/pulsar/data/bookkeeper/ledgers \
+ -e diskUsageThreshold=0.95 \
+ -e diskUsageWarnThreshold=0.90 \
+ -e diskUsageLwmThreshold=0.87 \
+ -e gcWaitTime=900000 \
+ -e minorCompactionThreshold=0.2 \
+ -e minorCompactionInterval=3600 \
+ -e majorCompactionThreshold=0.5 \
+ -e majorCompactionInterval=86400 \
+ -e BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g" \
+ -e BOOKIE_EXTRA_OPTS="-XX:+ExitOnOutOfMemoryError" \
+ -v $(pwd)/data/bookkeeper/journal:/pulsar/data/bookkeeper/journal \
+ -v $(pwd)/data/bookkeeper/ledgers:/pulsar/data/bookkeeper/ledgers \
+ --name bookie --hostname bookie \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/pulsar bookie"
+```
+
+#### Broker
+
+```bash
+# Ensemble settings: control how messages are replicated across bookies (must not exceed the number of deployed bookies).
+# Ports: binary protocol port and HTTP admin port.
+# PULSAR_MEM: JVM memory.
+# PULSAR_EXTRA_OPTS: appended to JVM flags in the startup script, can override default JVM parameters.
+# Keep comments above the command: a comment line between backslash-continued lines ends the command early.
+docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e zookeeperServers=zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
-e managedLedgerDefaultAckQuorum=1 \
+ -e brokerServicePort=6650 \
+ -e webServicePort=8080 \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -e PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576" \
--name broker --hostname broker \
apachepulsar/pulsar-all:latest \
bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::
diff --git a/versioned_docs/version-4.0.x/deploy-bare-metal.md
b/versioned_docs/version-4.0.x/deploy-bare-metal.md
index 95ee9028335..27957a31df6 100644
--- a/versioned_docs/version-4.0.x/deploy-bare-metal.md
+++ b/versioned_docs/version-4.0.x/deploy-bare-metal.md
@@ -121,6 +121,16 @@ Directory | Contains
`lib` | The [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)) files that
Pulsar uses
`logs` | Logs that the installation creates
+The `conf` directory contains configuration files for various Pulsar
components. Below is a brief overview of the main configuration categories:
+
+- **JVM Configuration** (`pulsar_env.sh` / `bkenv.sh`): Controls JVM memory
allocation (`PULSAR_MEM`, `BOOKIE_MEM`), garbage collection options
(`PULSAR_GC`, `BOOKIE_GC`), and extra JVM options (`PULSAR_EXTRA_OPTS`,
`BOOKIE_EXTRA_OPTS`) for Broker, BookKeeper, and other components.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports,
message retention policies, authentication, and authorization settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, ZooKeeper
connection, compaction, and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, file rolling strategies, and log output
directories.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the service, using the `pulsar-admin` CLI
tool or the Admin REST API. Dynamic configurations are stored in the metadata
store (ZooKeeper) and take effect across all Brokers in the cluster.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
### Install Built-in Connectors (optional)
To use `built-in` connectors, you need to download the connectors tarball
release on every broker node in one of the following ways :
@@ -297,19 +307,116 @@ You can obtain the metadata service URI of the existing
BookKeeper cluster by us
[BookKeeper](https://bookkeeper.apache.org) handles all persistent data
storage in Pulsar. You need to deploy a cluster of BookKeeper bookies to use
Pulsar. You can choose to run a **3-bookie BookKeeper cluster**.
+### Configure BookKeeper
+
+BookKeeper configuration is split across two files:
+
+- **`conf/bookkeeper.conf`**: Contains all BookKeeper runtime parameters,
including metadata store connection, storage directories, compaction settings,
and disk usage thresholds.
+- **`conf/bkenv.sh`**: Contains JVM-related parameters for the Bookie process,
including memory allocation (`BOOKIE_MEM`), garbage collection options
(`BOOKIE_GC`), and extra JVM flags (`BOOKIE_EXTRA_OPTS`).
+
+#### Metadata store connection
+
You can configure BookKeeper bookies using the
[`conf/bookkeeper.conf`](reference-configuration.md#bookkeeper) configuration
file. The most important step in configuring bookies for our purposes here is
ensuring that `metadataServiceUri` is set to the URI for the ZooKeeper cluster.
The following is an example:
```properties
metadataServiceUri=zk://zk1.us-west.example.com:2181;zk2.us-west.example.com:2181;zk3.us-west.example.com:2181/ledgers
```
-Which using `;` as separator in `metadataServiceUri`
+:::note
+
+Use `;` as the separator in `metadataServiceUri`.
:::
-Once you appropriately modify the `metadataServiceUri` parameter, you can make
any other configuration changes that you require. You can find a full listing
of the available BookKeeper configuration parameters
[here](reference-configuration.md#bookkeeper). However, consulting the
[BookKeeper
documentation](https://bookkeeper.apache.org/docs/next/reference/config/) for a
more in-depth guide might be a better choice.
+For more information about ZooKeeper and BookKeeper administration, see
[ZooKeeper and BookKeeper
administration](https://pulsar.apache.org/docs/next/administration-zk-bk/).
+
+#### Storage directories
+
+In a production environment, you should configure dedicated disks for journal
and ledger storage. Keeping them on separate disks significantly improves write
performance.
+
+```properties
+# WAL (Write-Ahead Log) directory — use a dedicated SSD for low-latency writes
+journalDirectory=/data/bookkeeper/journal
+
+# Ledger storage directory — use a separate disk from the journal
+ledgerDirectories=/data/bookkeeper/ledgers
+```
+
+- `journalDirectory`: Defaults to `data/bookkeeper/journal`. The journal is a
write-ahead log that records every write before it is applied to the ledger
storage. Using a dedicated high-speed SSD for the journal directory is critical
for write latency.
+- `ledgerDirectories`: Defaults to `data/bookkeeper/ledgers`. This is where
the actual ledger data is stored. Separating it from the journal directory
avoids I/O contention and improves throughput.
+
+#### GC and Compaction
+
+BookKeeper writes entries from multiple ledgers into shared Entry Log files
(default max 1 GB each, controlled by `logSizeLimit`). When ledgers are deleted
— for example, after Pulsar's retention policy trims expired data — the Entry
Log files that contained those ledgers develop unused space. The Bookie's GC
thread periodically scans for deleted ledgers and triggers compaction to
reclaim disk space by rewriting the remaining valid entries into new files.
+
+BookKeeper provides two levels of compaction:
+
+- **Minor Compaction**: Targets Entry Log files where the valid data ratio is
below `minorCompactionThreshold` (default 0.2, i.e., 20%). Runs at
`minorCompactionInterval` (default: every hour). Designed to quickly reclaim
heavily fragmented files.
+- **Major Compaction**: Targets Entry Log files where the valid data ratio is
below `majorCompactionThreshold` (default 0.5, i.e., 50%). Runs at
`majorCompactionInterval` (default: every day). Covers a wider range of files
with moderate fragmentation.
+
+```properties
+# GC scan interval (ms), default: 900000 (15 min)
+gcWaitTime=900000
+
+# Minor Compaction: threshold (valid data ratio) and interval (seconds)
+minorCompactionThreshold=0.2
+minorCompactionInterval=3600
+
+# Major Compaction: threshold (valid data ratio) and interval (seconds)
+majorCompactionThreshold=0.5
+majorCompactionInterval=86400
+```
+
+:::note
+
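+`minorCompactionInterval` and `majorCompactionInterval` must be greater than `gcWaitTime`, otherwise compaction will not run. Mind the units when comparing: the compaction intervals are in seconds, while `gcWaitTime` is in milliseconds.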
+
+:::
+
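Because the two settings use different units, the comparison is easy to get wrong. The following sketch converts the compaction intervals to milliseconds before comparing them with `gcWaitTime`. `check_compaction_intervals` is a hypothetical helper, and the sketch assumes the comparison is meant on a common time scale.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of BookKeeper).
# Arguments: gcWaitTime (ms), minorCompactionInterval (s), majorCompactionInterval (s).
check_compaction_intervals() {
  gc_wait_ms="$1"
  status=0
  for pair in "minor:$2" "major:$3"; do
    name=${pair%%:*}
    secs=${pair#*:}
    # Convert seconds to milliseconds so both sides use the same unit.
    if [ $((secs * 1000)) -le "$gc_wait_ms" ]; then
      echo "$name compaction interval ${secs}s is not greater than gcWaitTime ${gc_wait_ms}ms"
      status=1
    fi
  done
  return "$status"
}
```

With the values shown above, `check_compaction_intervals 900000 3600 86400` succeeds silently: 3600 s is 3,600,000 ms, which exceeds the 900,000 ms GC wait.
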
+#### Disk usage thresholds
+
+BookKeeper monitors disk usage and can automatically switch a Bookie to
read-only mode to prevent disk exhaustion.
+
+```properties
+# Bookie enters read-only mode when disk usage exceeds this threshold (default: 0.95)
+diskUsageThreshold=0.95
+
+# Warning threshold: Major Compaction is paused when disk usage exceeds this value (default: 0.90)
+diskUsageWarnThreshold=0.90
+
+# Low water mark: Bookie returns to read-write mode only after disk usage drops below this value
+# Set it lower than diskUsageWarnThreshold to avoid frequent mode switching (recommended: 0.87)
+diskUsageLwmThreshold=0.87
+```
+
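The ordering recommended above (low water mark below the warning threshold, which is below the read-only threshold) can be sanity-checked with a short sketch. `check_disk_thresholds` is a hypothetical helper, not a BookKeeper tool, and it encodes only the ordering suggested in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check diskUsageLwmThreshold < diskUsageWarnThreshold < diskUsageThreshold.
# Arguments: low water mark, warning threshold, read-only threshold (fractions such as 0.87).
check_disk_thresholds() {
  # awk is used because POSIX shell arithmetic does not handle fractions.
  awk -v lwm="$1" -v warn="$2" -v max="$3" 'BEGIN {
    if (lwm + 0 < warn + 0 && warn + 0 < max + 0) { print "ok"; exit 0 }
    print "invalid: expected lwm < warn < threshold"
    exit 1
  }'
}
```

For the example values, `check_disk_thresholds 0.87 0.90 0.95` prints `ok`.
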
+#### JVM configuration (bkenv.sh)
+
+The `conf/bkenv.sh` file controls JVM parameters for the Bookie process:
+
+- `BOOKIE_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g`. Adjust
based on your storage workload. Insufficient heap memory leads to frequent GC,
which increases write and read latency — especially under high throughput, GC
pauses can cause write timeouts. Direct memory is primarily used for Netty
ByteBuf allocation — BookKeeper defaults to the `PooledDirect` memory
allocator, which allocates all ByteBuf from direct memory for network I/O and
internal data handling.
+
+ ```bash
+ # Example: increase heap and direct memory for high-throughput workloads
+ BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g"
+ ```
+
+- `BOOKIE_EXTRA_OPTS`: Passes additional JVM flags to the Bookie process.
Examples:
+
+ ```bash
+ # Enable heap dump on OOM (the default script only enables ExitOnOutOfMemoryError;
+ # without a heap dump file you cannot diagnose the root cause)
+ BOOKIE_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/bookie/heapdump.hprof"
+
+ # Temporarily enable Netty leak detection for troubleshooting off-heap memory leaks
+ # (default is disabled; set to advanced level when investigating)
+ BOOKIE_EXTRA_OPTS="-Dio.netty.leakDetection.level=advanced"
+ ```
+
+For a full listing of the available BookKeeper configuration parameters, see the [BookKeeper section of the configuration reference](reference-configuration.md#bookkeeper). For a more in-depth guide, consult the [BookKeeper documentation](https://bookkeeper.apache.org/docs/next/reference/config/).
-Once you apply the desired configuration in `conf/bookkeeper.conf`, you can
start up a bookie on each of your BookKeeper hosts. You can start up each
bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
+### Start BookKeepers
+
+With the desired configuration applied in `conf/bookkeeper.conf` and
`conf/bkenv.sh`, you can start up a bookie on each of your BookKeeper hosts.
You can start up each bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
To start the bookie in the background, use the
[`pulsar-daemon`](reference-cli-tools.md) CLI tool:
@@ -346,6 +453,13 @@ Pulsar brokers are the last thing you need to deploy in
your Pulsar cluster. Bro
### Configure Brokers
+Broker configuration is split across two files:
+
+- **`conf/broker.conf`**: Contains all Broker runtime parameters, including
metadata store connection, cluster name, ports, replication settings, and
feature toggles.
+- **`conf/pulsar_env.sh`**: Contains JVM-related parameters for the Broker
process, including memory allocation (`PULSAR_MEM`), garbage collection options
(`PULSAR_GC`), and extra JVM flags (`PULSAR_EXTRA_OPTS`).
+
+#### Metadata store and cluster settings
+
You can configure brokers using the `conf/broker.conf` configuration file. The
most important element of broker configuration is ensuring that each broker is
aware of the ZooKeeper cluster that you have deployed. Ensure that the
[`metadataStoreUrl`](reference-configuration.md#broker) and
[`configurationMetadataStoreUrl`](reference-configuration.md#broker) parameters
are correct. In this case, since you only have 1 cluster and no configuration
store setup, the `configurationMetadataStoreU [...]
```properties
@@ -368,18 +482,57 @@ webServicePort=8080
webServicePortTls=8443
```
-> If you deploy Pulsar in a one-node cluster, you should update the
replication settings in `conf/broker.conf` to `1`.
->
-> ```properties
-> # Number of bookies to use when creating a ledger
-> managedLedgerDefaultEnsembleSize=1
->
-> # Number of copies to store for each message
-> managedLedgerDefaultWriteQuorum=1
->
-> # Number of guaranteed copies (acks to wait before write is complete)
-> managedLedgerDefaultAckQuorum=1
-> ```
+#### Managed ledger settings
+
+These parameters control how the Broker creates BookKeeper ledgers for message
storage. They map to the BookKeeper protocol's [Ensemble / Write Quorum / Ack
Quorum](https://bookkeeper.apache.org/docs/getting-started/concepts/#ledgers)
model:
+
+```properties
+# Ensemble size (E): number of bookies to use when creating a ledger (default: 2)
+managedLedgerDefaultEnsembleSize=2
+
+# Write quorum (Qw): number of copies to store for each entry (default: 2)
+managedLedgerDefaultWriteQuorum=2
+
+# Ack quorum (Qa): number of acks to wait before a write is considered complete (default: 2)
+managedLedgerDefaultAckQuorum=2
+```
+
+The invariant **E ≥ Qw ≥ Qa** must hold; otherwise ledger creation will fail.
+
+> If you deploy Pulsar in a one-node cluster, you should set all three values
to `1`.
+
+#### JVM configuration (pulsar_env.sh)
+
+The `conf/pulsar_env.sh` file controls JVM parameters for the Broker process:
+
+- `PULSAR_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g`. Adjust
based on your machine's available memory. Insufficient heap memory leads to
frequent GC, and GC pauses increase message publish and consume latency — in
severe cases, Full GC can make the Broker temporarily unavailable. Direct
memory is critical for the Broker's message caching and Netty I/O operations.
+
+ ```bash
+ # Example: increase heap and direct memory for production workloads
+ PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
+ ```
+
+- `PULSAR_EXTRA_OPTS`: Passes additional JVM flags to the
Broker/Proxy/ZooKeeper process. Since `PULSAR_EXTRA_OPTS` is appended after
other JVM options on the command line, it can also be used to **override**
existing JVM parameters defined in `pulsar_env.sh` or the `bin/pulsar` startup
script (later flags take precedence). Examples:
+
+ ```bash
+ # Enable heap dump on OOM (the default script only enables ExitOnOutOfMemoryError;
+ # without a heap dump file you cannot diagnose the root cause after the process exits)
+ PULSAR_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/logs/pulsar/heapdump.hprof"
+
+ # Enable IPv6 support (the default script sets -Djava.net.preferIPv4Stack=true;
+ # override this if your deployment uses IPv6 networking)
+ PULSAR_EXTRA_OPTS="-Djava.net.preferIPv4Stack=false"
+
+ # Tune Netty memory pool parameters (increase maxOrder and maxCachedBufferCapacity
+ # if your messages are large, to avoid Netty bypassing the memory pool for allocation)
+ PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576"
+ ```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::
### Enable Pulsar Functions (optional)
diff --git a/versioned_docs/version-4.0.x/deploy-docker.md
b/versioned_docs/version-4.0.x/deploy-docker.md
index 9e82f616d80..f03a2732c1f 100644
--- a/versioned_docs/version-4.0.x/deploy-docker.md
+++ b/versioned_docs/version-4.0.x/deploy-docker.md
@@ -34,9 +34,7 @@ Create a ZooKeeper container and start the ZooKeeper service.
```bash
docker run -d -p 2181:2181 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
- -e cluster-name=cluster-a -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
- -e managedLedgerDefaultAckQuorum=1 \
+ -e cluster-name=cluster-a \
-v $(pwd)/data/zookeeper:/pulsar/data/zookeeper \
--name zookeeper --hostname zookeeper \
apachepulsar/pulsar-all:latest \
@@ -81,10 +79,142 @@ docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
-e zookeeperServers=zookeeper:2181 \
-e clusterName=cluster-a \
- -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
+ --name broker --hostname broker \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+## Step 4: Configuration overview
+
+Pulsar Docker images support the following configuration categories:
+
+- **JVM Configuration**: Controls JVM memory allocation and garbage collection
for Broker and BookKeeper processes. In Docker, JVM parameters are set via
environment variables such as `PULSAR_MEM` and `BOOKIE_MEM`.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports, and
message replication settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, compaction,
and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, and file rolling strategies.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the container, using the `pulsar-admin`
CLI tool or the Admin REST API.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
+### How Docker configuration works
+
+Pulsar Docker images include a Python script called `apply-config-from-env.py`. The `docker run` commands in this guide invoke it before starting the main process. The script reads all environment variables and maps them directly to configuration file properties:
+
+1. If an environment variable name matches a key in the built-in configuration
file shipped with the container (e.g., `broker.conf` or `bookkeeper.conf`), the
script updates that key's value.
+2. Environment variables prefixed with `PULSAR_PREFIX_` are also supported —
the prefix is stripped and the remaining name is used as the configuration key.
This is useful when the configuration key conflicts with existing system
environment variables. Using `PULSAR_PREFIX_` is necessary for configuration
keys that aren't available in the shipped configuration files, but are
supported by the component (for example, keys available in Pulsar's
[ServiceConfiguration](https://github.com/apac [...]
+
+For example, setting `-e managedLedgerDefaultEnsembleSize=2` will update the
`managedLedgerDefaultEnsembleSize` property in the target configuration file.
+
+### Configuration methods
+
+#### Method 1: Using `-e` environment variables
+
+Pass configuration properties directly as environment variables in the `docker
run` command:
+
+```bash
+docker run -d \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
+ -e managedLedgerDefaultAckQuorum=2 \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+#### Method 2: Using `--env-file` for batch loading
+
+For a large number of configuration properties, use an environment file to
keep your `docker run` command clean:
+
+```bash
+docker run -d --env-file ./broker-config.env \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+Example `broker-config.env` file:
+
+```properties
+metadataStoreUrl=zk:zookeeper:2181
+clusterName=cluster-a
+managedLedgerDefaultEnsembleSize=2
+managedLedgerDefaultWriteQuorum=2
+managedLedgerDefaultAckQuorum=2
+PULSAR_MEM=-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g
+```
+
+#### Method 3: Using Docker Volume to mount custom configuration files
+
+You can mount a custom configuration file from the host into the container,
bypassing `apply-config-from-env.py` entirely. JVM settings such as
`PULSAR_MEM` are still passed as environment variables:
+
+```bash
+docker run -d \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -v $(pwd)/my-broker.conf:/pulsar/conf/broker.conf \
+ apachepulsar/pulsar-all:latest \
+ bin/pulsar broker
+```
+
+### Common configuration examples
+
+Below are examples of commonly used configuration properties for BookKeeper
and Broker containers. You can add more `-e` flags to the `docker run` command
shown in [Step 3](#step-3-create-and-start-containers) to customize the
behavior.
+
+#### BookKeeper
+
+```bash
+# Storage directories: journal for write-ahead logs, ledgers for message data.
+# Disk usage thresholds: the bookie rejects writes when usage exceeds the limit.
+# GC and compaction: reclaim disk space by removing unused ledger data.
+# BOOKIE_MEM sets JVM memory. BOOKIE_EXTRA_OPTS is appended to the JVM flags in
+# the startup script and can override default JVM parameters.
+# Volume mounts: if possible, use separate physical disks for journal and
+# ledgers to improve read/write performance.
+docker run -d -e clusterName=cluster-a \
+ -e zkServers=zookeeper:2181 --net=pulsar \
+ -e metadataServiceUri=metadata-store:zk:zookeeper:2181 \
+ -e journalDirectory=/pulsar/data/bookkeeper/journal \
+ -e ledgerDirectories=/pulsar/data/bookkeeper/ledgers \
+ -e diskUsageThreshold=0.95 \
+ -e diskUsageWarnThreshold=0.90 \
+ -e diskUsageLwmThreshold=0.87 \
+ -e gcWaitTime=900000 \
+ -e minorCompactionThreshold=0.2 \
+ -e minorCompactionInterval=3600 \
+ -e majorCompactionThreshold=0.5 \
+ -e majorCompactionInterval=86400 \
+ -e BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g" \
+ -e BOOKIE_EXTRA_OPTS="-XX:+ExitOnOutOfMemoryError" \
+ -v $(pwd)/data/bookkeeper/journal:/pulsar/data/bookkeeper/journal \
+ -v $(pwd)/data/bookkeeper/ledgers:/pulsar/data/bookkeeper/ledgers \
+ --name bookie --hostname bookie \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/pulsar bookie"
+```
+
+#### Broker
+
+```bash
+# Ensemble settings control how messages are replicated across bookies; the
+# ensemble size must not exceed the number of deployed bookies.
+# brokerServicePort is the binary protocol port; webServicePort is the HTTP
+# admin port.
+# PULSAR_MEM sets JVM memory. PULSAR_EXTRA_OPTS is appended to the JVM flags in
+# the startup script and can override default JVM parameters.
+docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e zookeeperServers=zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
-e managedLedgerDefaultAckQuorum=1 \
+ -e brokerServicePort=6650 \
+ -e webServicePort=8080 \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -e PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576" \
--name broker --hostname broker \
apachepulsar/pulsar-all:latest \
bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::
diff --git a/versioned_docs/version-4.2.x/deploy-bare-metal.md
b/versioned_docs/version-4.2.x/deploy-bare-metal.md
index 95ee9028335..27957a31df6 100644
--- a/versioned_docs/version-4.2.x/deploy-bare-metal.md
+++ b/versioned_docs/version-4.2.x/deploy-bare-metal.md
@@ -121,6 +121,16 @@ Directory | Contains
`lib` | The [JAR](https://en.wikipedia.org/wiki/JAR_(file_format)) files that
Pulsar uses
`logs` | Logs that the installation creates
+The `conf` directory contains configuration files for various Pulsar
components. Below is a brief overview of the main configuration categories:
+
+- **JVM Configuration** (`pulsar_env.sh` / `bkenv.sh`): Controls JVM memory
allocation (`PULSAR_MEM`, `BOOKIE_MEM`), garbage collection options
(`PULSAR_GC`, `BOOKIE_GC`), and extra JVM options (`PULSAR_EXTRA_OPTS`,
`BOOKIE_EXTRA_OPTS`) for Broker, BookKeeper, and other components.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports,
message retention policies, authentication, and authorization settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, ZooKeeper
connection, compaction, and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, file rolling strategies, and log output
directories.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the service, using the `pulsar-admin` CLI
tool or the Admin REST API. Dynamic configurations are stored in the metadata
store (ZooKeeper) and take effect across all Brokers in the cluster.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
### Install Built-in Connectors (optional)
To use `built-in` connectors, you need to download the connectors tarball
release on every broker node in one of the following ways :
@@ -297,19 +307,116 @@ You can obtain the metadata service URI of the existing
BookKeeper cluster by us
[BookKeeper](https://bookkeeper.apache.org) handles all persistent data
storage in Pulsar. You need to deploy a cluster of BookKeeper bookies to use
Pulsar. You can choose to run a **3-bookie BookKeeper cluster**.
+### Configure BookKeeper
+
+BookKeeper configuration is split across two files:
+
+- **`conf/bookkeeper.conf`**: Contains all BookKeeper runtime parameters,
including metadata store connection, storage directories, compaction settings,
and disk usage thresholds.
+- **`conf/bkenv.sh`**: Contains JVM-related parameters for the Bookie process,
including memory allocation (`BOOKIE_MEM`), garbage collection options
(`BOOKIE_GC`), and extra JVM flags (`BOOKIE_EXTRA_OPTS`).
+
+#### Metadata store connection
+
You can configure BookKeeper bookies using the
[`conf/bookkeeper.conf`](reference-configuration.md#bookkeeper) configuration
file. The most important step in configuring bookies for our purposes here is
ensuring that `metadataServiceUri` is set to the URI for the ZooKeeper cluster.
The following is an example:
```properties
metadataServiceUri=zk://zk1.us-west.example.com:2181;zk2.us-west.example.com:2181;zk3.us-west.example.com:2181/ledgers
```
-Which using `;` as separator in `metadataServiceUri`
+:::note
+
+Use `;` as the separator in `metadataServiceUri`.
:::
-Once you appropriately modify the `metadataServiceUri` parameter, you can make
any other configuration changes that you require. You can find a full listing
of the available BookKeeper configuration parameters
[here](reference-configuration.md#bookkeeper). However, consulting the
[BookKeeper
documentation](https://bookkeeper.apache.org/docs/next/reference/config/) for a
more in-depth guide might be a better choice.
+For more information about ZooKeeper and BookKeeper administration, see
[ZooKeeper and BookKeeper
administration](https://pulsar.apache.org/docs/next/administration-zk-bk/).
+
+#### Storage directories
+
+In a production environment, you should configure dedicated disks for journal
and ledger storage. Keeping them on separate disks significantly improves write
performance.
+
+```properties
+# WAL (Write-Ahead Log) directory — use a dedicated SSD for low-latency writes
+journalDirectory=/data/bookkeeper/journal
+
+# Ledger storage directory — use a separate disk from the journal
+ledgerDirectories=/data/bookkeeper/ledgers
+```
+
+- `journalDirectory`: Defaults to `data/bookkeeper/journal`. The journal is a
write-ahead log that records every write before it is applied to the ledger
storage. Using a dedicated high-speed SSD for the journal directory is critical
for write latency.
+- `ledgerDirectories`: Defaults to `data/bookkeeper/ledgers`. This is where
the actual ledger data is stored. Separating it from the journal directory
avoids I/O contention and improves throughput.
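
A quick way to check whether the two directories share a mounted filesystem is
to compare the mount points that `df` reports. This is a minimal sketch under
stated assumptions: the helper names `mount_point_of` and `on_same_filesystem`
are hypothetical, mount points are assumed to contain no spaces, and two
partitions on the same physical disk will still look separate to this check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the mount point that holds a directory.
mount_point_of() {
  df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeed when both directories are served by the same mounted filesystem.
on_same_filesystem() {
  [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}
```

For example, `on_same_filesystem /data/bookkeeper/journal
/data/bookkeeper/ledgers` succeeding would suggest that the journal and ledger
directories are not yet on dedicated disks.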
+
+#### GC and Compaction
+
+BookKeeper writes entries from multiple ledgers into shared Entry Log files
(default max 1 GB each, controlled by `logSizeLimit`). When ledgers are deleted
— for example, after Pulsar's retention policy trims expired data — the Entry
Log files that contained those ledgers develop unused space. The Bookie's GC
thread periodically scans for deleted ledgers and triggers compaction to
reclaim disk space by rewriting the remaining valid entries into new files.
+
+BookKeeper provides two levels of compaction:
+
+- **Minor Compaction**: Targets Entry Log files where the valid data ratio is
below `minorCompactionThreshold` (default 0.2, i.e., 20%). Runs at
`minorCompactionInterval` (default: every hour). Designed to quickly reclaim
heavily fragmented files.
+- **Major Compaction**: Targets Entry Log files where the valid data ratio is
below `majorCompactionThreshold` (default 0.5, i.e., 50%). Runs at
`majorCompactionInterval` (default: every day). Covers a wider range of files
with moderate fragmentation.
+
+```properties
+# GC scan interval (ms), default: 900000 (15 min)
+gcWaitTime=900000
+
+# Minor Compaction: threshold and interval
+minorCompactionThreshold=0.2
+minorCompactionInterval=3600
+
+# Major Compaction: threshold and interval
+majorCompactionThreshold=0.5
+majorCompactionInterval=86400
+```
+
+:::note
+
+`minorCompactionInterval` and `majorCompactionInterval` (both in seconds) must
be greater than `gcWaitTime` (in milliseconds) once converted to the same unit;
otherwise compaction will not run.
+
+:::
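
Because the units differ, the comparison is easy to get wrong by hand. The
sketch below converts the compaction intervals to milliseconds before comparing
them with `gcWaitTime`; the function name `compaction_intervals_ok` is
hypothetical and the units are taken from the defaults shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: gcWaitTime is in milliseconds, the compaction
# intervals are in seconds, so convert before comparing.
compaction_intervals_ok() {
  local gc_wait_ms="$1" minor_s="$2" major_s="$3"
  [ $((minor_s * 1000)) -gt "$gc_wait_ms" ] &&
    [ $((major_s * 1000)) -gt "$gc_wait_ms" ]
}
```

With the defaults, `compaction_intervals_ok 900000 3600 86400` succeeds; a
minor interval of 600 seconds would fail against a 15-minute `gcWaitTime`.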
+
+#### Disk usage thresholds
+
+BookKeeper monitors disk usage and can automatically switch a Bookie to
read-only mode to prevent disk exhaustion.
+
+```properties
+# Bookie enters read-only mode when disk usage exceeds this threshold
(default: 0.95)
+diskUsageThreshold=0.95
+
+# Warning threshold — Major Compaction is paused when disk usage exceeds this
value (default: 0.90)
+diskUsageWarnThreshold=0.90
+
+# Low water mark — Bookie returns to read-write mode only after disk usage
drops below this value
+# Set it lower than diskUsageWarnThreshold to avoid frequent mode switching
(recommended: 0.87)
+diskUsageLwmThreshold=0.87
+```
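
The ordering of the three thresholds can be sanity-checked before a bookie is
restarted. This is a small sketch, assuming the low water mark must sit below
the warning threshold (as recommended above) and that the warning threshold
should not exceed `diskUsageThreshold` (an assumption inferred from the default
values). The function name `disk_thresholds_ok` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Arguments: low water mark, warn threshold, threshold.
# awk does the comparison because the values are fractions, not integers.
disk_thresholds_ok() {
  awk -v lwm="$1" -v warn="$2" -v full="$3" \
    'BEGIN { exit !(lwm < warn && warn <= full) }'
}
```

`disk_thresholds_ok 0.87 0.90 0.95` succeeds for the values shown above, while
a low water mark of `0.92` would be rejected.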
+
+#### JVM configuration (bkenv.sh)
+
+The `conf/bkenv.sh` file controls JVM parameters for the Bookie process:
+
+- `BOOKIE_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g`. Adjust
based on your storage workload. Insufficient heap memory leads to frequent GC,
which increases write and read latency — especially under high throughput, GC
pauses can cause write timeouts. Direct memory is primarily used for Netty
ByteBuf allocation — BookKeeper defaults to the `PooledDirect` memory
allocator, which allocates all ByteBuf from direct memory for network I/O and
internal data handling.
+
+ ```bash
+ # Example: increase heap and direct memory for high-throughput workloads
+ BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g"
+ ```
+
+- `BOOKIE_EXTRA_OPTS`: Passes additional JVM flags to the Bookie process.
Examples:
+
+ ```bash
+ # Enable heap dump on OOM. The default script only enables
ExitOnOutOfMemoryError,
+ # and without a heap dump file you cannot diagnose the root cause.
+ BOOKIE_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/data/logs/bookie/heapdump.hprof"
+
+ # Temporarily enable Netty leak detection for troubleshooting off-heap
memory leaks
+ # (default is disabled; set to advanced level when investigating)
+ BOOKIE_EXTRA_OPTS="-Dio.netty.leakDetection.level=advanced"
+ ```
+
+You can find a full listing of the available BookKeeper configuration
parameters [here](reference-configuration.md#bookkeeper). For a more in-depth
guide, consult the [BookKeeper
documentation](https://bookkeeper.apache.org/docs/next/reference/config/).
-Once you apply the desired configuration in `conf/bookkeeper.conf`, you can
start up a bookie on each of your BookKeeper hosts. You can start up each
bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
+### Start BookKeepers
+
+With the desired configuration applied in `conf/bookkeeper.conf` and
`conf/bkenv.sh`, you can start up a bookie on each of your BookKeeper hosts.
You can start up each bookie either in the background, using
[nohup](https://en.wikipedia.org/wiki/Nohup), or in the foreground.
To start the bookie in the background, use the
[`pulsar-daemon`](reference-cli-tools.md) CLI tool:
@@ -346,6 +453,13 @@ Pulsar brokers are the last thing you need to deploy in
your Pulsar cluster. Bro
### Configure Brokers
+Broker configuration is split across two files:
+
+- **`conf/broker.conf`**: Contains all Broker runtime parameters, including
metadata store connection, cluster name, ports, replication settings, and
feature toggles.
+- **`conf/pulsar_env.sh`**: Contains JVM-related parameters for the Broker
process, including memory allocation (`PULSAR_MEM`), garbage collection options
(`PULSAR_GC`), and extra JVM flags (`PULSAR_EXTRA_OPTS`).
+
+#### Metadata store and cluster settings
+
You can configure brokers using the `conf/broker.conf` configuration file. The
most important element of broker configuration is ensuring that each broker is
aware of the ZooKeeper cluster that you have deployed. Ensure that the
[`metadataStoreUrl`](reference-configuration.md#broker) and
[`configurationMetadataStoreUrl`](reference-configuration.md#broker) parameters
are correct. In this case, since you only have 1 cluster and no configuration
store setup, the `configurationMetadataStoreU [...]
```properties
@@ -368,18 +482,57 @@ webServicePort=8080
webServicePortTls=8443
```
-> If you deploy Pulsar in a one-node cluster, you should update the
replication settings in `conf/broker.conf` to `1`.
->
-> ```properties
-> # Number of bookies to use when creating a ledger
-> managedLedgerDefaultEnsembleSize=1
->
-> # Number of copies to store for each message
-> managedLedgerDefaultWriteQuorum=1
->
-> # Number of guaranteed copies (acks to wait before write is complete)
-> managedLedgerDefaultAckQuorum=1
-> ```
+#### Managed ledger settings
+
+These parameters control how the Broker creates BookKeeper ledgers for message
storage. They map to the BookKeeper protocol's [Ensemble / Write Quorum / Ack
Quorum](https://bookkeeper.apache.org/docs/getting-started/concepts/#ledgers)
model:
+
+```properties
+# Ensemble size (E): number of bookies to use when creating a ledger (default:
2)
+managedLedgerDefaultEnsembleSize=2
+
+# Write quorum (Qw): number of copies to store for each entry (default: 2)
+managedLedgerDefaultWriteQuorum=2
+
+# Ack quorum (Qa): number of acks to wait before a write is considered
complete (default: 2)
+managedLedgerDefaultAckQuorum=2
+```
+
+The invariant **E ≥ Qw ≥ Qa** must hold; otherwise ledger creation will fail.
+
+> If you deploy Pulsar in a one-node cluster, you should set all three values
to `1`.
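
The invariant is simple enough to check in a deployment script before the
values are written to `conf/broker.conf`. A minimal sketch, where the function
name `quorum_ok` is hypothetical and the arguments are the ensemble size, write
quorum, and ack quorum in that order:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only when E >= Qw >= Qa.
quorum_ok() {
  local ensemble="$1" write_quorum="$2" ack_quorum="$3"
  [ "$ensemble" -ge "$write_quorum" ] && [ "$write_quorum" -ge "$ack_quorum" ]
}
```

`quorum_ok 2 2 2` and `quorum_ok 1 1 1` succeed, whereas `quorum_ok 2 3 2`
fails because the write quorum exceeds the ensemble size.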
+
+#### JVM configuration (pulsar_env.sh)
+
+The `conf/pulsar_env.sh` file controls JVM parameters for the Broker process:
+
+- `PULSAR_MEM`: Defaults to `-Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g`. Adjust
based on your machine's available memory. Insufficient heap memory leads to
frequent GC, and GC pauses increase message publish and consume latency — in
severe cases, Full GC can make the Broker temporarily unavailable. Direct
memory is critical for the Broker's message caching and Netty I/O operations.
+
+ ```bash
+ # Example: increase heap and direct memory for production workloads
+ PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g"
+ ```
+
+- `PULSAR_EXTRA_OPTS`: Passes additional JVM flags to the
Broker/Proxy/ZooKeeper process. Since `PULSAR_EXTRA_OPTS` is appended after
other JVM options on the command line, it can also be used to **override**
existing JVM parameters defined in `pulsar_env.sh` or the `bin/pulsar` startup
script (later flags take precedence). Examples:
+
+ ```bash
+ # Enable heap dump on OOM. The default script only enables
ExitOnOutOfMemoryError,
+ # and without a heap dump file you cannot diagnose the root cause after the
process exits.
+ PULSAR_EXTRA_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/data/logs/pulsar/heapdump.hprof"
+
+ # Enable IPv6 support (the default script sets
-Djava.net.preferIPv4Stack=true;
+ # override this if your deployment uses IPv6 networking)
+ PULSAR_EXTRA_OPTS="-Djava.net.preferIPv4Stack=false"
+
+ # Tune Netty memory pool parameters (increase maxOrder and
maxCachedBufferCapacity
+ # if your messages are large, to avoid Netty bypassing the memory pool for
allocation)
+ PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13
-Dio.netty.allocator.numDirectArenas=8
-Dio.netty.allocator.maxCachedBufferCapacity=1048576"
+ ```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::
### Enable Pulsar Functions (optional)
diff --git a/versioned_docs/version-4.2.x/deploy-docker.md
b/versioned_docs/version-4.2.x/deploy-docker.md
index 9e82f616d80..f03a2732c1f 100644
--- a/versioned_docs/version-4.2.x/deploy-docker.md
+++ b/versioned_docs/version-4.2.x/deploy-docker.md
@@ -34,9 +34,7 @@ Create a ZooKeeper container and start the ZooKeeper service.
```bash
docker run -d -p 2181:2181 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
- -e cluster-name=cluster-a -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
- -e managedLedgerDefaultAckQuorum=1 \
+ -e cluster-name=cluster-a \
-v $(pwd)/data/zookeeper:/pulsar/data/zookeeper \
--name zookeeper --hostname zookeeper \
apachepulsar/pulsar-all:latest \
@@ -81,10 +79,142 @@ docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
-e metadataStoreUrl=zk:zookeeper:2181 \
-e zookeeperServers=zookeeper:2181 \
-e clusterName=cluster-a \
- -e managedLedgerDefaultEnsembleSize=1 \
- -e managedLedgerDefaultWriteQuorum=1 \
+ --name broker --hostname broker \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+## Step 4: Configuration overview
+
+Pulsar Docker images support the following configuration categories:
+
+- **JVM Configuration**: Controls JVM memory allocation and garbage collection
for Broker and BookKeeper processes. In Docker, JVM parameters are set via
environment variables such as `PULSAR_MEM` and `BOOKIE_MEM`.
+- **Broker Configuration** (`broker.conf`): Core runtime parameters for the
Pulsar Broker, including metadata store connection, cluster name, ports, and
message replication settings.
+- **BookKeeper Configuration** (`bookkeeper.conf`): Storage engine parameters
for BookKeeper Bookies, including journal and ledger directories, compaction,
and disk usage thresholds.
+- **Log4j Configuration** (`log4j2.yaml`): Logging framework settings
including log levels, output format, and file rolling strategies.
+- **Dynamic Configuration**: Some Broker configuration properties can be
updated at runtime without restarting the container, using the `pulsar-admin`
CLI tool or the Admin REST API.
+
+For a complete list of all available configuration properties, see the [Pulsar
Configuration Reference](https://pulsar.apache.org/reference/#/next/).
+
+### How Docker configuration works
+
+Pulsar Docker images include a Python script called `apply-config-from-env.py`.
The container commands in this guide run it against a configuration file before
starting the main process. The script reads all environment variables and maps
them to properties in that file:
+
+1. If an environment variable name matches a key in the built-in configuration
file shipped with the container (e.g., `broker.conf` or `bookkeeper.conf`), the
script updates that key's value.
+2. Environment variables prefixed with `PULSAR_PREFIX_` are also supported —
the prefix is stripped and the remaining name is used as the configuration key.
This is useful when the configuration key conflicts with existing system
environment variables. Using `PULSAR_PREFIX_` is necessary for configuration
keys that aren't available in the shipped configuration files, but are
supported by the component (for example, keys available in Pulsar's
[ServiceConfiguration](https://github.com/apac [...]
+
+For example, setting `-e managedLedgerDefaultEnsembleSize=2` will update the
`managedLedgerDefaultEnsembleSize` property in the target configuration file.
+
+### Configuration methods
+
+#### Method 1: Using `-e` environment variables
+
+Pass configuration properties directly as environment variables in the `docker
run` command:
+
+```bash
+docker run -d \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
+ -e managedLedgerDefaultAckQuorum=2 \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+#### Method 2: Using `--env-file` for batch loading
+
+For a large number of configuration properties, use an environment file to
keep your `docker run` command clean:
+
+```bash
+docker run -d --env-file ./broker-config.env \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
+```
+
+Example `broker-config.env` file:
+
+```properties
+metadataStoreUrl=zk:zookeeper:2181
+clusterName=cluster-a
+managedLedgerDefaultEnsembleSize=2
+managedLedgerDefaultWriteQuorum=2
+managedLedgerDefaultAckQuorum=2
+PULSAR_MEM=-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g
+```
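
Note that `PULSAR_MEM` is unquoted in the file, unlike the `-e` form. To the
best of our understanding, `docker run --env-file` passes everything after the
first `=` through literally, so quotes would become part of the value. The
sketch below lists env-file lines whose value starts with a quote character;
the function name `find_quoted_env_values` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) env-file entries whose value
# begins with a single or double quote. Exit status is non-zero if none match.
find_quoted_env_values() {
  grep -nE "^[A-Za-z_][A-Za-z0-9_]*=[\"']" "$1"
}
```

Running `find_quoted_env_values ./broker-config.env` on the example file above
prints nothing, which is the desired result.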
+
+#### Method 3: Using Docker Volume to mount custom configuration files
+
+You can mount a custom configuration file from the host into the container,
bypassing `apply-config-from-env.py` entirely. JVM settings such as
`PULSAR_MEM` are still passed as environment variables:
+
+```bash
+docker run -d \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -v $(pwd)/my-broker.conf:/pulsar/conf/broker.conf \
+ apachepulsar/pulsar-all:latest \
+ bin/pulsar broker
+```
+
+### Common configuration examples
+
+Below are examples of commonly used configuration properties for BookKeeper
and Broker containers. You can add more `-e` flags to the `docker run` command
shown in [Step 3](#step-3-create-and-start-containers) to customize the
behavior.
+
+#### BookKeeper
+
+```bash
+# Storage directories: journal for write-ahead logs, ledgers for message data.
+# Disk usage thresholds: the bookie rejects writes when usage exceeds the limit.
+# GC and compaction: reclaim disk space by removing unused ledger data.
+# BOOKIE_MEM sets JVM memory. BOOKIE_EXTRA_OPTS is appended to the JVM flags in
+# the startup script and can override default JVM parameters.
+# Volume mounts: if possible, use separate physical disks for journal and
+# ledgers to improve read/write performance.
+docker run -d -e clusterName=cluster-a \
+ -e zkServers=zookeeper:2181 --net=pulsar \
+ -e metadataServiceUri=metadata-store:zk:zookeeper:2181 \
+ -e journalDirectory=/pulsar/data/bookkeeper/journal \
+ -e ledgerDirectories=/pulsar/data/bookkeeper/ledgers \
+ -e diskUsageThreshold=0.95 \
+ -e diskUsageWarnThreshold=0.90 \
+ -e diskUsageLwmThreshold=0.87 \
+ -e gcWaitTime=900000 \
+ -e minorCompactionThreshold=0.2 \
+ -e minorCompactionInterval=3600 \
+ -e majorCompactionThreshold=0.5 \
+ -e majorCompactionInterval=86400 \
+ -e BOOKIE_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g" \
+ -e BOOKIE_EXTRA_OPTS="-XX:+ExitOnOutOfMemoryError" \
+ -v $(pwd)/data/bookkeeper/journal:/pulsar/data/bookkeeper/journal \
+ -v $(pwd)/data/bookkeeper/ledgers:/pulsar/data/bookkeeper/ledgers \
+ --name bookie --hostname bookie \
+ apachepulsar/pulsar-all:latest \
+ bash -c "bin/apply-config-from-env.py conf/bookkeeper.conf && exec bin/pulsar bookie"
+```
+
+#### Broker
+
+```bash
+# Ensemble settings control how messages are replicated across bookies; the
+# ensemble size must not exceed the number of deployed bookies.
+# brokerServicePort is the binary protocol port; webServicePort is the HTTP
+# admin port.
+# PULSAR_MEM sets JVM memory. PULSAR_EXTRA_OPTS is appended to the JVM flags in
+# the startup script and can override default JVM parameters.
+docker run -d -p 6650:6650 -p 8080:8080 --net=pulsar \
+ -e metadataStoreUrl=zk:zookeeper:2181 \
+ -e zookeeperServers=zookeeper:2181 \
+ -e clusterName=cluster-a \
+ -e managedLedgerDefaultEnsembleSize=2 \
+ -e managedLedgerDefaultWriteQuorum=2 \
-e managedLedgerDefaultAckQuorum=1 \
+ -e brokerServicePort=6650 \
+ -e webServicePort=8080 \
+ -e PULSAR_MEM="-Xms4g -Xmx4g -XX:MaxDirectMemorySize=8g" \
+ -e PULSAR_EXTRA_OPTS="-Dio.netty.allocator.maxOrder=13 -Dio.netty.allocator.numDirectArenas=8 -Dio.netty.allocator.maxCachedBufferCapacity=1048576" \
--name broker --hostname broker \
apachepulsar/pulsar-all:latest \
bash -c "bin/apply-config-from-env.py conf/broker.conf && exec bin/pulsar
broker"
```
+
+:::tip
+
+You can also refer to the default configuration in the [Pulsar Helm Chart
values.yaml](https://github.com/apache/pulsar-helm-chart/blob/master/charts/pulsar/values.yaml)
as a tuning reference.
+
+:::