https://github.com/apache/pulsar/pull/25462


# Background knowledge

[PIP-89](https://github.com/apache/pulsar/blob/master/pip/pip-89.md)
introduced the concept of structured event
logging for Pulsar and was approved in 2021. It proposed replacing SLF4J
string-interpolated logging with structured
key-value logging, using a custom wrapper (`structured-event-log` module)
built on top of SLF4J's MDC.

Since then, a standalone library called [slog](
https://github.com/merlimat/slog) has been developed that provides a
more complete and performant implementation of structured logging. slog
builds on the same principles as PIP-89 but
with a significantly improved API, better performance characteristics, and
support for all standard log levels.

### Structured logging

Traditional logging uses string interpolation to embed values into log
messages:

```java
log.info("[{}] Opening ledger {} with {} entries", name, ledgerId, entries);
```

Structured logging instead attaches context as typed key-value attributes:

```java
log.info().attr("ledgerId", ledgerId).attr("entries", entries).log("Opening
ledger");
```

This makes log records machine-parseable, filterable, and queryable without
complex regex patterns.

### slog API

slog provides a `Logger` interface with the following key features:

* **All standard levels**: `trace()`, `debug()`, `info()`, `warn()`,
`error()` — each with four variants:
  simple message, printf-formatted, fluent builder, and
lambda/consumer-based.
* **Fluent Event builder**: Chain `.attr(key, value)` calls to attach
structured attributes, with primitive overloads
  (`long`, `int`, `double`, `float`, `boolean`) to avoid autoboxing.
* **Derived loggers with context**: `logger.with().attr("topic",
topic).build()` creates a child logger where
  context attributes are automatically included in every log call.
* **Context composition**: `logger.with().ctx(otherLogger).build()`
inherits context from another logger, enabling
  cross-component context propagation without coupling.
* **Zero overhead when disabled**: Level guards are checked before
constructing any objects. The fluent builders
  return a no-op singleton `Event` when the level is disabled, making all
chained calls effectively free.
* **Timed events**: `log.info().timed()` automatically records elapsed
duration between the call and `.log()`.
* **Lazy evaluation**: `log.debug(e -> e.attr(...).log(...))` — the lambda
only executes when the level is enabled.
* **Exception support**: `.exception(t)` for full stack traces,
`.exceptionMessage(t)` for message-only.
* **Lombok integration**: `@CustomLog` annotation generates the static
logger field automatically.

slog auto-discovers the backend at runtime: Log4j2 (preferred) or SLF4J
1.x/2.x as fallback.

# Motivation

While PIP-89 was approved and the `structured-event-log` module was added
to the Pulsar codebase, adoption has been
very limited. Almost all Pulsar components still use traditional SLF4J
logging with string interpolation. The
`structured-event-log` module's API is verbose and lacks support for
standard log levels like `debug` and `trace`,
which has hindered adoption.

Meanwhile, the slog library has evolved to address these shortcomings with
a cleaner, more complete API. This PIP
proposes to adopt slog as the standard logging approach across all Pulsar
components, replacing both direct SLF4J
usage and the internal `structured-event-log` module.

### Problems with the current state

**Inconsistent log formatting**: Every developer formats log messages
differently. The same information appears in
different positions, with different delimiters, and different key naming:
```java
// Various inconsistent patterns found across the codebase:
log.info("[{}] Opening managed ledger {}", name, ledgerId);
log.info("[{}] [{}] Cursor {} recovered to position {}", ledgerName,
cursorName, cursor, position);
log.warn("Error opening ledger {} for cursor {}: {}", ledgerId, cursorName,
errorMsg);
```

**Repeated boilerplate context**: The managed ledger name, cursor name,
topic name, and other context identifiers
are repeated as positional arguments in nearly every log statement — often
as the first one or two `{}` placeholders.
This is verbose, error-prone, and frequently inconsistent.

**isDebugEnabled guards**: Debug-level logging requires explicit level
guards to avoid the cost of string
construction and argument evaluation:
```java
if (log.isDebugEnabled()) {
    log.debug("[{}] Reading entry ledger {}: {}", name,
position.getLedgerId(), position.getEntryId());
}
```
This adds significant boilerplate — in `ManagedLedgerImpl` alone, there
were 58 such guards.

**Poor queryability**: Log analysis tools (Splunk, Elasticsearch, Loki)
cannot reliably extract values embedded in
interpolated strings. Regex-based parsing is fragile and varies by message
format.

**Missing context in error scenarios**: When debugging production issues,
engineers frequently find that critical
context was omitted from log messages because adding another positional
argument was cumbersome.

### Why slog over `structured-event-log`

* **Complete level support**: slog supports all five standard levels
(`trace`, `debug`, `info`, `warn`, `error`),
  while `structured-event-log` only supports `info`, `warn`, and `error`.
* **Simpler API**: slog's `log.info().attr("k", v).log("msg")` is more
natural than
  `log.newRootEvent().resource("k", v).log("msg")`.
* **Better performance**: slog uses a cached generation-counter scheme for
level checks with a single integer
  comparison and no volatile fence. Primitive `attr()` overloads avoid
autoboxing.
* **Derived loggers**: `logger.with().attr(...)` creates an immutable child
logger with persistent context — cleaner
  than creating `EventResources` objects.
* **Lazy evaluation**: Lambda-based logging (`log.debug(e -> ...)`)
eliminates `isDebugEnabled` guards entirely.
* **Lombok integration**: `@CustomLog` works with slog, reducing logger
declaration boilerplate.
* **Standalone library**: slog is maintained as an independent library,
decoupled from Pulsar's release cycle.
  Being a separate library makes it easier to adopt across related projects
(e.g., BookKeeper, Oxia Java client
  which already uses slog), enabling consistent structured logging and
cross-project context propagation.

# Goals

## In Scope

* **Adopt slog** as the standard structured logging library for all Pulsar
components.
* **Remove the internal `structured-event-log` module** from the Pulsar
codebase, since slog supersedes it.
* **Migrate all components** from SLF4J to slog, one module at a time.
Components can be migrated independently
  and there is no concern with a coexistence period where some components
use SLF4J and others use slog.
* **Establish logging conventions**: standardized attribute names
(`managedLedger`, `cursor`, `ledgerId`, `topic`,
  `position`, etc.) to ensure consistency across the codebase.
* **Use derived loggers for context propagation**: Classes that carry
identity (e.g., a managed ledger's name, a
  cursor's name, a topic) should create derived loggers that automatically
attach this context to every log call.

* **Cross-boundary context propagation with BookKeeper**: Since Pulsar
embeds BookKeeper, slog's context
  composition (`ctx()`) enables sharing context across the Pulsar →
BookKeeper boundary. For example, a BookKeeper
  write failure event log would automatically carry the Pulsar topic name
as an attribute, without BookKeeper needing
  to know about Pulsar concepts. This dramatically improves debuggability
of storage-layer issues.

## Out of Scope

* Defining centralized event enums (as discussed in PIP-89). slog uses
string messages, and adopting an enum-based
  log event model would be highly impractical given the scale of the
codebase.
* Modifying the log output format or Log4j2 configuration. slog works
transparently with existing logging backends
  and configurations.
* Introducing distributed tracing or trace ID propagation. This is
orthogonal and can be layered on top later.

### Pulsar client libraries

The `pulsar-client` module is a special case because its logs are emitted
within users' applications rather than
on the server side. slog auto-discovers the logging backend at runtime
(Log4j2 or SLF4J), so it will integrate
transparently with whatever logging framework the user's application
already has configured. No action is required
by users — the client will simply emit structured log records through the
existing backend, and users who configure
a JSON appender will automatically benefit from the structured attributes.

# High Level Design

The migration follows a straightforward, incremental approach:

1. **Add slog as a dependency** in the Pulsar build system (Gradle version
catalog).
2. **Configure Lombok** with `@CustomLog` support for slog.
3. **Migrate modules one at a time**, starting with `managed-ledger` as the
first module (already done as a
   proof of concept). Each module migration involves:
   - Replacing SLF4J imports and logger declarations with slog.
   - Converting log statements from parameterized format to structured
format.
   - Introducing derived loggers for repeated context (e.g., ledger name,
cursor name).
   - Removing `isDebugEnabled()` / `isTraceEnabled()` guards.
4. **Delete the `structured-event-log` module** — it has no existing usages
and can be removed immediately.

Since slog auto-discovers the logging backend (Log4j2 or SLF4J), no changes
to logging configuration files are
needed. The structured attributes are emitted via the backend's native
mechanism (Log4j2 key-value pairs or SLF4J
MDC), so existing log appenders and formats continue to work. Users who
want to leverage structured attributes in
their log output can configure a JSON appender.

# Detailed Design

## Design & Implementation Details

### Logger declaration

For simple classes, use Lombok's `@CustomLog` annotation:

```java
@CustomLog
public class OpAddEntry implements AddCallback {
    // Lombok generates: private static final Logger log =
Logger.get(OpAddEntry.class);

    public void initiate() {
        log.info().attr("ledgerId", ledger.getId()).log("Initiating add
entry");
    }
}
```

For classes with persistent identity context (e.g., a managed ledger, a
cursor, a topic handler), create a
derived instance logger:

```java
public class ManagedLedgerImpl implements ManagedLedger {
    private static final Logger slog = Logger.get(ManagedLedgerImpl.class);
    protected final Logger log;

    public ManagedLedgerImpl(..., String name, ...) {
        this.log = slog.with().attr("managedLedger", name).build();
    }

    // All subsequent log calls automatically include managedLedger=<name>
    public void openLedger(long ledgerId) {
        log.info().attr("ledgerId", ledgerId).log("Opening ledger");
        // Emits: msg="Opening ledger" managedLedger=my-topic ledgerId=12345
    }
}
```

### Context composition across components

Classes that operate on behalf of another component can inherit its context:

```java
public class RangeEntryCacheImpl implements EntryCache {
    private static final Logger slog =
Logger.get(RangeEntryCacheImpl.class);
    private final Logger log;

    RangeEntryCacheImpl(ManagedLedgerImpl ml, ...) {
        this.log = slog.with().ctx(ml.getLogger()).build();
        // Logger name is RangeEntryCacheImpl, but carries managedLedger
context from ml
    }
}
```

### Cross-boundary context: Pulsar → BookKeeper

One of the most powerful benefits of slog's context model is the ability to
propagate context across component
boundaries. Since Pulsar embeds BookKeeper, a BookKeeper ledger operation
can inherit context from the Pulsar
component that initiated it:

```java
// In Pulsar's ManagedLedgerImpl (already has managedLedger context)
public void createLedger() {
    // Pass the Pulsar logger context to BookKeeper
    bookKeeper.asyncCreateLedger(..., log, ...);
}

// In BookKeeper's LedgerHandle
public class LedgerHandle {
    private final Logger log;

    LedgerHandle(Logger pulsarLogger, ...) {
        // BookKeeper's logger inherits Pulsar's context
        this.log = slog.with().ctx(pulsarLogger).attr("ledgerId",
ledgerId).build();
    }

    void handleWriteError(int rc) {
        // This log entry automatically includes managedLedger=my-topic
from Pulsar's context
        log.error().attr("rc", BKException.getMessage(rc)).log("Write
failed");
        // Emits: msg="Write failed" managedLedger=my-topic ledgerId=12345
rc=NotEnoughBookies
    }
}
```

This eliminates the long-standing problem of BookKeeper errors being logged
without any Pulsar-level context,
making it very difficult to correlate storage failures with the affected
topics.

### Conversion patterns

| SLF4J pattern | slog equivalent |
|---|---|
| `log.info("msg")` | `log.info("msg")` |
| `log.info("[{}] msg {}", name, val)` | `log.info().attr("key",
val).log("msg")` |
| `log.error("msg", exception)` | `log.error().exception(e).log("msg")` |
| `log.warn("msg: {}", e.getMessage())` |
`log.warn().exceptionMessage(e).log("msg")` |
| `if (log.isDebugEnabled()) { log.debug(...) }` |
`log.debug().attr(...).log(...)` |
| Manual timing with `System.nanoTime()` | `log.info().timed() ...
event.log("done")` |

### Attribute naming conventions

To ensure consistency across the codebase, the following attribute names
should be used:

| Attribute | Description | Example value |
|---|---|---|
| `managedLedger` | Managed ledger name | `my-tenant/my-ns/my-topic` |
| `cursor` | Cursor name | `my-subscription` |
| `topic` | Topic name | `persistent://tenant/ns/topic` |
| `ledgerId` | BookKeeper ledger ID | `12345` |
| `entryId` | Entry ID within a ledger | `42` |
| `position` | Ledger position (ledgerId:entryId) | `12345:42` |
| `readPosition` | Current read position | `12345:42` |
| `markDeletePosition` | Mark-delete position | `12345:41` |
| `state` | Component state | `Open`, `Closed` |
| `rc` | BookKeeper return code | `OK`, `NoSuchLedger` |
| `subscription` | Subscription name | `my-sub` |
| `consumer` | Consumer name | `consumer-0` |
| `producer` | Producer name | `producer-0` |

## Public-facing Changes

### Configuration

A new Lombok configuration is added to `lombok.config`:

```properties
lombok.log.custom.declaration = io.github.merlimat.slog.Logger
io.github.merlimat.slog.Logger.get(TYPE)
```

### Log output

The log message format changes. With traditional SLF4J:
```
[ManagedLedgerImpl] [my-topic] Opening ledger 12345 with 100 entries
```

With slog (pattern appender):
```
[ManagedLedgerImpl] Opening ledger {managedLedger=my-topic, ledgerId=12345,
entries=100}
```

With slog (Log4j2 JSON appender):
```json
{
  "instant": {"epochSecond": 1775174210, "nanoOfSecond": 277000000},
  "thread": "pulsar-ordered-executor-0-0",
  "level": "INFO",
  "loggerName": "org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl",
  "message": "Opening ledger",
  "contextMap": {
    "managedLedger": "my-topic",
    "ledgerId": "12345",
    "entries": "100"
  },
  "endOfBatch": false,
  "loggerFqcn": "io.github.merlimat.slog.impl.EventImpl",
  "threadId": 42,
  "threadPriority": 5
}
```

Users who rely on parsing specific log message patterns will need to update
their parsers. However, the structured
format makes parsing significantly easier and more reliable.

# Monitoring

No changes to monitoring. Structured logging improves the ability to create
log-based alerts and dashboards by
making attribute-based filtering possible (e.g., alert on all `ERROR` logs
where `managedLedger=X`).

# Security Considerations

No security implications. This is an internal change to how log messages
are formatted. No new endpoints, commands,
or permissions are introduced.

# Backward & Forward Compatibility

## Upgrade

* Log message formats will change as modules are migrated. Users who parse
log messages with regex patterns will
  need to update their parsers. The structured format is strictly more
parseable than the previous format.
* To get the most out of structured logging, users should consider
switching to a JSON log appender if they haven't
  already. However, this is optional — traditional pattern-based appenders
continue to work.
* No configuration changes are required for the upgrade.

## Downgrade / Rollback

No special steps required. Reverting to a previous version restores the
original SLF4J logging.

## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations

No impact on geo-replication. Logging is a local concern and does not
affect inter-cluster communication.

# Alternatives

### Continue using `structured-event-log`

The existing internal module could be extended. However, it lacks
debug/trace levels, has a more verbose API, and
maintaining a custom logging library inside Pulsar adds unnecessary burden.
slog is a superset of the
`structured-event-log` API with better performance and more features.

### Use SLF4J 2.0 Fluent API

SLF4J 2.0 introduced a fluent logging API (`log.atInfo().addKeyValue("k",
v).log("msg")`). However:
* It does not support derived loggers with persistent context.
* It does not support context composition across components.
* No support for timed events or sampling.
* Attribute values are stored as strings only (no primitive overloads).
* Level guards still allocate objects in some implementations.

### Use Log4j2 directly

Log4j2 does not expose a public structured logging API for applications.
While it is possible to pass structured
key-value data through low-level, non-public Log4j2 interfaces (which is
what slog does internally as its
backend), this is not something easily or reliably usable directly from
application code. It would also tightly
couple Pulsar to Log4j2 as the only backend, preventing users from using
other SLF4J-compatible backends.
slog abstracts over these internals and supports both Log4j2 and SLF4J
backends transparently.

### Use Flogger

[Flogger](https://github.com/google/flogger) is Google's fluent logging API
for Java. While it offers a fluent API
and some structured logging features, it has significant performance
drawbacks:
* Extremely slow on the enabled path (~0.7 ops/μs vs slog's 13–25 ops/μs —
roughly **18–35× slower**).
* Very high allocation rate (1624–1904 B/op vs slog's 0–80 B/op — **20–24×
more allocations**).
* No support for derived loggers with persistent context or context
composition.
* No support for timed events.
* Tightly coupled to Google's internal infrastructure; limited backend
flexibility.

### Performance comparison

Benchmarks from the [slog repository](https://github.com/merlimat/slog)
(JMH, higher ops/μs is better, lower B/op
is better):

**Disabled path** (log level is off — measures the cost of checking the
level):

| Library              | ops/μs  |
|----------------------|---------|
| slog                 | 1005.4  |
| Log4j2               | 413.1   |
| SLF4J                | 367.5   |
| Flogger              | 360.3   |

slog's disabled path is **2.4× faster** than Log4j2 and **2.8× faster**
than SLF4J/Flogger. This matters because
the vast majority of debug/trace log statements are disabled in production.

**Enabled path** (log level is on — measures the cost of formatting and
emitting a log message with 2 key-value
attributes):

| Library              | ops/μs |
|----------------------|--------|
| slog (simple)        | 24.6   |
| Log4j2 (simple)      | 14.6   |
| slog (fluent)        | 13.2   |
| slog (fluent+ctx)    | 12.0   |
| SLF4J (simple)       | 11.2   |
| SLF4J (fluent)       | 4.2    |
| Flogger              | 0.7    |

slog is the fastest in both simple and fluent modes. Flogger is **18–35×
slower** than slog.

**Allocation rate** (bytes allocated per log operation):

| Library              | B/op |
|----------------------|------|
| slog (simple)        | 0    |
| Log4j2 (simple)      | 24   |
| SLF4J (simple)       | 24   |
| slog (fluent+ctx)    | 40   |
| slog (fluent)        | 80   |
| SLF4J (fluent)       | 1104 |
| Flogger              | 1624 |

slog's fluent API allocates **6× less** than Log4j2's fluent API, **14×
less** than SLF4J's fluent API, and
**20× less** than Flogger. Since structured logging requires the fluent
API, this is the most relevant comparison
for this PIP's use case.

# General Notes

The `managed-ledger` module has already been fully migrated as a proof of
concept, converting all source and test files
(54 files, ~400 log statements). The migration reduced total lines of code
(1481 additions vs 1392 deletions) while
making every log statement more informative and consistent.

# Links

* Original PIP-89:
https://github.com/apache/pulsar/blob/master/pip/pip-89.md
* slog library: https://github.com/merlimat/slog
* PoC: managed-ledger migration: https://github.com/merlimat/pulsar/pull/20


--
Matteo Merli
<[email protected]>

Reply via email to