This is an automated email from the ASF dual-hosted git repository.
lhotari pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new 8c275d707e7 PIP-465: Split IO Connectors into Separate Repository (#25383)
8c275d707e7 is described below
commit 8c275d707e7c995ad8518ee73118e109963b86c3
Author: Matteo Merli <[email protected]>
AuthorDate: Sat Mar 28 05:14:16 2026 -0700
PIP-465: Split IO Connectors into Separate Repository (#25383)
---
pip/pip-465.md | 228 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 228 insertions(+)
diff --git a/pip/pip-465.md b/pip/pip-465.md
new file mode 100644
index 00000000000..8c4062a3e2e
--- /dev/null
+++ b/pip/pip-465.md
@@ -0,0 +1,228 @@
+# PIP-465: Split IO Connectors into Separate Repository
+
+# Background Knowledge
+
+Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra, Elasticsearch, JDBC, Debezium,
+etc.) as part of its main repository. These connectors are packaged as NAR files and bundled into
+a `pulsar-all` Docker image alongside the core broker, client, and functions runtime.
+
+Each connector brings its own dependency tree, often large and conflicting with other connectors
+or with Pulsar's core dependencies. The connectors interact with Pulsar exclusively through the
+stable `pulsar-io-core` API, making them natural candidates for independent development and release.
+
+# Motivation
+
+The primary goal of this PIP is to **make development of Pulsar easier** by shrinking the core
+codebase. Removing ~30 connectors and their dependency trees from the main repository is expected
+to substantially reduce compile time, test execution time, and CI resource consumption, and to
+improve CI stability.
+
+**Build and CI impact.** Compiling and packaging 30+ connector NARs adds significant time to
+every CI run and local build, even when a developer is only working on the broker or client.
+The connectors collectively bring hundreds of transitive dependencies into the build graph,
+which slows down dependency resolution, inflates vulnerability reports (OWASP checks must scan
+connector dependencies), and creates version conflicts that require careful management in the
+main repository's BOM. Removing them dramatically reduces the surface area of the build.
+
+**Release coupling.** Connectors are tied to the Pulsar release cycle. A bug fix in a single
+connector (e.g., updating the Elasticsearch client) requires waiting for the next Pulsar release.
+Conversely, a Pulsar patch release must rebuild all connectors even when none of them changed.
+After the split, the release cadence for connectors will be independent from Pulsar releases,
+similar to what we already do for client SDKs (Go, Python, Node.js).
+
+**Low integration risk.** The `pulsar-io-core` API that connectors depend on has been very
+stable for a long time. There have been no breaking changes to the connector API in years,
+so there is essentially no risk of integration pain from this split.
+
+**Docker image bloat.** The `pulsar-all` image bundles every connector NAR, weighing in at
+~2.9 GB, a very large image that most deployments don't need. Users typically deploy only
+1-2 connectors but pay the image pull cost for all of them. The main reason users chose
+`pulsar-all` over `pulsar` was to get the tiered-storage offloaders; this PIP addresses that by
+packaging the offloader NARs directly into the `pulsar` image. Users who need specific connectors
+can still build tailored images by adding just the connector NARs they need on top of
+`apachepulsar/pulsar`.
+
+**Independent velocity.** Connector maintainers should be able to release new connector versions
+against a stable Pulsar API without coordinating with the core release train.
+
+# Goals
+
+## In Scope
+
+- **Create `apache/pulsar-connectors` repository** containing all IO connector modules, with
+ their own Gradle build, version catalog, and CI pipeline. The repository is forked from the
+ main Pulsar repository to preserve full git history.
+
+- **Remove connector modules from the main Pulsar repository.** Retain only:
+ - `pulsar-io-core` (the connector API)
+ - `pulsar-io-data-generator` (minimal connector used in integration tests)
+ - The functions runtime and worker that load connectors at runtime
+
+- **Remove the `pulsar-all` Docker image.** The image is too large and most users don't need
+ all connectors in a single image. The `pulsar` image becomes the single official image.
+ Tiered-storage offloader NARs (the main reason users chose `pulsar-all`) are included
+ directly in the `pulsar` image.
+
+- **Independent connector releases.** The `pulsar-connectors` repository has its own versioning
+ and release cadence, independent from Pulsar releases, similar to what we already do for
+ client SDKs. It can release new connector versions against any compatible Pulsar release.
+
+- **Connector distribution packaging.** The connectors repository produces a single release
+ containing all connector NARs, as a distribution tarball that users can deploy into an
+ existing Pulsar installation.
+
+## Out of Scope
+
+- Changing the connector API (`pulsar-io-core`)
+- Changing how the functions worker discovers and loads connector NARs
+- A connector marketplace or registry (future enhancement)
+- Splitting out tiered-storage offloaders into their own repository
+
+# High Level Design
+
+The split creates two repositories from what is currently one:
+
+```
+apache/pulsar (main repo)
+├── pulsar-io/core/ # Connector API (retained)
+├── pulsar-io/data-generator/ # Test connector (retained)
+├── pulsar-functions/ # Runtime + worker (retained)
+├── docker/pulsar/ # Single Docker image
+└── (broker, client, etc.)
+
+apache/pulsar-connectors (new repo)
+├── aerospike/
+├── aws/
+├── cassandra/
+├── debezium/
+│ ├── core/
+│ ├── mysql/
+│ ├── postgres/
+│ └── ...
+├── elastic-search/
+├── jdbc/
+│ ├── core/
+│ ├── postgres/
+│ └── ...
+├── kafka/
+├── kafka-connect-adaptor/
+├── kinesis/
+├── rabbitmq/
+├── ... (all other connectors)
+├── distribution/io/ # Distribution packaging
+└── docs/ # Connector docs generation
+```
+
+The connectors repository consumes Pulsar artifacts (`pulsar-io-core`, `pulsar-client`, etc.)
+as external Maven dependencies, not as source dependencies. This ensures connectors build against
+the published API and don't accidentally depend on internals.
+
+# Detailed Design
+
+## Repository Structure
+
+The new `pulsar-connectors` repository is forked from the main Pulsar repository to preserve
+git history, then trimmed to contain only connector-related modules. Connectors are promoted
+from nested `pulsar-io/<name>` paths to top-level `<name>/` directories for a flatter structure.
+
+## Build Configuration
+
+The connectors repository has its own:
+- `settings.gradle.kts` with all connector modules
+- `gradle/libs.versions.toml` with connector-specific dependency versions
+- `pulsar-dependencies/` platform module pinning Pulsar artifact versions
+- `build.gradle.kts` root build with shared configuration
+
+Pulsar core artifacts are declared as dependencies with a configurable version:
+```kotlin
+implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
+```
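+
+As a rough sketch of how the `pulsar-dependencies` platform module might pin these versions, the
+fragment below uses Gradle's `java-platform` plugin. The file contents, the `pulsarVersion` Gradle
+property, and the `:pulsar-dependencies` project path are assumptions for illustration, not the
+actual build files of the connectors repository:
+```kotlin
+// pulsar-dependencies/build.gradle.kts (hypothetical contents)
+plugins {
+    `java-platform`
+}
+
+// Assumed to be defined in gradle.properties or passed as -PpulsarVersion=<version>
+val pulsarVersion: String by project
+
+dependencies {
+    constraints {
+        // Pin the Pulsar artifacts that connectors consume as external dependencies
+        api("org.apache.pulsar:pulsar-io-core:$pulsarVersion")
+        api("org.apache.pulsar:pulsar-client:$pulsarVersion")
+    }
+}
+
+// <connector>/build.gradle.kts (hypothetical consumer, shown here as a comment)
+// dependencies {
+//     implementation(platform(project(":pulsar-dependencies")))
+//     implementation("org.apache.pulsar:pulsar-io-core") // version supplied by the platform
+// }
+```
+
+With a layout along these lines, building the connectors against a different Pulsar release would
+presumably only require overriding the one property rather than editing each connector module.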
+
+## Versioning Strategy
+
+The initial release of `pulsar-connectors` will use the same version as the next Pulsar
+release (whether that is 4.3 or 5.0), to make the transition clear. After that, the
+connectors repository follows its own independent release cadence.
+All connectors are released together as a single release (not individually), and each
+release specifies which Pulsar versions it is compatible with.
+
+## Docker Image Changes
+
+The `pulsar-all` image is removed. It bundled all connector NARs alongside the broker,
+producing a very large image that most deployments didn't need. The main reason users chose
+`pulsar-all` over `pulsar` was to get the tiered-storage offloaders. With this change:
+
+- Tiered-storage offloader NARs move into the `pulsar` image, eliminating the primary reason
+ for `pulsar-all` to exist
+- The `pulsar` Docker image becomes the single official image, containing the broker, functions
+ runtime, and tiered-storage offloader NARs
+- Users who need specific connectors can build tailored images by adding just the connector
+ NARs they need on top of `apachepulsar/pulsar`, or mount them via volume mounts
+
+## CI and Testing
+
+- The main Pulsar repository's CI no longer builds or tests connectors
+- The connectors repository has its own CI that builds and tests all connectors
+- Integration tests that exercise specific connectors (e.g., Cassandra sink, Kafka source)
+ move to the connectors repository
+- The main repository retains integration tests using `data-generator` for testing the
+ connector loading and runtime machinery
+
+## Migration for Users
+
+Users who currently use the `pulsar-all` Docker image:
+1. Switch to the `pulsar` Docker image
+2. Download needed connector NARs from the connectors release
+3. Mount NARs into the container (e.g., via volume mount to `/pulsar/connectors/`)
+
+Users who build from source:
+1. Build the main Pulsar repository as before (faster, since connectors are gone)
+2. Build the connectors repository separately if needed
+
+## Public-facing Changes
+
+### Docker Images
+
+| Before | After |
+|--------|-------|
+| `pulsar` — core only | `pulsar` — core + tiered-storage offloaders |
+| `pulsar-all` — core + all connectors + offloaders | *(removed)* |
+
+### Artifacts
+
+- All connector NARs move from the main Pulsar release to a single unified release from
+ the `pulsar-connectors` repository
+- All other Pulsar artifacts remain unchanged
+
+### Configuration
+
+No changes to broker, client, or functions worker configuration.
+
+# Backward & Forward Compatibility
+
+## Backward Compatibility
+
+The connector API (`pulsar-io-core`) does not change. Existing connector NARs continue
+to work with the functions worker without modification.
+
+The `pulsar-io-core` API has been very stable for years with no breaking changes, so connectors
+built against older API versions will continue to work with newer Pulsar releases and vice versa.
+
+## Forward Compatibility
+
+New connector releases can target older Pulsar versions, as long as the `pulsar-io-core`
+API they depend on is compatible. Given the long track record of API stability, this is
+expected to work seamlessly across Pulsar 4.x releases.
+
+# Security Considerations
+
+No negative security implications. Connectors continue to be loaded through the same NAR
+classloader isolation mechanism. The split does not change the security model.
+
+Separating connector dependencies from the main repository improves security posture
+by reducing the attack surface of the core Pulsar build and making connector dependency
+updates independently releasable.
+
+# Links
+
+* Mailing List discussion thread: [link]
+* Mailing List voting thread: [link]