This is an automated email from the ASF dual-hosted git repository.

lhotari pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 8c275d707e7 PIP-465: Split IO Connectors into Separate Repository (#25383)
8c275d707e7 is described below

commit 8c275d707e7c995ad8518ee73118e109963b86c3
Author: Matteo Merli <[email protected]>
AuthorDate: Sat Mar 28 05:14:16 2026 -0700

    PIP-465: Split IO Connectors into Separate Repository (#25383)
---
 pip/pip-465.md | 228 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 228 insertions(+)

diff --git a/pip/pip-465.md b/pip/pip-465.md
new file mode 100644
index 00000000000..8c4062a3e2e
--- /dev/null
+++ b/pip/pip-465.md
@@ -0,0 +1,228 @@
+# PIP-465: Split IO Connectors into Separate Repository
+
+# Background Knowledge
+
+Apache Pulsar ships ~30 IO connectors (Kafka, Kinesis, Cassandra, Elasticsearch, JDBC, Debezium,
+etc.) as part of its main repository. These connectors are packaged as NAR files and bundled into
+a `pulsar-all` Docker image alongside the core broker, client, and functions runtime.
+
+Each connector brings its own dependency tree, often large and in conflict with other connectors
+or with Pulsar's core dependencies. The connectors interact with Pulsar exclusively through the
+stable `pulsar-io-core` API, making them natural candidates for independent development and release.
+
+# Motivation
+
+The primary goal of this PIP is to **make development of Pulsar easier** by shrinking the core
+codebase. Removing ~30 connectors and their dependency trees from the main repository should
+substantially reduce compile time, test execution time, and CI resource consumption, and improve CI stability.
+
+**Build and CI impact.** Compiling and packaging 30+ connector NARs adds significant time to
+every CI run and local build, even when a developer is only working on the broker or client.
+The connectors collectively bring hundreds of transitive dependencies into the build graph,
+which slows down dependency resolution, inflates vulnerability reports (OWASP checks must scan
+connector dependencies), and creates version conflicts that require careful management in the
+main repository's BOM. Removing them dramatically reduces the surface area of the build.
+
+**Release coupling.** Connectors are tied to the Pulsar release cycle. A bug fix in a single
+connector (e.g., updating the Elasticsearch client) requires waiting for the next Pulsar release.
+Conversely, a Pulsar patch release must rebuild all connectors even when none of them changed.
+The release cadence for connectors will be independent from Pulsar releases, similar to what
+we already do for client SDKs (Go, Python, Node.js).
+
+**Low integration risk.** The `pulsar-io-core` API that connectors depend on has been very
+stable for a long time. There have been no breaking changes to the connector API in years,
+so there is essentially no risk of integration pain from this split.
+
+**Docker image bloat.** The `pulsar-all` image bundles every connector NAR and weighs in at
+~2.9 GB, far more than most deployments need. Users typically deploy only
+one or two connectors but pay the image pull cost for all of them. The main reason users chose `pulsar-all` over
+`pulsar` was to get the tiered-storage offloaders; this PIP addresses that by packaging the
+offloader NARs directly into the `pulsar` image. Users who need specific connectors can still
+build tailored images by adding just the connector NARs they need on top of `apachepulsar/pulsar`.
+
+**Independent velocity.** Connector maintainers should be able to release new connector versions
+against a stable Pulsar API without coordinating with the core release train.
+
+# Goals
+
+## In Scope
+
+- **Create `apache/pulsar-connectors` repository** containing all IO connector modules, with
+  their own Gradle build, version catalog, and CI pipeline. The repository is forked from the
+  main Pulsar repository to preserve full git history.
+
+- **Remove connector modules from the main Pulsar repository.** Retain only:
+  - `pulsar-io-core` (the connector API)
+  - `pulsar-io-data-generator` (minimal connector used in integration tests)
+  - The functions runtime and worker that load connectors at runtime
+
+- **Remove the `pulsar-all` Docker image.** The image is too large and most users don't need
+  all connectors in a single image. The `pulsar` image becomes the single official image.
+  Tiered-storage offloader NARs, the main reason users chose `pulsar-all`, are included
+  directly in the `pulsar` image.
+
+- **Independent connector releases.** The `pulsar-connectors` repository has its own versioning
+  and release cadence, independent from Pulsar releases, similar to what we already do for
+  client SDKs. It can release new connector versions against any compatible Pulsar release.
+
+- **Connector distribution packaging.** The connectors repository produces a single release
+  containing all connector NARs, as a distribution tarball that users can deploy into an
+  existing Pulsar installation.
+
+## Out of Scope
+
+- Changing the connector API (`pulsar-io-core`)
+- Changing how the functions worker discovers and loads connector NARs
+- A connector marketplace or registry (future enhancement)
+- Splitting out tiered-storage offloaders into their own repository
+
+# High Level Design
+
+The split creates two repositories from what is currently one:
+
+```
+apache/pulsar (main repo)
+├── pulsar-io/core/          # Connector API (retained)
+├── pulsar-io/data-generator/ # Test connector (retained)
+├── pulsar-functions/        # Runtime + worker (retained)
+├── docker/pulsar/           # Single Docker image
+└── (broker, client, etc.)
+
+apache/pulsar-connectors (new repo)
+├── aerospike/
+├── aws/
+├── cassandra/
+├── debezium/
+│   ├── core/
+│   ├── mysql/
+│   ├── postgres/
+│   └── ...
+├── elastic-search/
+├── jdbc/
+│   ├── core/
+│   ├── postgres/
+│   └── ...
+├── kafka/
+├── kafka-connect-adaptor/
+├── kinesis/
+├── rabbitmq/
+├── ... (all other connectors)
+├── distribution/io/         # Distribution packaging
+└── docs/                    # Connector docs generation
+```
+
+The connectors repository consumes Pulsar artifacts (`pulsar-io-core`, `pulsar-client`, etc.)
+as external Maven dependencies, not as source dependencies. This ensures connectors build against
+the published API and don't accidentally depend on internals.
+
+# Detailed Design
+
+## Repository Structure
+
+The new `pulsar-connectors` repository is forked from the main Pulsar repository to preserve
+git history, then trimmed to contain only connector-related modules. Connectors are promoted
+from nested `pulsar-io/<name>` paths to top-level `<name>/` directories for a flatter structure.
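+
+As a rough illustration of the flattened layout, the `settings.gradle.kts` of the new repository
+might look like the sketch below. The module list is abbreviated, and the project names are
+assumptions derived from the directory tree shown earlier, not taken from the actual repository.
+
+```kotlin
+// settings.gradle.kts (illustrative sketch, abbreviated module list)
+rootProject.name = "pulsar-connectors"
+
+// Connectors sit at the top level instead of under pulsar-io/<name>.
+include("aerospike")
+include("cassandra")
+include("kafka")
+include("kinesis")
+
+// Connector families keep their sub-modules, e.g. jdbc/core and jdbc/postgres.
+include("jdbc:core")
+include("jdbc:postgres")
+include("debezium:core")
+include("debezium:mysql")
+
+// Platform module that pins Pulsar artifact versions.
+include("pulsar-dependencies")
+```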
+
+## Build Configuration
+
+The connectors repository has its own:
+- `settings.gradle.kts` with all connector modules
+- `gradle/libs.versions.toml` with connector-specific dependency versions
+- `pulsar-dependencies/` platform module pinning Pulsar artifact versions
+- `build.gradle.kts` root build with shared configuration
+
+Pulsar core artifacts are declared as dependencies with a configurable version:
+```kotlin
+implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
+```
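+
+How `pulsarVersion` is defined is not specified above. One possible sketch for a connector module,
+assuming the version is supplied as a Gradle property (for example in `gradle.properties` or via
+`-PpulsarVersion=<version>`); the module path and property name are illustrative only:
+
+```kotlin
+// kafka/build.gradle.kts (hypothetical connector module)
+plugins {
+    `java-library`
+}
+
+// Assumption: the Pulsar version to build against is passed in as a Gradle property.
+// The build fails at configuration time if the property is missing.
+val pulsarVersion: String = providers.gradleProperty("pulsarVersion").get()
+
+dependencies {
+    // Pulsar core artifacts are resolved as published Maven artifacts, not source projects.
+    implementation("org.apache.pulsar:pulsar-io-core:${pulsarVersion}")
+
+    // Possible alternative: import the pulsar-dependencies platform and omit the version.
+    // implementation(platform(project(":pulsar-dependencies")))
+}
+```
+
+With this arrangement, building the same connector sources against a different Pulsar release
+only requires changing that one property.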
+
+## Versioning Strategy
+
+The initial release of `pulsar-connectors` will use the same version as the next Pulsar
+release (whether that is 4.3 or 5.0), to make the transition clear. After that, the
+connectors repository follows its own independent release cadence.
+All connectors are released together as a single release (not individually), and each
+release specifies which Pulsar versions it is compatible with.
+
+## Docker Image Changes
+
+The `pulsar-all` image is removed. It bundled all connector NARs alongside the broker,
+producing a very large image that most deployments didn't need. The main reason users chose
+`pulsar-all` over `pulsar` was to get the tiered-storage offloaders. With this change:
+
+- Tiered-storage offloader NARs move into the `pulsar` image, eliminating the primary reason
+  for `pulsar-all` to exist
+- The `pulsar` Docker image becomes the single official image, containing the broker, functions
+  runtime, and tiered-storage offloader NARs
+- Users who need specific connectors can build tailored images by adding just the connector
+  NARs they need on top of `apachepulsar/pulsar`, or mount the NARs into the container as volumes
+
+## CI and Testing
+
+- The main Pulsar repository's CI no longer builds or tests connectors
+- The connectors repository has its own CI that builds and tests all connectors
+- Integration tests that exercise specific connectors (e.g., Cassandra sink, Kafka source)
+  move to the connectors repository
+- The main repository retains integration tests using `data-generator` for testing the
+  connector loading and runtime machinery
+
+## Migration for Users
+
+Users who currently use the `pulsar-all` Docker image:
+1. Switch to the `pulsar` Docker image
+2. Download needed connector NARs from the connectors release
+3. Mount NARs into the container (e.g., via volume mount to `/pulsar/connectors/`)
+
+Users who build from source:
+1. Build the main Pulsar repository as before (faster, since connectors are gone)
+2. Build the connectors repository separately if needed
+
+## Public-facing Changes
+
+### Docker Images
+
+| Before | After |
+|--------|-------|
+| `pulsar` — core only | `pulsar` — core + tiered-storage offloaders |
+| `pulsar-all` — core + all connectors + offloaders | *(removed)* |
+
+### Artifacts
+
+- All connector NARs move from the main Pulsar release to a single unified release from
+  the `pulsar-connectors` repository
+- All other Pulsar artifacts remain unchanged
+
+### Configuration
+
+No changes to broker, client, or functions worker configuration.
+
+# Backward & Forward Compatibility
+
+## Backward Compatibility
+
+The connector API (`pulsar-io-core`) does not change. Existing connector NARs continue
+to work with the functions worker without modification.
+
+The `pulsar-io-core` API has been very stable for years with no breaking changes, so connectors
+built against older API versions will continue to work with newer Pulsar releases and vice versa.
+
+## Forward Compatibility
+
+New connector releases can target older Pulsar versions, as long as the `pulsar-io-core`
+API they depend on is compatible. Given the long track record of API stability, this is
+expected to work seamlessly across Pulsar 4.x releases.
+
+# Security Considerations
+
+The split does not change the security model: connectors continue to be loaded through the same
+NAR classloader isolation mechanism.
+
+If anything, separating connector dependencies from the main repository improves security posture
+by reducing the attack surface of the core Pulsar build and making connector dependency
+updates independently releasable.
+
+# Links
+
+* Mailing List discussion thread: [link]
+* Mailing List voting thread: [link]
