This is an automated email from the ASF dual-hosted git repository.

penghui pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 78281fd0450 [improve][pip] PIP-464: Deprecate legacy Jackson 
JsonSchema format for SchemaType.JSON (#25361)
78281fd0450 is described below

commit 78281fd04500e1cbd65f2a4818ffd85f42739106
Author: Penghui Li <[email protected]>
AuthorDate: Mon Mar 23 09:09:34 2026 -0700

    [improve][pip] PIP-464: Deprecate legacy Jackson JsonSchema format for 
SchemaType.JSON (#25361)
---
 pip/pip-464.md | 213 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 213 insertions(+)

diff --git a/pip/pip-464.md b/pip/pip-464.md
new file mode 100644
index 00000000000..d8792ebe818
--- /dev/null
+++ b/pip/pip-464.md
@@ -0,0 +1,213 @@
+# PIP-464: Deprecate legacy Jackson JsonSchema format for SchemaType.JSON
+
+## Background knowledge
+
+In Pulsar, `SchemaType.JSON` is used for topics where producers and consumers 
exchange JSON-encoded messages with a defined schema. The schema definition is 
stored in `SchemaInfo.schema` (the `schema_data` field) and is used by the 
broker for validation and compatibility checking, and by consumers for 
deserialization.
+
+There are two relevant schema definition formats:
+
+- **Apache Avro schema format**: `{"type":"record","name":"MyRecord","fields":[{"name":"field1","type":"string"}]}`. This has been the standard format since Pulsar 2.1 and is what consumers (`AvroBaseStructSchema`, `GenericJsonSchema`, `AutoConsumeSchema`) require to function correctly.
+
+- **Jackson JSON Schema Draft format**: `{"type":"object","properties":{"field1":{"type":"string"}}}`. This is the legacy format from Pulsar 2.0, generated by Jackson's `JsonSchemaGenerator`. It was superseded in Pulsar 2.1 ([commit 1893323bc2](https://github.com/apache/pulsar/commit/1893323bc2), [PR #2071](https://github.com/apache/pulsar/pull/2071)) when the project standardized on Avro format for all structured schemas.
+
+To maintain backward compatibility with schemas created during the 2.0 era, 
fallback logic was added in several components:
+
+- **`StructSchemaDataValidator`** — the broker-side validator that checks 
whether a schema definition is structurally valid. It first attempts to parse 
the schema as Avro; if that fails, it falls back to Jackson `JsonSchema` 
parsing.
+- **`JsonSchemaCompatibilityCheck`** — the broker-side compatibility checker. 
It has permissive handling for mixed-format scenarios (Avro↔Jackson, 
Jackson↔Jackson).
+- **`ProducerImpl`** — the Java client's producer implementation. It detects 
the broker's protocol version and sends the old Jackson format to brokers below 
protocol version 13.
+
+## Motivation
+
+The broker-side fallback logic is too lenient. When Avro parsing fails, the 
Jackson fallback accepts **any valid JSON** as a schema definition for 
`SchemaType.JSON`, not just the legacy Jackson format. This has caused real 
issues for non-Java clients (e.g., the Rust client) where users accidentally 
register a JSON Schema Draft 2020-12 definition:
+
+1. The broker's `StructSchemaDataValidator` accepts it — Avro parse fails, 
Jackson fallback succeeds because it accepts any JSON.
+2. The broker's compatibility check allows it, because the Avro→JsonSchema and JsonSchema→JsonSchema paths are handled by an empty block.
+3. But when a Java consumer uses `AutoConsumeSchema` or `GenericJsonSchema`, 
it fails with `SchemaParseException: Type not supported: object` because 
`AvroBaseStructSchema` strictly requires Avro format — no fallback on the 
consumer side.
+
+The result is that the broker stores a schema that no Java consumer can read. 
The failure is deferred from producer registration time (where it should be 
caught) to consumer read time (where it is confusing and unrecoverable without 
schema deletion).
+
+There is an asymmetry in the system today: the broker side is lenient (accepts 
any JSON), but the consumer side is strict (requires Avro). This PIP resolves 
the asymmetry by making the broker side equally strict.
+
+## Goals
+
+### In Scope
+
+- Add a broker configuration to control whether the legacy Jackson JSON Schema 
format is accepted for `SchemaType.JSON`.
+- Default to strict Avro-only validation, consistent with what the consumer 
side already requires.
+- Tighten `StructSchemaDataValidator` and `JsonSchemaCompatibilityCheck` to 
reject non-Avro schemas when the legacy flag is disabled.
+- Deprecate (but not remove) the `ProducerImpl` client-side code that sends 
old Jackson format to brokers below protocol version 13.
+- Document that `schema_data` for `SchemaType.JSON` must be an Apache Avro 
schema definition.
+
+### Out of Scope
+
+- Removing the `ProducerImpl` client-side backward-compatibility code — this 
will be addressed in a future major release.
+- Migration tooling to detect or convert legacy schemas in existing 
deployments.
+- Changes to schema handling for other `SchemaType` values (AVRO, PROTOBUF, 
etc.).
+
+## High Level Design
+
+A new broker configuration `schemaJsonAllowLegacyJacksonFormat` (default 
`false`) controls whether the old Jackson JSON Schema format is accepted for 
`SchemaType.JSON` schema definitions.
+
+When disabled (default):
+- `StructSchemaDataValidator` requires valid Avro schema format. If Avro 
parsing fails, the schema is rejected immediately with the Avro 
`SchemaParseException` — no Jackson fallback.
+- `JsonSchemaCompatibilityCheck` requires both the existing and new schema to 
be valid Avro format. Mixed-format compatibility is rejected.
+
+When enabled:
+- The current backward-compatible behavior is preserved exactly as it is 
today. No logging or metrics overhead.
+
+On the client side, the existing `ProducerImpl` code that sends old Jackson 
format to pre-v13 brokers is annotated as `@Deprecated` with a reference to 
this PIP.
+
+## Detailed Design
+
+### Design & Implementation Details
+
+#### 1. `StructSchemaDataValidator`
+
+**File:** 
`pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/validator/StructSchemaDataValidator.java`
+
+Current behavior:
+```
+try {
+    parse Avro schema
+} catch (SchemaParseException) {
+    try {
+        parse Jackson JsonSchema  // accepts ANY valid JSON
+    } catch (...) {
+        throw invalid schema
+    }
+}
+```
+
+Proposed behavior:
+```
+try {
+    parse Avro schema
+} catch (SchemaParseException e) {
+    if (schemaJsonAllowLegacyJacksonFormat) {
+        try {
+            parse Jackson JsonSchema
+        } catch (...) {
+            throw invalid schema
+        }
+    } else {
+        throw invalid schema (propagate original Avro SchemaParseException)
+    }
+}
+```
+
+When `schemaJsonAllowLegacyJacksonFormat=false` (default), the Avro 
`SchemaParseException` is propagated directly. This reuses the existing error 
message without modification.
+
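The proposed control flow can be sketched in plain Java as follows. This is a minimal illustration, not the broker's actual class: `JsonSchemaValidationSketch`, `InvalidSchemaException`, and the two injected parser callbacks are hypothetical stand-ins for the real Avro and Jackson parsers, assumed here to signal failure with a `RuntimeException`.

```java
import java.util.function.Consumer;

// Illustrative sketch only: class and exception names are hypothetical, and the
// two parsers are injected stand-ins for the real Avro and Jackson parsers.
final class JsonSchemaValidationSketch {

    /** Hypothetical exception signalling a rejected schema definition. */
    static final class InvalidSchemaException extends Exception {
        InvalidSchemaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final boolean allowLegacyJacksonFormat;
    private final Consumer<String> avroParser;    // throws RuntimeException if input is not Avro
    private final Consumer<String> jacksonParser; // throws RuntimeException if input is not JSON

    JsonSchemaValidationSketch(boolean allowLegacyJacksonFormat,
                               Consumer<String> avroParser,
                               Consumer<String> jacksonParser) {
        this.allowLegacyJacksonFormat = allowLegacyJacksonFormat;
        this.avroParser = avroParser;
        this.jacksonParser = jacksonParser;
    }

    void validate(String schemaData) throws InvalidSchemaException {
        try {
            avroParser.accept(schemaData);
        } catch (RuntimeException avroFailure) {
            if (!allowLegacyJacksonFormat) {
                // Strict mode (default): no fallback, keep the Avro failure as the cause.
                throw new InvalidSchemaException(avroFailure.getMessage(), avroFailure);
            }
            try {
                // Legacy mode: fall back to the lenient Jackson parse.
                jacksonParser.accept(schemaData);
            } catch (RuntimeException jacksonFailure) {
                throw new InvalidSchemaException(jacksonFailure.getMessage(), jacksonFailure);
            }
        }
    }
}
```

Injecting the parsers keeps the sketch free of library dependencies. In strict mode the fallback parser is never called, so the caller only ever sees the Avro parser's message.
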
+#### 2. `JsonSchemaCompatibilityCheck`
+
+**File:** 
`pulsar-broker/src/main/java/org/apache/pulsar/broker/service/schema/JsonSchemaCompatibilityCheck.java`
+
+Current behavior: The compatibility check has empty/permissive handling for 
mixed-format scenarios (Avro↔Jackson, Jackson↔Jackson).
+
+Proposed behavior: When `schemaJsonAllowLegacyJacksonFormat=false`, all schema definitions passed to compatibility checking must be valid Avro format. If either the existing or new schema fails Avro parsing, the compatibility check returns incompatible. For the new schema this is a defense-in-depth check, because `StructSchemaDataValidator` will already have rejected a non-Avro definition at registration time. For the existing schema it covers legacy-format definitions that were stored before the flag was disabled.
+
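A minimal sketch of the format gate described above, under stated assumptions: `JsonCompatibilityGateSketch`, `formatsAllowed`, and the `isAvro` predicate are hypothetical names, and the predicate stands in for "parses as an Avro schema". The sketch covers only the format precondition; the regular Avro compatibility rules would still run afterwards for pairs that pass.

```java
import java.util.function.Predicate;

// Illustrative sketch only: the class name, method name, and isAvro predicate are hypothetical.
final class JsonCompatibilityGateSketch {

    /**
     * Returns false when the pair must be reported as incompatible purely
     * because of its format, before any Avro compatibility rule is applied.
     */
    static boolean formatsAllowed(String existingSchema,
                                  String newSchema,
                                  boolean allowLegacyJacksonFormat,
                                  Predicate<String> isAvro) {
        if (allowLegacyJacksonFormat) {
            // Legacy mode: mixed-format pairs keep today's permissive handling.
            return true;
        }
        // Strict mode: both sides must be valid Avro schema definitions.
        return isAvro.test(existingSchema) && isAvro.test(newSchema);
    }
}
```
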
+#### 3. `ProducerImpl` Client-Side Code (Deprecation Only)
+
+**File:** 
`pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java`
+
+The existing code that detects broker protocol version and falls back to 
sending the old Jackson format is annotated with `@Deprecated` and a code 
comment referencing this PIP. No behavioral change to the client in this PIP — 
removal is deferred to a future major release.
+
+#### 4. Configuration Plumbing
+
+The `schemaJsonAllowLegacyJacksonFormat` config value must be accessible from 
both `StructSchemaDataValidator` and `JsonSchemaCompatibilityCheck`. This will 
be threaded through the existing `SchemaRegistryService` → validator/checker 
dependency chain, consistent with how other schema-related broker configs are 
propagated.
+
+### Public-facing Changes
+
+#### Public API
+
+No changes to the public client API, admin API, or REST API.
+
+#### Binary protocol
+
+No changes to the Pulsar binary protocol.
+
+#### Configuration
+
+Add a new broker configuration parameter:
+
+| Property | Type | Default | Description |
+|---|---|---|---|
+| `schemaJsonAllowLegacyJacksonFormat` | boolean | `false` | Whether to allow legacy Jackson JsonSchema format for `SchemaType.JSON` schema definitions. When `false`, only valid Apache Avro schema format is accepted, consistent with what the consumer side requires. When `true`, the pre-2.1 backward-compatible behavior is preserved for deployments that still have topics with legacy-format schemas. |
+
+```java
+@FieldContext(
+    category = CATEGORY_SCHEMA,
+    doc = "Whether to allow legacy Jackson JsonSchema format for SchemaType.JSON schema definitions. "
+        + "When false (default), only valid Apache Avro schema format is accepted, consistent with "
+        + "what the consumer side requires. When true, the pre-2.1 backward-compatible behavior is "
+        + "preserved for deployments that still have topics with legacy-format schemas."
+)
+private boolean schemaJsonAllowLegacyJacksonFormat = false;
+```
+
+#### CLI
+
+No CLI changes.
+
+#### Metrics
+
+No new metrics.
+
+## Monitoring
+
+No specific monitoring changes. When 
`schemaJsonAllowLegacyJacksonFormat=false` (default), producers attempting to 
register non-Avro schema definitions will receive an error response at 
registration time. Operators can monitor for increased schema registration 
failures after upgrading if they suspect legacy schemas may exist in their 
deployment.
+
+## Security Considerations
+
+No security implications. This change tightens input validation: the broker rejects malformed schema definitions at registration time instead of storing them.
+
+## Backward & Forward Compatibility
+
+### Upgrade
+
+This is a **breaking change** in default behavior.
+
+- **Topics whose stored schema predates Pulsar 2.1 (2018)** and still uses the Jackson JSON Schema Draft format are affected when a new schema version is registered against them: the registration is rejected. Existing stored schemas are not modified or deleted.
+- **Java producers** are unaffected. `JSONSchema.of()` has generated Avro 
format since Pulsar 2.1.
+- **Non-Java producers** that were incorrectly registering JSON Schema Draft 
definitions will now receive a clear failure at registration time instead of a 
deferred failure at consumer read time.
+- **Users with genuine legacy schemas** can set 
`schemaJsonAllowLegacyJacksonFormat=true` in `broker.conf` to restore the 
previous behavior.
+
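For deployments that still carry genuine legacy schemas, the opt-out in the last bullet above would look roughly like this in `broker.conf`. Only the property name and its values come from this PIP; the comment line is illustrative.

```properties
# Re-enable the pre-2.1 Jackson JsonSchema fallback for SchemaType.JSON (default: false)
schemaJsonAllowLegacyJacksonFormat=true
```
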
+The legacy Jackson format has been superseded since Pulsar 2.1, released in 
2018. Any active topics with old-format schemas have likely been migrated or 
recreated over the past 7+ years. The Java client has not generated Jackson 
format schemas since 2.1.
+
+### Downgrade / Rollback
+
+Rolling back to a prior Pulsar version will restore the lenient fallback 
behavior. No data migration is needed — the configuration flag is purely a 
runtime behavioral switch.
+
+### Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations
+
+No impact on geo-replication. Schema definitions are replicated as-is between 
clusters. If one cluster has `schemaJsonAllowLegacyJacksonFormat=false` and 
receives a replicated topic with a legacy-format schema, the schema was already 
stored — this PIP only affects new schema registrations, not existing stored 
schemas. However, operators should ensure consistent configuration across 
geo-replicated clusters to avoid asymmetric behavior where a schema is accepted 
on one cluster but rejected [...]
+
+## Alternatives
+
+### Alternative 1: Default to `true` (backward-compatible default)
+
+Rejected. The legacy Jackson format has been superseded since Pulsar 2.1 
(2018). The Java client's `JSONSchema.of()` has generated Avro format for over 
7 years. Defaulting to `true` perpetuates a silent failure mode where non-Java 
clients can register schemas that Java consumers cannot read. The primary value 
of this PIP is fixing the default behavior.
+
+### Alternative 2: Descriptive error messages with format detection
+
+Considered adding logic to detect JSON Schema Draft format (e.g., presence of 
`"$schema"` or `"type":"object"` with `"properties"`) and return a targeted 
error message. Rejected in favor of propagating the existing Avro 
`SchemaParseException` to minimize code change surface. The PIP documentation 
and Pulsar schema documentation updates will serve as the guide for non-Java 
client developers.
+
+### Alternative 3: Two-release deprecation period
+
+Considered defaulting to `true` in the first release and flipping to `false` 
in the next. Rejected because the legacy format is 7+ years old, the Java 
client has not generated it since 2.1, and any active topics with old-format 
schemas have likely been migrated or recreated. An immediate default flip is 
appropriate.
+
+### Alternative 4: Warn logging or metrics when legacy format is accepted
+
+Considered adding WARN-level logging or a counter metric when 
`schemaJsonAllowLegacyJacksonFormat=true` and a legacy schema is encountered. 
Rejected to keep the opt-in path simple and silent — users who enable the flag 
are making a conscious choice.
+
+### Alternative 5: Migration tooling
+
+Considered providing an admin CLI command to scan stored schemas and report 
which topics use the old format. Rejected as out of scope — this can be added 
later if demand materializes. Operators can identify legacy schemas by 
attempting to parse stored schema definitions with an Avro parser.
+
+## General Notes
+
+The documentation for `SchemaType.JSON` should be updated to clearly state 
that `schema_data` must be an Apache Avro schema definition, not a JSON Schema 
Draft definition. This is particularly important for non-Java client 
implementations (Rust, Go, Python, C++, Node.js, .NET) that construct schema 
definitions manually rather than using the Java client's `JSONSchema.of()` 
helper.
+
+## Links
+
+* Mailing List discussion thread:
+* Mailing List voting thread:
