Praveenkumar76 opened a new pull request, #1137:
URL: https://github.com/apache/pulsar-site/pull/1137

   Fixes apache/pulsar#23662
   
   ### Motivation
   The documented Debezium PostgreSQL CDC example currently relies on outdated 
Debezium 1.x property names (such as `database.server.name`, 
`schema.whitelist`, and `table.whitelist`). Copying these properties into a 
modern Pulsar environment that uses Debezium 2.x causes validation failures.
   
   Additionally, the documentation does not specify which converters to use. 
Unless string converters are set explicitly, `KeyValue` schema mismatches 
cause records to be dropped silently, so no messages are published to the 
Pulsar topic.
   
   ### Modifications
   Updated the PostgreSQL Debezium source configuration examples (both JSON and 
YAML) in the documentation to reflect Debezium 2.x standards:
   * Replaced `database.server.name` with `topic.prefix`.
   * Replaced `schema.whitelist` with `schema.include.list`.
   * Replaced `table.whitelist` with `table.include.list`.
   * Added `key.converter` and `value.converter` set to 
`org.apache.kafka.connect.storage.StringConverter` to prevent silent data drops 
due to schema mismatches.
   * (If applicable) Updated the `localrun` CLI command example to include 
`--destination-topic-name` to ensure proper routing.
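
   For comparison, the fragment below sketches the superseded Debezium 1.x 
keys that these replacements remove. It assumes the same values as the updated 
example further down (`dbserver1`, `public`, `public.users`) and shows only 
the three renamed keys; the connection settings are unchanged.

   ```json
   {
       "database.server.name": "dbserver1",
       "schema.whitelist": "public",
       "table.whitelist": "public.users"
   }
   ```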
   
   ### Verifying this change
   This documentation change was verified manually using the updated 
configurations with:
   * Apache Pulsar 4.x.x standalone
   * `pulsar-io-debezium-postgres-4.x.x.nar`
   * PostgreSQL 13.3 (Docker)
   
   **Validation steps:**
   1. Started PostgreSQL with logical replication enabled (`wal_level=logical`).
   2. Ran the documented `localrun` command using the newly updated 
configuration properties.
   3. Performed `INSERT`, `UPDATE`, and `DELETE` operations on the source table.
   4. Confirmed that messages were published to, and consumed from, 
`persistent://public/default/dbserver1.public.users`.
   
   The consumer output below shows a captured CDC event:
   ```text
   ----- got message -----
   key:[eyJpZC0=]
   content:{"before":null,"after":{"id":6,"hash_firstname":"initial-rs"},
   "source":{"connector":"postgresql","name":"dbserver1"},
   "op":"c","ts_ms":1776669145647}
   ```
   The `op` field identifies the operation that produced each event:
   - `"op":"c"` for inserts
   - `"op":"u"` for updates
   - `"op":"d"` for deletes
   
   ### Updated JSON configuration
   After this change, the PostgreSQL JSON configuration example in the docs 
(`docs/io-debezium-source.md` or similar) reads as follows:
   
   ```json
   {
       "database.hostname": "localhost",
       "database.port": "5432",
       "database.user": "postgres",
       "database.password": "changeme",
       "database.dbname": "postgres",
       "topic.prefix": "dbserver1",
       "plugin.name": "pgoutput",
       "schema.include.list": "public",
       "table.include.list": "public.users",
       "key.converter": "org.apache.kafka.connect.storage.StringConverter",
       "value.converter": "org.apache.kafka.connect.storage.StringConverter",
       "database.history.pulsar.service.url": "pulsar://127.0.0.1:6650"
   }
   ```
   
   ### Does this pull request potentially affect one of the following parts?
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   - [x] Documentation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
