Praveenkumar76 opened a new pull request, #1137:
URL: https://github.com/apache/pulsar-site/pull/1137
Fixes apache/pulsar#23662
### Motivation
The documented Debezium PostgreSQL CDC example still uses outdated
Debezium 1.x property names (such as `database.server.name`,
`schema.whitelist`, and `table.whitelist`). Copying these into current
Pulsar environments that use Debezium 2.x causes validation failures.
Additionally, the documentation does not specify the correct converters.
Without explicitly defining string converters, users experience silent data
drops caused by `KeyValue` schema mismatches, preventing messages from being
published to the Pulsar topic.
### Modifications
Updated the PostgreSQL Debezium source configuration examples (both JSON and
YAML) in the documentation to reflect Debezium 2.x standards:
* Replaced `database.server.name` with `topic.prefix`.
* Replaced `schema.whitelist` with `schema.include.list`.
* Replaced `table.whitelist` with `table.include.list`.
* Added `key.converter` and `value.converter` set to
`org.apache.kafka.connect.storage.StringConverter` to prevent silent data drops
due to schema mismatches.
* (If applicable) Updated the `localrun` CLI command example to include
`--destination-topic-name` to ensure proper routing.
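The renames above can be applied mechanically to an existing connector configuration. The following is a minimal Python sketch, not part of this PR: the helper name `migrate_debezium_config` is hypothetical, and it covers only the three renames and the two converter settings listed here.

```python
# Hypothetical helper: maps the Debezium 1.x property names listed above
# to their 2.x replacements. Only the three renames from this PR are covered.
RENAMED_KEYS = {
    "database.server.name": "topic.prefix",
    "schema.whitelist": "schema.include.list",
    "table.whitelist": "table.include.list",
}

STRING_CONVERTER = "org.apache.kafka.connect.storage.StringConverter"


def migrate_debezium_config(old_config: dict) -> dict:
    """Return a copy of old_config that uses the 2.x property names."""
    new_config = {}
    for key, value in old_config.items():
        # Keys without a known rename are copied through unchanged.
        new_config[RENAMED_KEYS.get(key, key)] = value
    # Add the string converters only if the caller has not set them already.
    new_config.setdefault("key.converter", STRING_CONVERTER)
    new_config.setdefault("value.converter", STRING_CONVERTER)
    return new_config


legacy = {
    "database.server.name": "dbserver1",
    "schema.whitelist": "public",
    "table.whitelist": "public.users",
}
migrated = migrate_debezium_config(legacy)
print(sorted(migrated))
```

Any other Debezium 1.x properties that changed in 2.x are outside the scope of this sketch and would need to be checked separately.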
### Verifying this change
This documentation change was verified manually using the updated
configurations with:
* Apache Pulsar 4.x.x standalone
* `pulsar-io-debezium-postgres-4.x.x.nar`
* PostgreSQL 13.3 (Docker)
**Validation steps:**
1. Started PostgreSQL with logical replication enabled (`wal_level=logical`).
2. Ran the documented `localrun` command using the newly updated
configuration properties.
3. Performed `INSERT`, `UPDATE`, and `DELETE` operations on the source table.
4. Confirmed messages were successfully published and consumed from
`persistent://public/default/dbserver1.public.users`.
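The topic name in step 4 appears to follow the pattern `<topic.prefix>.<schema>.<table>` under the `public/default` namespace. A small Python sketch of that mapping follows; the pattern is inferred from this single example rather than from the connector source, and the function name `expected_topic` is hypothetical.

```python
def expected_topic(prefix: str, table: str,
                   tenant: str = "public", namespace: str = "default") -> str:
    """Build the topic name observed in the validation run.

    `table` is schema-qualified, as in `table.include.list`.
    The pattern is an assumption inferred from one example.
    """
    return f"persistent://{tenant}/{namespace}/{prefix}.{table}"


print(expected_topic("dbserver1", "public.users"))
# persistent://public/default/dbserver1.public.users
```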
Example consumer output showing a captured CDC event:
```text
----- got message -----
key:[eyJpZC0=]
content:{"before":null,"after":{"id":6,"hash_firstname":"initial-rs"},
"source":{"connector":"postgresql","name":"dbserver1"},
"op":"c","ts_ms":1776669145647}
```
The `op` field identifies the operation type:
- `"op":"c"` for inserts
- `"op":"u"` for updates
- `"op":"d"` for deletes
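A consumer can map the `op` codes to operation names when inspecting events. This is a minimal Python sketch using only the standard library. The function name `describe_event` is hypothetical, and the sample payload is the event shown in the consumer output, joined onto one line.

```python
import json

# Operation codes as listed in this PR description.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete"}


def describe_event(payload: str) -> tuple:
    """Return (operation name, row data) for one CDC event payload."""
    event = json.loads(payload)
    op = OP_NAMES.get(event["op"], "unknown")
    # Use the "after" image when present, otherwise fall back to "before".
    row = event["after"] if event.get("after") is not None else event.get("before")
    return op, row


sample = (
    '{"before":null,"after":{"id":6,"hash_firstname":"initial-rs"},'
    '"source":{"connector":"postgresql","name":"dbserver1"},'
    '"op":"c","ts_ms":1776669145647}'
)
print(describe_event(sample))
```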
### Updated JSON configuration
For reference, the JSON block in the docs file (`docs/io-debezium-source.md`
or similar) should now look like this:
```json
{
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "changeme",
    "database.dbname": "postgres",
    "topic.prefix": "dbserver1",
    "plugin.name": "pgoutput",
    "schema.include.list": "public",
    "table.include.list": "public.users",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "database.history.pulsar.service.url": "pulsar://127.0.0.1:6650"
}
```
### Does this pull request potentially affect one of the following parts?
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment
- [x] Documentation