JNSimba opened a new pull request, #64938:
URL: https://github.com/apache/doris/pull/64938
## Proposed changes
### Problem
For the PostgreSQL streaming source, the snapshot split read used the Debezium connector config's snapshot fetch size (default `10240`) instead of the `scan.snapshot.fetch.size` option (default `1024`). As a result:
1. `scan.snapshot.fetch.size` had no effect for the PostgreSQL source.
2. When a snapshot chunk's row count was less than or equal to the fetch size, the JDBC server-side cursor returned the whole chunk in a single batch, loading every row of the chunk into memory at once. On wide tables (many columns or large rows), this could exhaust the CDC client's heap and trigger an OutOfMemoryError (OOM), leaving the snapshot stuck in an OOM-restart loop.
### Fix
Thread `PostgresSourceConfig` into the snapshot read task and use `sourceConfig.getFetchSize()` as the fetch size of the snapshot select statement, so that `scan.snapshot.fetch.size` is honored. This matches how the MySQL source already wires the snapshot fetch size.
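The wiring described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in this PR: `SourceConfigSketch` stands in for `PostgresSourceConfig`, and `SnapshotReadTaskSketch` stands in for the snapshot read task. Only the JDBC calls (`prepareStatement`, `setFetchSize`, `setAutoCommit`) are standard `java.sql` APIs. It also assumes the pgJDBC behavior that a server-side cursor is used only when autocommit is off and the result set is forward-only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

/** Hypothetical stand-in for a source config carrying scan.snapshot.fetch.size. */
final class SourceConfigSketch {
    static final String FETCH_SIZE_KEY = "scan.snapshot.fetch.size";
    static final int DEFAULT_FETCH_SIZE = 1024;

    private final int fetchSize;

    SourceConfigSketch(int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException(FETCH_SIZE_KEY + " must be > 0");
        }
        this.fetchSize = fetchSize;
    }

    /** Reads the option from user-supplied options, falling back to the default. */
    static SourceConfigSketch fromOptions(Map<String, String> options) {
        String raw = options.get(FETCH_SIZE_KEY);
        int size = (raw == null) ? DEFAULT_FETCH_SIZE : Integer.parseInt(raw.trim());
        return new SourceConfigSketch(size);
    }

    int getFetchSize() {
        return fetchSize;
    }
}

/** Hypothetical snapshot read task that receives the source config directly. */
final class SnapshotReadTaskSketch {
    private final SourceConfigSketch sourceConfig;

    SnapshotReadTaskSketch(SourceConfigSketch sourceConfig) {
        this.sourceConfig = sourceConfig;
    }

    /** Prepares the chunk SELECT using the configured snapshot fetch size. */
    PreparedStatement prepareSnapshotSelect(Connection conn, String selectSql)
            throws SQLException {
        // pgJDBC streams through a server-side cursor only when autocommit is off
        // and the result set is forward-only; otherwise it reads all rows at once.
        conn.setAutoCommit(false);
        PreparedStatement stmt = conn.prepareStatement(
                selectSql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Use the user-facing option rather than the Debezium connector's value.
        stmt.setFetchSize(sourceConfig.getFetchSize());
        return stmt;
    }
}
```

Passing the config object into the task, instead of reading the fetch size from the Debezium connector config, keeps a single source of truth for the user-facing option.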