[PR] [fix](streaming-job) cdc client PostgreSQL snapshot honors scan.snapshot.fetch.size to avoid wide-table OOM [doris]

via GitHub Mon, 29 Jun 2026 00:27:49 -0700


JNSimba opened a new pull request, #64938:
URL: https://github.com/apache/doris/pull/64938


   ## Proposed changes
   
   ### Problem
   
   For the PostgreSQL streaming source, the snapshot split read used the
   Debezium connector config's snapshot fetch size (default `10240`) instead
   of the `scan.snapshot.fetch.size` option (default `1024`). As a result:
   
   1. `scan.snapshot.fetch.size` had no effect for the PostgreSQL source.
   2. When a snapshot chunk's row count was `<=` the fetch size, the JDBC
      server-side cursor returned the whole chunk in a single batch, loading
      every row of the chunk into memory at once. On wide tables (many columns
      / large rows) this could exhaust the cdc client heap and OOM, leaving
      the snapshot stuck in an OOM-restart loop.
   
   ### Fix
   
   Thread `PostgresSourceConfig` into the snapshot read task and read
   `sourceConfig.getFetchSize()` for the snapshot select statement, so that
   `scan.snapshot.fetch.size` is honored. This matches how the MySQL source
   already wires the snapshot fetch size.
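
The shape of the change can be sketched roughly as follows. This is not the code in the PR: `SourceConfigSketch` and `SnapshotReadTaskSketch` are hypothetical stand-ins for `PostgresSourceConfig` and the snapshot read task, and only the `getFetchSize()` accessor name is taken from the description above.

```java
// Hypothetical stand-ins; the real classes carry many more fields and methods.
interface SourceConfigSketch {
    /** Assumed to be backed by the scan.snapshot.fetch.size option. */
    int getFetchSize();
}

final class SnapshotReadTaskSketch {
    private final SourceConfigSketch sourceConfig;

    // The source config is passed in, so the task no longer needs to fall
    // back to the connector config's own snapshot fetch size.
    SnapshotReadTaskSketch(SourceConfigSketch sourceConfig) {
        this.sourceConfig = sourceConfig;
    }

    /** Fetch size to apply to the snapshot select statement. */
    int snapshotFetchSize() {
        return sourceConfig.getFetchSize();
    }
}

final class ThreadConfigDemo {
    public static void main(String[] args) {
        SourceConfigSketch config = () -> 1024; // user-configured value
        SnapshotReadTaskSketch task = new SnapshotReadTaskSketch(config);
        System.out.println("snapshot fetch size = " + task.snapshotFetchSize());
    }
}
```

In a real task the returned value would typically be handed to the JDBC statement via `Statement.setFetchSize(int)` before the snapshot query runs.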
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

