github-actions[bot] commented on code in PR #64511:
URL: https://github.com/apache/doris/pull/64511#discussion_r3418793146
##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/ConfigUtil.java:
##########
@@ -123,9 +124,35 @@ public static ZoneId getPostgresServerTimeZoneFromProps(java.util.Properties pro
return ZoneId.systemDefault();
}
+ public static final String MAX_QUEUE_BYTES_SYS_PROP = "cdc.max.queue.size.in.bytes";
+
+ // Heap-adaptive byte cap for the debezium ChangeEventQueue buffer.
+ // heap 1G->64MB, 2G->128MB, >=4G->256MB. -D<MAX_QUEUE_BYTES_SYS_PROP> overrides
+ // (<=0 disables); a malformed override is logged and ignored, falling back to the cap.
+ private static long resolveMaxQueueSizeInBytes() {
+ String override = System.getProperty(MAX_QUEUE_BYTES_SYS_PROP);
+ if (override != null) {
+ try {
+ long bytes = Long.parseLong(override.trim());
+ return bytes <= 0 ? 0 : bytes;
+ } catch (NumberFormatException e) {
+ LOG.warn(
+ "Ignoring invalid -D{}={}, expected an integer byte count; "
+ + "falling back to the adaptive cap",
+ MAX_QUEUE_BYTES_SYS_PROP,
+ override);
+ }
+ }
+ long target = Runtime.getRuntime().maxMemory() / 16;
+ return Math.max(64L * 1024 * 1024, Math.min(target, 256L * 1024 * 1024));
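For reference, the sizing rule in this hunk can be exercised in isolation. The class below is a hypothetical standalone restatement of `resolveMaxQueueSizeInBytes()`: the name `AdaptiveQueueCapSketch` and the `resolve(maxHeapBytes, override)` signature are illustrative and not part of the PR. It takes the heap size and the raw `-Dcdc.max.queue.size.in.bytes` value as arguments (and drops the warning log) so the heap-to-cap mapping in the code comment can be checked directly. One caveat: `Runtime.maxMemory()` can report somewhat less than the configured `-Xmx` depending on the garbage collector, so real caps may land slightly under the round 128MB/256MB figures; a 1G heap still clamps to the 64MB floor.

```java
/**
 * Hypothetical standalone restatement of the resolveMaxQueueSizeInBytes() logic
 * from the diff, with the max heap and the -D override passed in as parameters.
 */
final class AdaptiveQueueCapSketch {
    static final long MIB = 1024L * 1024;
    static final long FLOOR = 64 * MIB;    // lower clamp from the diff
    static final long CEILING = 256 * MIB; // upper clamp from the diff

    /** Returns the queue byte cap; 0 means the cap is disabled (override <= 0). */
    static long resolve(long maxHeapBytes, String override) {
        if (override != null) {
            try {
                long bytes = Long.parseLong(override.trim());
                return bytes <= 0 ? 0 : bytes;
            } catch (NumberFormatException e) {
                // Malformed override: ignore it and fall through to the adaptive cap.
            }
        }
        long target = maxHeapBytes / 16; // 1/16 of the max heap
        return Math.max(FLOOR, Math.min(target, CEILING));
    }

    public static void main(String[] args) {
        long gib = 1024 * MIB;
        for (long heapGib : new long[] {1, 2, 4, 8}) {
            System.out.printf("heap %dG -> %d MiB%n", heapGib, resolve(heapGib * gib, null) / MIB);
        }
        System.out.println("override \"0\" -> " + resolve(8 * gib, "0"));
        System.out.println("override \"abc\" -> " + resolve(8 * gib, "abc") / MIB + " MiB");
    }
}
```

Because the result is clamped to [64MB, 256MB], heaps below 1G still get the 64MB floor and heaps above 4G never exceed 256MB, which is the range the review comment compares against the buffered snapshot size.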
Review Comment:
This only caps Debezium's `ChangeEventQueue`; the exact snapshot-backfill
path still drains that queue into
`IncrementalSourceScanFetcher.pollWithBuffer()`'s `outputBuffer` until the
split reaches its high-watermark/end-watermark. For the TVF/default snapshot
path (`skip_snapshot_backfill` absent, so false), a split with 8192 rows at
~2MB each is polled in several queue chunks of at most 64-256MB, but the whole
~16GB (8192 × ~2MB) can still accumulate in the `HashMap` before any records
are returned, and `snapshot_parallelism` can multiply that across concurrent
splits. That leaves the wide-row snapshot OOM scenario described by the PR
unresolved. Please either enforce a byte bound in the snapshot output
buffer/split sizing, or scope this cap to paths where records are streamed out
instead of fully buffered.
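The numbers in this comment can be checked with a little arithmetic, and the "byte bound in split sizing" option amounts to deriving the rows per split from a byte budget instead of a fixed row count. The helper below is a hypothetical sketch under those assumptions: `SnapshotBufferMath`, its method names, the 256MB budget and the parallelism of 4 are illustrative choices, not Doris or Flink CDC APIs or settings. It also assumes an `avgRowBytes` estimate is available (for example from sampled rows or table statistics), which the current code may not provide.

```java
/**
 * Hypothetical back-of-the-envelope helper for the snapshot-backfill memory
 * concern. Inputs are illustrative, not measurements.
 */
final class SnapshotBufferMath {
    static final long MIB = 1024L * 1024;
    static final long GIB = 1024 * MIB;

    /** Bytes held if every concurrent split fully buffers its rows before emitting. */
    static long worstCaseBufferedBytes(long rowsPerSplit, long avgRowBytes, int parallelism) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(rowsPerSplit, avgRowBytes), (long) parallelism);
    }

    /** Rows per split that keep one fully buffered split under budgetBytes (at least 1 row). */
    static long rowsPerSplitForBudget(long budgetBytes, long avgRowBytes) {
        if (avgRowBytes <= 0) {
            throw new IllegalArgumentException("avgRowBytes must be > 0");
        }
        return Math.max(1, budgetBytes / avgRowBytes);
    }

    public static void main(String[] args) {
        long rowBytes = 2 * MIB; // the ~2MB wide-row case from the review comment
        System.out.println(worstCaseBufferedBytes(8192, rowBytes, 1) / GIB + " GiB per split");
        System.out.println(worstCaseBufferedBytes(8192, rowBytes, 4) / GIB + " GiB with parallelism 4");
        System.out.println(rowsPerSplitForBudget(256 * MIB, rowBytes) + " rows fit a 256 MiB budget");
    }
}
```

Sizing splits by bytes keeps each fully buffered split bounded without changing how `outputBuffer` holds a split's records until the watermark is reached. The trade-off is that it relies on a reasonable row-size estimate, and a single row wider than the budget still yields a 1-row split that exceeds it.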
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]