abhijeetk88 commented on code in PR #15625:
URL: https://github.com/apache/kafka/pull/15625#discussion_r1613033249
##########
storage/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogManagerConfig.java:
##########
@@ -143,6 +143,38 @@ public final class RemoteLogManagerConfig {
"less than or equal to `log.retention.bytes` value.";
public static final Long DEFAULT_LOG_LOCAL_RETENTION_BYTES = -2L;
+ public static final String
REMOTE_LOG_MANAGER_COPY_MAX_BYTES_PER_SECOND_PROP =
"remote.log.manager.copy.max.bytes.per.second";
+ public static final String
REMOTE_LOG_MANAGER_COPY_MAX_BYTES_PER_SECOND_DOC = "The maximum number of bytes
that can be copied from local storage to remote storage per second. " +
+ "This is a global limit for all the partitions that are being
copied from remote storage to local storage. " +
+ "The default value is Long.MAX_VALUE, which means there is no
limit on the number of bytes that can be copied per second.";
+ public static final Long
DEFAULT_REMOTE_LOG_MANAGER_COPY_MAX_BYTES_PER_SECOND = Long.MAX_VALUE;
+
+ public static final String REMOTE_LOG_MANAGER_COPY_QUOTA_WINDOW_NUM_PROP =
"remote.log.manager.copy.quota.window.num";
+ public static final String REMOTE_LOG_MANAGER_COPY_QUOTA_WINDOW_NUM_DOC =
"The number of samples to retain in memory for remote copy quota management. " +
+ "The default value is 61, which means there are 60 whole windows +
1 current window.";
+ public static final int DEFAULT_REMOTE_LOG_MANAGER_COPY_QUOTA_WINDOW_NUM =
61;
+
+ public static final String
REMOTE_LOG_MANAGER_COPY_QUOTA_WINDOW_SIZE_SECONDS_PROP =
"remote.log.manager.copy.quota.window.size.seconds";
+ public static final String
REMOTE_LOG_MANAGER_COPY_QUOTA_WINDOW_SIZE_SECONDS_DOC = "The time span of each
sample for remote copy quota management. " +
+ "The default value is 1 second.";
+ public static final int
DEFAULT_REMOTE_LOG_MANAGER_COPY_QUOTA_WINDOW_SIZE_SECONDS = 1;
+
+ public static final String
REMOTE_LOG_MANAGER_FETCH_MAX_BYTES_PER_SECOND_PROP =
"remote.log.manager.fetch.max.bytes.per.second";
+ public static final String
REMOTE_LOG_MANAGER_FETCH_MAX_BYTES_PER_SECOND_DOC = "The maximum number of
bytes that can be fetched from remote storage to local storage per second. " +
+ "This is a global limit for all the partitions that are being
fetched from remote storage to local storage. " +
+ "The default value is Long.MAX_VALUE, which means there is no
limit on the number of bytes that can be fetched per second.";
+ public static final Long
DEFAULT_REMOTE_LOG_MANAGER_FETCH_MAX_BYTES_PER_SECOND = Long.MAX_VALUE;
+
+ public static final String REMOTE_LOG_MANAGER_FETCH_QUOTA_WINDOW_NUM_PROP
= "remote.log.manager.fetch.quota.window.num";
+ public static final String REMOTE_LOG_MANAGER_FETCH_QUOTA_WINDOW_NUM_DOC =
"The number of samples to retain in memory for remote fetch quota management. "
+
+ "The default value is 11, which means there are 10 whole windows +
1 current window.";
+ public static final int DEFAULT_REMOTE_LOG_MANAGER_FETCH_QUOTA_WINDOW_NUM
= 11;
Review Comment:
For fetches, the default number of windows was chosen to match the default
used for other quotas, such as ClientQuota and ReplicationQuota.
Using the same 11 one-second samples (10 whole + 1 current) for copies, as
for the other quotas, does seem to be the better option. Consider this:
Suppose the broker-level copy quota is set to 250 MBps. The RLM task
records the log segment size with the quota manager when it uploads a log
segment. With a typical log segment size of 500 MB, only one log segment can
be uploaded every 2 seconds without breaching the quota; if uploads occur
faster, the quota will be exceeded. Therefore, as long as the window is
longer than 2 seconds, either a 10-second or a 60-second (whole) window
should work.
However, a shorter window (10 seconds) has advantages: it tracks uploads
more precisely and limits large upload spikes more effectively. For example:

With a 10-second window:
Buckets: b1, b2, ..., b10
If nothing was uploaded in the first 9 seconds, 5 segments can be uploaded in
the 10th second without breaching the average quota
(5 * 500 MB / 10 seconds = 250 MBps), though the spike will be 2.5 GB in that
second.

With a 60-second window:
Buckets: b1, b2, ..., b60
If nothing was uploaded in the first 59 seconds, 30 segments can be uploaded
in the 60th second without breaching the average quota
(30 * 500 MB / 60 seconds = 250 MBps), but the spike will be 15 GB in that
second.

Since the goal is to keep such spikes small, a 10-second window is preferable
to a 60-second window.
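
To make the arithmetic above easy to check, here is a minimal Java sketch of
the averaging argument. It is not the quota manager from this PR: the class
and method names are hypothetical, it assumes decimal megabytes
(1 MB = 1,000,000 bytes), and it assumes the rest of the window was idle
before the burst.

```java
/**
 * Illustrative sketch only: models the averaging argument in this comment,
 * not Kafka's actual quota manager. All names here are hypothetical.
 */
final class CopyQuotaWindowSketch {

    /**
     * Largest number of bytes that can be uploaded in a single second without
     * pushing the window average above the quota, assuming every other bucket
     * in the window is empty.
     */
    static long maxBurstBytes(long quotaBytesPerSecond, int wholeWindows, int windowSizeSeconds) {
        long windowSeconds = (long) wholeWindows * windowSizeSeconds;
        return Math.multiplyExact(quotaBytesPerSecond, windowSeconds);
    }

    /** Number of whole log segments that fit into that burst. */
    static long maxBurstSegments(long quotaBytesPerSecond, int wholeWindows,
                                 int windowSizeSeconds, long segmentBytes) {
        return maxBurstBytes(quotaBytesPerSecond, wholeWindows, windowSizeSeconds) / segmentBytes;
    }

    public static void main(String[] args) {
        long quota = 250_000_000L;   // 250 MBps copy quota (decimal MB, an assumption)
        long segment = 500_000_000L; // 500 MB log segment

        // Compare the two candidate defaults: 10 and 60 whole one-second windows.
        for (int wholeWindows : new int[] {10, 60}) {
            System.out.printf("%d-second window: up to %d segments (%d bytes) in one second%n",
                    wholeWindows,
                    maxBurstSegments(quota, wholeWindows, 1, segment),
                    maxBurstBytes(quota, wholeWindows, 1));
        }
    }
}
```

Under these assumptions the burst grows linearly with the window length: 5
segments (2.5 GB) for 10 whole windows versus 30 segments (15 GB) for 60.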
Let me know if it makes sense. I can change the default copy window to be
the same as the default fetch window.