[
https://issues.apache.org/jira/browse/HADOOP-11354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HADOOP-11354:
-------------------------------
Resolution: Fixed
Fix Version/s: 2.7.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
I've committed this to trunk and branch-2. Thanks for the fix [~tedyu]!
> ThrottledInputStream doesn't perform effective throttling
> ---------------------------------------------------------
>
> Key: HADOOP-11354
> URL: https://issues.apache.org/jira/browse/HADOOP-11354
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.7.0
>
> Attachments: mapreduce-6180-001.patch
>
>
> This was first reported in HBASE-12632 by [~Tobi]:
> I just transferred a ton of data using ExportSnapshot with bandwidth
> throttling from one Hadoop cluster to another Hadoop cluster, and discovered
> that ThrottledInputStream does not limit bandwidth.
> The problem is that ThrottledInputStream sleeps once, for a fixed time (50
> ms), at the start of each read call, disregarding the actual amount of data
> read.
> ExportSnapshot defaults to a buffer size as big as the block size of the
> outputFs:
> {code:java}
> // Use the default block size of the outputFs if bigger
> int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(),
>     BUFFER_SIZE);
> bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
> LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
> {code}
> In my case, this was 256MB.
> Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a time,
> each time sleeping only 50 ms. Thus, in the worst case where each call to read
> fills the 256 MB buffer in negligible time, the ThrottledInputStream cannot
> reduce the bandwidth to under (256 MB) / (50 ms) ≈ 5 GB/s.
> Even in a more realistic case where read returns about 1 MB per call, it
> still cannot throttle the bandwidth to under (1 MB) / (50 ms) = 20 MB/s.
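To make the arithmetic above concrete, here is a small standalone sketch (not Hadoop code; the class and method names are made up for illustration) of the lower bound on the rate that a single fixed 50 ms sleep per read() can enforce:

```java
// Sketch: the minimum bandwidth enforceable by one fixed 50 ms sleep per
// read() call, assuming each read fills its buffer in negligible time.
public class ThrottleFloor {
    static final long SLEEP_MS = 50; // the fixed per-read sleep

    // Best case for the throttler: one full buffer per sleep interval.
    static long floorBytesPerSec(long bufferBytes) {
        return bufferBytes * 1000 / SLEEP_MS;
    }

    public static void main(String[] args) {
        long buf = 256L * 1024 * 1024; // the 256 MB ExportSnapshot buffer
        System.out.println(floorBytesPerSec(buf)); // 5368709120 bytes/s ≈ 5 GB/s
        System.out.println(floorBytesPerSec(1024L * 1024)); // 1 MB reads → 20 MB/s
    }
}
```

So even a cap of, say, 1 MB/s is silently ignored whenever each read returns more than 50 KB.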
> The issue is exacerbated by the fact that you need to set a low limit because
> the total bandwidth per host depends on the number of mapper slots as well.
> A simple solution would change the if in throttle to a while, so that it
> keeps sleeping for 50 ms until the rate is finally low enough:
> {code:java}
> private void throttle() throws IOException {
>   while (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}
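A minimal simulation (names are illustrative, not the Hadoop class; a virtual clock stands in for Thread.sleep so it runs instantly) shows why the while loop works: sleeping repeatedly until getBytesPerSec() drops below the cap bounds the long-run rate no matter how large each read is.

```java
// Illustrative simulation of the proposed while-loop throttle.
public class ThrottleSim {
    static final long SLEEP_DURATION_MS = 50;
    long virtualTimeMs = 1; // virtual clock; starts at 1 to avoid div by zero
    long totalBytes = 0;
    final long maxBytesPerSec;

    ThrottleSim(long maxBytesPerSec) { this.maxBytesPerSec = maxBytesPerSec; }

    long getBytesPerSec() { return totalBytes * 1000 / virtualTimeMs; }

    // The proposed fix: keep "sleeping" until the observed rate is low enough.
    void throttle() {
        while (getBytesPerSec() > maxBytesPerSec) {
            virtualTimeMs += SLEEP_DURATION_MS; // stands in for Thread.sleep
        }
    }

    void read(long bytes) {
        throttle();
        totalBytes += bytes; // the read itself completes "instantly"
    }

    public static void main(String[] args) {
        ThrottleSim s = new ThrottleSim(10L * 1024 * 1024); // 10 MB/s cap
        for (int i = 0; i < 10; i++) {
            s.read(256L * 1024 * 1024); // worst case: 256 MB per call
        }
        s.throttle(); // settle after the final read
        System.out.println(s.getBytesPerSec() <= 10L * 1024 * 1024); // true
    }
}
```

With the original single-sleep if, the same workload would report a rate of gigabytes per second; the loop instead accumulates enough sleep time to honor the cap, at the cost of 50 ms granularity.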
> This issue affects the ThrottledInputStream in hadoop as well.
> Another way to see this is that, for big enough buffer sizes,
> ThrottledInputStream throttles only the number of read calls, to at most
> 20 per second (1000 ms / 50 ms), disregarding the number of bytes read.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)