[
https://issues.apache.org/jira/browse/HADOOP-11354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HADOOP-11354:
-------------------------------
Resolution: Fixed
Fix Version/s: 2.7.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
I've committed this to trunk and branch-2. Thanks for the fix [~tedyu]!
> ThrottledInputStream doesn't perform effective throttling
> ---------------------------------------------------------
>
> Key: HADOOP-11354
> URL: https://issues.apache.org/jira/browse/HADOOP-11354
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 2.7.0
>
> Attachments: mapreduce-6180-001.patch
>
>
> This was first reported in HBASE-12632 by [~Tobi]:
> I just transferred a ton of data using ExportSnapshot with bandwidth
> throttling from one Hadoop cluster to another Hadoop cluster, and discovered
> that ThrottledInputStream does not limit bandwidth.
> The problem is that ThrottledInputStream sleeps once, for a fixed time (50
> ms), at the start of each read call, disregarding the actual amount of data
> read.
> ExportSnapshot defaults to a buffer size as big as the block size of the
> outputFs:
> {code:java}
> // Use the default block size of the outputFs if bigger
> int defaultBlockSize = Math.max((int) outputFs.getDefaultBlockSize(),
>     BUFFER_SIZE);
> bufferSize = conf.getInt(CONF_BUFFER_SIZE, defaultBlockSize);
> LOG.info("Using bufferSize=" + StringUtils.humanReadableInt(bufferSize));
> {code}
> In my case, this was 256MB.
> Hence, the ExportSnapshot mapper will attempt to read up to 256 MB at a time,
> each time sleeping only 50 ms. Thus, in the worst case where each call to read
> fills the 256 MB buffer in negligible time, the ThrottledInputStream cannot
> reduce the bandwidth to under (256 MB) / (50 ms) ≈ 5 GB/s.
> Even in a more realistic case where read returns about 1 MB per call, it
> still cannot throttle the bandwidth to under (1 MB) / (50 ms) = 20 MB/s.
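To make the arithmetic above concrete, here is a small standalone sketch (not Hadoop code; the class and method names are made up for illustration) of the lower bound on the rate that a single fixed 50 ms sleep per read() can enforce:

```java
// Sketch: the minimum bandwidth enforceable by one fixed 50 ms sleep per
// read() call, assuming each read fills its buffer in negligible time.
public class ThrottleFloor {
    static final long SLEEP_MS = 50; // the fixed per-read sleep

    // Best case for the throttler: one full buffer per sleep interval.
    static long floorBytesPerSec(long bufferBytes) {
        return bufferBytes * 1000 / SLEEP_MS;
    }

    public static void main(String[] args) {
        long buf = 256L * 1024 * 1024; // the 256 MB ExportSnapshot buffer
        System.out.println(floorBytesPerSec(buf)); // 5368709120 bytes/s ≈ 5 GB/s
        System.out.println(floorBytesPerSec(1024L * 1024)); // 1 MB reads → 20 MB/s
    }
}
```

So even a cap of, say, 1 MB/s is silently ignored whenever each read returns more than 50 KB.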
> The issue is exacerbated by the fact that you need to set a low limit because
> the total bandwidth per host depends on the number of mapper slots as well.
> A simple solution would change the if in throttle to a while, so that it
> keeps sleeping for 50 ms until the rate is finally low enough:
> {code:java}
> private void throttle() throws IOException {
>   while (getBytesPerSec() > maxBytesPerSec) {
>     try {
>       Thread.sleep(SLEEP_DURATION_MS);
>       totalSleepTime += SLEEP_DURATION_MS;
>     } catch (InterruptedException e) {
>       throw new IOException("Thread aborted", e);
>     }
>   }
> }
> {code}
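A minimal simulation (names are illustrative, not the Hadoop class; a virtual clock stands in for Thread.sleep so it runs instantly) shows why the while loop works: sleeping repeatedly until getBytesPerSec() drops below the cap bounds the long-run rate no matter how large each read is.

```java
// Illustrative simulation of the proposed while-loop throttle.
public class ThrottleSim {
    static final long SLEEP_DURATION_MS = 50;
    long virtualTimeMs = 1; // virtual clock; starts at 1 to avoid div by zero
    long totalBytes = 0;
    final long maxBytesPerSec;

    ThrottleSim(long maxBytesPerSec) { this.maxBytesPerSec = maxBytesPerSec; }

    long getBytesPerSec() { return totalBytes * 1000 / virtualTimeMs; }

    // The proposed fix: keep "sleeping" until the observed rate is low enough.
    void throttle() {
        while (getBytesPerSec() > maxBytesPerSec) {
            virtualTimeMs += SLEEP_DURATION_MS; // stands in for Thread.sleep
        }
    }

    void read(long bytes) {
        throttle();
        totalBytes += bytes; // the read itself completes "instantly"
    }

    public static void main(String[] args) {
        ThrottleSim s = new ThrottleSim(10L * 1024 * 1024); // 10 MB/s cap
        for (int i = 0; i < 10; i++) {
            s.read(256L * 1024 * 1024); // worst case: 256 MB per call
        }
        s.throttle(); // settle after the final read
        System.out.println(s.getBytesPerSec() <= 10L * 1024 * 1024); // true
    }
}
```

With the original single-sleep if, the same workload would report a rate of gigabytes per second; the loop instead accumulates enough sleep time to honor the cap, at the cost of 50 ms granularity.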
> This issue affects the ThrottledInputStream in hadoop as well.
> Another way to see this is that, for big enough buffer sizes,
> ThrottledInputStream throttles only the number of read calls, to at most
> 20 per second (1000 ms / 50 ms), disregarding the number of bytes read.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)