[ 
https://issues.apache.org/jira/browse/HADOOP-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859935#action_12859935
 ] 

Dmytro Molkov commented on HADOOP-6713:
---------------------------------------

We were running a performance test on our test cluster.
The test itself creates a tree of directories, with files on the leaves, in a
depth-first fashion: under a root directory we create N directories for the
test, and each mapper then starts in one of those directories and creates its
own subtree with files on the leaves.
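A minimal sketch of the write side, using the local filesystem as a stand-in
for HDFS. The names build_subtree and setup_write_test and the depth, fanout
and files_per_leaf parameters are hypothetical; only the shape (one top-level
directory per mapper, depth-first creation, 4K files on the leaves) comes from
the description above. The "mappers" run one after another here rather than in
parallel.

```python
import os
import tempfile

FILE_SIZE = 4 * 1024  # the test used 4K files


def build_subtree(path, depth, fanout, files_per_leaf):
    """Create a subtree depth-first: finish each child's whole subtree
    before starting its next sibling; files go only in leaf directories."""
    os.makedirs(path, exist_ok=True)
    if depth == 0:
        for i in range(files_per_leaf):
            with open(os.path.join(path, f"file{i}"), "wb") as f:
                f.write(os.urandom(FILE_SIZE))
        return
    for i in range(fanout):
        build_subtree(os.path.join(path, f"dir{i}"), depth - 1, fanout, files_per_leaf)


def setup_write_test(num_mappers, depth, fanout, files_per_leaf, root=None):
    """Create one top-level directory per mapper under root, then let each
    hypothetical 'mapper' build its own subtree (sequentially in this sketch)."""
    root = root or tempfile.mkdtemp()
    for m in range(num_mappers):
        build_subtree(os.path.join(root, f"mapper{m}"), depth, fanout, files_per_leaf)
    return root
```

Every directory and file creation here is a metadata operation, which on HDFS
means a namenode RPC; that is why a job like this stresses the namenode rather
than the datanodes.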

Then there is a read job in which each mapper does an ls on a directory and
chooses a random element of the listing: if it is a directory, the mapper
repeats the step there; if it is a file, it reads the file. The files are 4K
in size, so the read time is small and this job mostly hits the namenode.
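One iteration of that read job can be sketched as a random walk, again with the
local filesystem standing in for HDFS. The function name random_walk_read and
the handling of an empty directory (stop and return no data) are assumptions
for illustration, not details from the test.

```python
import os
import random
import tempfile


def random_walk_read(start, rng=random):
    """List a directory, pick a random entry, descend if it is a directory,
    read it if it is a file. Returns (path, data)."""
    path = start
    while True:
        entries = os.listdir(path)  # the 'ls' step; a namenode RPC on HDFS
        if not entries:
            return path, b""  # empty directory: nothing to read (assumed)
        path = os.path.join(path, rng.choice(entries))
        if not os.path.isdir(path):
            with open(path, "rb") as f:
                return path, f.read()  # small 4K read, cheap next to the metadata calls


if __name__ == "__main__":
    # Tiny hypothetical tree: root/a/b/file0 and root/file1.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in (os.path.join("a", "b", "file0"), "file1"):
        with open(os.path.join(root, rel), "wb") as f:
            f.write(b"x" * 4096)
    path, data = random_walk_read(root, random.Random(0))
    print(path, len(data))
```

Each walk issues one listing per level plus a single small read, so the ratio
of metadata operations to bytes read is high by design.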

We were running a branch that had this fix and that also used read-write locks
in the namenode instead of synchronized sections.
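The difference between the two locking styles, sketched in Python (which has no
read-write lock in its standard library). This assumes the usual read-write
lock semantics: any number of readers may hold the lock together, while a
writer holds it alone. The ReadWriteLock class is hypothetical and is not the
namenode code; it lets waiting writers block newly arriving readers so that a
read-heavy load cannot starve writes, and it is not reentrant.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers or one exclusive writer (non-reentrant)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0          # readers currently holding the lock
        self._writer = False       # whether a writer currently holds the lock
        self._waiting_writers = 0  # writers queued; they block new readers

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # wake a waiting writer

    @contextmanager
    def write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()  # wake readers and writers
```

Read-only namespace operations such as listing a directory would take read(),
while mutations such as creating a file would take write(). With a plain
synchronized section, every operation behaves like a writer, so listings
cannot overlap even when nothing is being modified.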

The version without the fixes could only get the namenode to use 175% CPU. With
the fixes in place we were using 750% CPU for a read-only load (the second job
running on its own) and 550% for a read-write load (both jobs running in
parallel).

In the read-write mode the ratio of reads to writes was 8:1 (800 read clients
vs. 100 write clients).
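For scale, assuming the usual top-style reporting where 100% equals one fully
busy core, the figures above correspond to about 1.75 busy cores before the
fixes versus 7.5 (read-only) and 5.5 (read-write) after. The comment does not
say which load the 175% baseline was measured under, so these ratios are only
indicative:

```latex
\frac{750\%}{175\%} \approx 4.3, \qquad
\frac{550\%}{175\%} \approx 3.1, \qquad
\frac{800 \text{ read clients}}{100 \text{ write clients}} = 8
```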

We are not putting the read-write locks into production in this iteration; we
feel they need more testing. As soon as I have results for a branch with only
this fix, I will post my findings here.

> The RPC server Listener thread is a scalability bottleneck
> ----------------------------------------------------------
>
>                 Key: HADOOP-6713
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6713
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>    Affects Versions: 0.21.0
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HADOOP-6713.patch
>
>
> The Hadoop RPC Server implementation has a single Listener thread that reads 
> data from the socket and puts them into a call queue. This means that this 
> single thread can pull RPC requests off the network only as fast as a single 
> CPU can execute. This is a scalability bottleneck in our cluster.
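One way to remove that bottleneck is to split the Listener's work: the Listener
only accepts connections and hands each one round-robin to a small pool of
Reader threads, each with its own selector, which parse requests off their
connections and push them onto the shared call queue. This is a sketch under
assumptions, not the contents of HADOOP-6713.patch: the class names, the
length-prefixed framing (encode_request) and the round-robin policy are
hypothetical. In CPython the GIL means it shows the structure rather than the
multi-core speedup.

```python
import queue
import selectors
import socket
import struct
import threading


def encode_request(payload):
    """Hypothetical framing: 4-byte big-endian length, then the payload."""
    return struct.pack("!I", len(payload)) + payload


class Reader(threading.Thread):
    """Reads framed requests from its assigned connections into the call queue."""

    def __init__(self, call_queue):
        super().__init__(daemon=True)
        self.call_queue = call_queue
        self.selector = selectors.DefaultSelector()  # one selector per reader
        self.new_conns = queue.Queue()  # hand-off point from the listener
        self.buffers = {}               # partial frames, per connection
        self.assigned = 0               # connections handed to this reader

    def add_connection(self, conn):
        # Runs on the listener thread; registration happens on the reader thread.
        self.assigned += 1
        self.new_conns.put(conn)

    def run(self):
        while True:
            while not self.new_conns.empty():
                conn = self.new_conns.get()
                conn.setblocking(False)
                self.selector.register(conn, selectors.EVENT_READ)
                self.buffers[conn] = b""
            for key, _ in self.selector.select(timeout=0.05):
                self._read_from(key.fileobj)

    def _read_from(self, conn):
        try:
            data = conn.recv(4096)
        except BlockingIOError:
            return  # spurious wakeup: nothing to read yet
        except OSError:
            data = b""  # treat a reset like a close
        if not data:
            self.selector.unregister(conn)
            del self.buffers[conn]
            conn.close()
            return
        buf = self.buffers[conn] + data
        while len(buf) >= 4:  # extract every complete frame
            (length,) = struct.unpack("!I", buf[:4])
            if len(buf) < 4 + length:
                break
            self.call_queue.put(buf[4:4 + length])
            buf = buf[4 + length:]
        self.buffers[conn] = buf


class Listener(threading.Thread):
    """Only accepts connections; hands each to the next reader round-robin."""

    def __init__(self, server_sock, readers):
        super().__init__(daemon=True)
        self.server_sock = server_sock
        self.readers = readers

    def run(self):
        i = 0
        while True:
            try:
                conn, _ = self.server_sock.accept()
            except OSError:
                return  # server socket was closed
            self.readers[i].add_connection(conn)
            i = (i + 1) % len(self.readers)


def start_server(num_readers):
    """Start one listener and num_readers readers on an ephemeral local port."""
    call_queue = queue.Queue()
    readers = [Reader(call_queue) for _ in range(num_readers)]
    for reader in readers:
        reader.start()
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))
    server_sock.listen()
    Listener(server_sock, readers).start()
    return server_sock.getsockname(), call_queue, readers
```

The accept loop stays cheap, while the per-byte work of reading and framing
requests is spread over several threads instead of being serialized on one.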

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
