Hi Dejan, I checked on GitHub and found that DEFAULT_DATA_SOCKET_SIZE is located in the hadoop-hdfs-project/hadoop-hdfs-client/ package in the Apache version of Hadoop, whereas it is in hadoop-hdfs-project/hadoop-hdfs/ in the Hortonworks version. I am not sure whether that means the parameter affects the performance of the Hadoop client in Apache HDFS but the performance of the DataNode in Hortonworks HDFS. If that is the case, maybe it's a bug introduced by Hortonworks?
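For anyone following along, here is a back-of-the-envelope sketch (not HDFS code; the 10 ms round-trip time is an assumed figure for illustration) of why a fixed 128 KiB receive buffer can cap a replicated write pipeline near 100 Mbps. TCP throughput cannot exceed the receive buffer size divided by the round-trip time, and HdfsConstants.DEFAULT_DATA_SOCKET_SIZE is 128 * 1024 bytes in the Hadoop source:

```java
// Illustrative math only: why a fixed TCP receive buffer caps throughput.
// Max sustained throughput <= buffer size / round-trip time.
public class SocketBufferCap {

    // Value of HdfsConstants.DEFAULT_DATA_SOCKET_SIZE in the Hadoop source.
    static final int DEFAULT_DATA_SOCKET_SIZE = 128 * 1024; // 128 KiB

    // Ceiling on throughput in Mbps for a given receive buffer (bytes)
    // and round-trip time (milliseconds).
    static double maxThroughputMbps(int bufferBytes, double rttMillis) {
        double bytesPerSecond = bufferBytes / (rttMillis / 1000.0);
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        // With an assumed 10 ms RTT between pipeline nodes, 128 KiB caps
        // writes at roughly 105 Mbps -- close to the ~100 Mbps reported
        // in this thread.
        System.out.printf("cap: %.1f Mbps%n",
                maxThroughputMbps(DEFAULT_DATA_SOCKET_SIZE, 10.0));
    }
}
```

With a lower RTT the cap rises proportionally, which is why the problem only shows up once replication pushes data across the network.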
2016-08-01 17:47 GMT+08:00 Dejan Menges <[email protected]>:

> Hi Shady,
>
> We did extensive tests on this and received a fix from Hortonworks, which
> we are probably the first and only ones to test, most likely tomorrow
> evening. If the Hortonworks guys are reading this, maybe they know the
> official HDFS ticket ID for this, if there is one, as I cannot find it in
> our correspondence.
>
> Long story short - a single server had RAID controllers with 1G and 2G
> cache (both scenarios were tested). It started as a simple benchmark
> using TestDFSIO while trying to narrow down the best server-side
> configuration (discussions like this one: JBOD, RAID0, benchmarking,
> etc.). However, with 10-12 disks in a single server and the mentioned
> controllers, we got 6-10 times higher write speed when not using
> replication (meaning replication factor one). It took literally months
> to narrow it down to a single hardcoded value,
> HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (just looking at the patch). In
> the end,
> tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE)
> basically limited write speed to this constant when using replication,
> which is super annoying (especially given that more or less everyone now
> uses network speeds greater than 100Mbps). This can be found in
> b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
>
> On Mon, Aug 1, 2016 at 11:39 AM Shady Xu <[email protected]> wrote:
>
>> Thanks Allen. I am aware of the fact you mentioned and am wondering what
>> the await and svctm are on your cluster nodes. If there is no
>> significant difference, maybe I should try other ways to tune my HBase.
>>
>> And Dejan, I have never heard of or noticed what you described. If it's
>> true, it's really disappointing; please notify us if there is any
>> progress.
>>
>> 2016-08-01 15:33 GMT+08:00 Dejan Menges <[email protected]>:
>>
>>> Sorry for jumping in, but regarding performance... it took us a while
>>> to figure out why, whatever disk/RAID0 performance you have, once it
>>> comes to HDFS and a replication factor bigger than one, disk write
>>> speed drops to 100Mbps... After long tests with Hortonworks, they
>>> found that the issue is that at some point in history someone
>>> hardcoded a value somewhere, and whatever setup you have, you are
>>> limited by it. Luckily we have a quite powerful testing environment,
>>> and the plan is to test this patch later this week. I'm not sure
>>> whether there's an official HDFS bug for this; I checked our internal
>>> history but didn't see anything like that.
>>>
>>> This was quite disappointing, as whatever tuning, controllers, and
>>> setups you try, it all goes down the drain with this.
>>>
>>> On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer <[email protected]> wrote:
>>>
>>>> On 2016-07-30 20:12 (-0700), Shady Xu <[email protected]> wrote:
>>>> > Thanks Andrew, I know about the disk failure risk and that it's one
>>>> > of the reasons why we should use JBOD. But JBOD provides worse
>>>> > performance than RAID 0.
>>>>
>>>> It's not about failure: it's about speed. RAID0 performance will drop
>>>> like a rock if any one disk in the set is slow. When all the drives
>>>> are performing at peak, yes, it's definitely faster. But over time,
>>>> drive speed will decline (sometimes to half speed or less!), usually
>>>> prior to a failure. This failure may take a while, so in the meantime
>>>> your cluster is getting slower ... and slower ... and slower ...
>>>>
>>>> As a result, JBOD will be significantly faster over the _lifetime_ of
>>>> the disks vs. a comparison made _today_.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
