Hi Shady,

Great point, I didn't know that. Thanks a lot, I will definitely check
whether this was only related to the HWX distribution.
Thanks a lot, and sorry if I spammed this topic, it wasn't my intention
at all.

Dejan

On Tue, Aug 2, 2016 at 9:37 AM Shady Xu <[email protected]> wrote:

> Hi Dejan,
>
> I checked on GitHub and found that DEFAULT_DATA_SOCKET_SIZE is located
> in the hadoop-hdfs-project/hadoop-hdfs-client/ package in the Apache
> version of Hadoop, but in hadoop-hdfs-project/hadoop-hdfs/ in the
> Hortonworks one. I am not sure whether that means the parameter affects
> the performance of the Hadoop client in Apache HDFS but the performance
> of the DataNode in Hortonworks HDFS. If that's the case, maybe it's a
> bug introduced by Hortonworks?
>
> 2016-08-01 17:47 GMT+08:00 Dejan Menges <[email protected]>:
>
>> Hi Shady,
>>
>> We did extensive tests on this and received a fix from Hortonworks,
>> which we are probably the first and only ones to test, most likely
>> tomorrow evening. If the Hortonworks guys are reading this, maybe they
>> know the official HDFS ticket ID for this, if there is one, as I
>> cannot find it in our correspondence.
>>
>> Long story short: a single server had RAID controllers with 1G and 2G
>> of cache (both scenarios were tested). It started as a simple
>> benchmark using TestDFSIO while trying to narrow down the best
>> server-side configuration (discussions like this one: JBOD, RAID0,
>> benchmarking, etc.). However, with 10-12 disks in a single server and
>> the mentioned controllers, we got 6-10 times higher write speed when
>> not using replication (meaning replication factor one). It took months
>> to narrow it down to a single hardcoded value,
>> HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (just looking at the patch).
>> In the end,
>> tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE)
>> basically limited write speed to this constant when using replication,
>> which is super annoying (especially now that more or less everyone is
>> running networks faster than 100 Mbps). This can be found in
>> b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
>>
>> On Mon, Aug 1, 2016 at 11:39 AM Shady Xu <[email protected]> wrote:
>>
>>> Thanks Allen. I am aware of the fact you mentioned and am wondering
>>> what the await and svctm are on your cluster nodes. If there is no
>>> significant difference, maybe I should try other ways to tune my
>>> HBase.
>>>
>>> And Dejan, I've never heard of or noticed what you describe. If
>>> that's true, it's really disappointing; please notify us if there's
>>> any progress.
>>>
>>> 2016-08-01 15:33 GMT+08:00 Dejan Menges <[email protected]>:
>>>
>>>> Sorry for jumping in, but speaking of performance... it took us a
>>>> while to figure out why, whatever disk/RAID0 performance you have,
>>>> once HDFS uses a replication factor greater than one, disk write
>>>> speed drops to 100 Mbps... After long, long tests with Hortonworks
>>>> they found that the issue is that someone, at some point in history,
>>>> hardcoded a value somewhere, and whatever setup you have, you are
>>>> limited by it. Luckily we have a quite powerful testing environment,
>>>> and the plan is to test this patch later this week. I'm not sure
>>>> whether there's an official HDFS bug for this; I checked our
>>>> internal history but didn't see anything like that.
>>>>
>>>> This was quite disappointing, as whatever tuning, controllers, or
>>>> setups you use, it all goes down the drain with this.
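
To make the mechanics concrete: pinning SO_RCVBUF to a fixed value
disables the kernel's TCP window auto-tuning, so throughput on one
connection is capped at roughly bufferSize / RTT. Assuming
DEFAULT_DATA_SOCKET_SIZE is the 128 KB defined in HdfsConstants, an
effective round trip of ~10 ms gives 128 KB / 0.01 s, which is about
100 Mbps - one plausible reading of the numbers above, though the exact
cap depends on your latency. Below is a minimal, self-contained Java
sketch of the pattern using plain java.net (not the actual Hadoop code
path; the constant's value and the auto-tuning behaviour are my
assumptions):

    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    public class SocketBufferSketch {
        // Assumed value of HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (128 KB).
        static final int HARDCODED_BUFFER = 128 * 1024;

        public static void main(String[] args) throws Exception {
            // The pattern described above, analogous to
            // tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE)
            // in DFSOutputStream: fixing SO_RCVBUF before bind() turns off
            // TCP window auto-tuning on accepted connections.
            try (ServerSocket capped = new ServerSocket()) {
                capped.setReceiveBufferSize(HARDCODED_BUFFER); // before bind()
                capped.bind(new InetSocketAddress(0));
                System.out.println("capped SO_RCVBUF    = "
                        + capped.getReceiveBufferSize());
            }

            // What one would expect a fix to do: leave SO_RCVBUF alone (or
            // make it configurable) so the kernel can grow the window.
            try (ServerSocket autoTuned = new ServerSocket()) {
                autoTuned.bind(new InetSocketAddress(0));
                System.out.println("auto-tuned SO_RCVBUF = "
                        + autoTuned.getReceiveBufferSize());
            }
        }
    }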
>>>>
>>>> On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer <[email protected]> wrote:
>>>>
>>>>> On 2016-07-30 20:12 (-0700), Shady Xu <[email protected]> wrote:
>>>>> > Thanks Andrew, I know about the disk failure risk and that it's
>>>>> > one of the reasons why we should use JBOD. But JBOD provides
>>>>> > worse performance than RAID 0.
>>>>>
>>>>> It's not about failure: it's about speed. RAID0 performance will
>>>>> drop like a rock if any one disk in the set is slow. When all the
>>>>> drives are performing at peak, yes, it's definitely faster. But
>>>>> over time, drive speed will decline (sometimes to half speed or
>>>>> less!), usually prior to a failure. That failure may take a while,
>>>>> so in the meantime your cluster is getting slower ... and slower
>>>>> ... and slower ...
>>>>>
>>>>> As a result, JBOD will be significantly faster over the _lifetime_
>>>>> of the disks than a comparison made _today_ would suggest.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [email protected]
>>>>> For additional commands, e-mail: [email protected]
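
A back-of-the-envelope illustration of Allen's point (the disk speeds
are made up, and "N * slowest disk" is only a rough model of striped
sequential I/O):

    public class Raid0VsJbod {
        public static void main(String[] args) {
            // Hypothetical per-disk sequential speeds in MB/s; one drive
            // has degraded to half speed, as ageing drives tend to do
            // before they fail outright.
            double[] disks = {150, 150, 150, 150, 150, 75};

            // RAID0 stripes every I/O across all members, so the array
            // runs at roughly N * min(speed): the slow disk drags all of
            // them down.
            double slowest = Double.MAX_VALUE, sum = 0;
            for (double d : disks) {
                slowest = Math.min(slowest, d);
                sum += d;
            }
            System.out.printf("RAID0 aggregate ~ %.0f MB/s%n",
                    disks.length * slowest);

            // JBOD lets HDFS drive each disk independently, so the
            // aggregate is the plain sum: only the one slow disk's
            // throughput is lost.
            System.out.printf("JBOD  aggregate ~ %.0f MB/s%n", sum);
        }
    }

With these numbers RAID0 lands at ~450 MB/s against ~825 MB/s for JBOD,
which is the "lifetime vs. today" gap Allen describes.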
