Hi Dejan, I checked on GitHub and found that DEFAULT_DATA_SOCKET_SIZE is located in the hadoop-hdfs-project/hadoop-hdfs-client/ package in the Apache version of Hadoop, whereas it is in hadoop-hdfs-project/hadoop-hdfs/ in the Hortonworks version. I am not sure whether that means the parameter affects the performance of the Hadoop client in Apache HDFS but the performance of the DataNode in Hortonworks HDFS. If that is the case, maybe it's a bug introduced by Hortonworks?
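For anyone following along, here is a back-of-the-envelope sketch (not HDFS code; the 10 ms round-trip time is an assumed figure for illustration) of why a fixed 128 KiB receive buffer can cap a replicated write pipeline near 100 Mbps. TCP throughput cannot exceed the receive buffer size divided by the round-trip time, and HdfsConstants.DEFAULT_DATA_SOCKET_SIZE is 128 * 1024 bytes in the Hadoop source:

```java
// Illustrative math only: why a fixed TCP receive buffer caps throughput.
// Max sustained throughput <= buffer size / round-trip time.
public class SocketBufferCap {

    // Value of HdfsConstants.DEFAULT_DATA_SOCKET_SIZE in the Hadoop source.
    static final int DEFAULT_DATA_SOCKET_SIZE = 128 * 1024; // 128 KiB

    // Ceiling on throughput in Mbps for a given receive buffer (bytes)
    // and round-trip time (milliseconds).
    static double maxThroughputMbps(int bufferBytes, double rttMillis) {
        double bytesPerSecond = bufferBytes / (rttMillis / 1000.0);
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    public static void main(String[] args) {
        // With an assumed 10 ms RTT between pipeline nodes, 128 KiB caps
        // writes at roughly 105 Mbps -- close to the ~100 Mbps reported
        // in this thread.
        System.out.printf("cap: %.1f Mbps%n",
                maxThroughputMbps(DEFAULT_DATA_SOCKET_SIZE, 10.0));
    }
}
```

With a lower RTT the cap rises proportionally, which is why the problem only shows up once replication pushes data across the network.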
2016-08-01 17:47 GMT+08:00 Dejan Menges <[email protected]>:

> Hi Shady,
>
> We did extensive tests on this and received a fix from Hortonworks, which
> we are probably the first and only ones to test, most likely tomorrow
> evening. If the Hortonworks guys are reading this, maybe they know the
> official HDFS ticket ID for this, if there is one, as I cannot find it in
> our correspondence.
>
> Long story short - a single server had RAID controllers with 1G and 2G
> cache (both scenarios were tested). It started as a simple benchmark
> using TestDFSIO while trying to narrow down the best server-side
> configuration (discussions like this one: JBOD, RAID0, benchmarking,
> etc.). However, with 10-12 disks in a single server and the mentioned
> controllers, we got 6-10 times higher write speed when not using
> replication (meaning replication factor one). It took literally months
> to narrow it down to a single hardcoded value,
> HdfsConstants.DEFAULT_DATA_SOCKET_SIZE (just looking at the patch). In
> the end,
> tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE)
> basically limited write speed to this constant when using replication,
> which is super annoying (especially given that more or less everyone now
> uses network speeds greater than 100Mbps). This can be found in
> b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
>
> On Mon, Aug 1, 2016 at 11:39 AM Shady Xu <[email protected]> wrote:
>
>> Thanks Allen. I am aware of the fact you mentioned and am wondering what
>> the await and svctm are on your cluster nodes. If there is no
>> significant difference, maybe I should try other ways to tune my HBase.
>>
>> And Dejan, I have never heard of or noticed what you described. If it's
>> true, it's really disappointing; please notify us if there is any
>> progress.
>>
>> 2016-08-01 15:33 GMT+08:00 Dejan Menges <[email protected]>:
>>
>>> Sorry for jumping in, but regarding performance... it took us a while
>>> to figure out why, whatever disk/RAID0 performance you have, once it
>>> comes to HDFS and a replication factor bigger than one, disk write
>>> speed drops to 100Mbps... After long tests with Hortonworks, they
>>> found that the issue is that at some point in history someone
>>> hardcoded a value somewhere, and whatever setup you have, you are
>>> limited by it. Luckily we have a quite powerful testing environment,
>>> and the plan is to test this patch later this week. I'm not sure
>>> whether there's an official HDFS bug for this; I checked our internal
>>> history but didn't see anything like that.
>>>
>>> This was quite disappointing, as whatever tuning, controllers, and
>>> setups you try, it all goes down the drain with this.
>>>
>>> On Mon, Aug 1, 2016 at 8:30 AM Allen Wittenauer <[email protected]> wrote:
>>>
>>>> On 2016-07-30 20:12 (-0700), Shady Xu <[email protected]> wrote:
>>>> > Thanks Andrew, I know about the disk failure risk and that it's one
>>>> > of the reasons why we should use JBOD. But JBOD provides worse
>>>> > performance than RAID 0.
>>>>
>>>> It's not about failure: it's about speed. RAID0 performance will drop
>>>> like a rock if any one disk in the set is slow. When all the drives
>>>> are performing at peak, yes, it's definitely faster. But over time,
>>>> drive speed will decline (sometimes to half speed or less!), usually
>>>> prior to a failure. This failure may take a while, so in the meantime
>>>> your cluster is getting slower ... and slower ... and slower ...
>>>>
>>>> As a result, JBOD will be significantly faster over the _lifetime_ of
>>>> the disks vs. a comparison made _today_.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
