Not familiar with GPFS, but looking at IBM's website, GPFS has a client that emulates Hadoop RPC https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1adv_Overview.htm
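In practice that should mean an unmodified Hadoop client can talk to the connector over the standard HDFS RPC interface. A minimal core-site.xml sketch of what that might look like (the hostname and port below are placeholders, not taken from the IBM doc):

```
<!-- Point the client's default filesystem at the node running the
     GPFS/Spectrum Scale HDFS-compatible service.
     "transparency-node.example.com:8020" is a made-up example value. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://transparency-node.example.com:8020</value>
</property>
```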
So you can just use GPFS like HDFS. It may be the quickest way to approach this use case, and it is supported. I'm not sure about the performance, though.

Cloudera's storage acceptance criteria doc (https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_stg_dev_accept_criteria.pdf) cautions:

*High-throughput Storage Area Network (SAN) and other shared storage solutions can present remote block devices to virtual machines in a flexible and performant manner that is often indistinguishable from a local disk. An Apache Hadoop workload provides a uniquely challenging IO profile to these storage solutions, and this can have a negative impact on the utility and stability of the Cloudera Enterprise cluster, and to other work that is utilizing the same storage backend.*

*Warning: Running CDH on storage platforms other than direct-attached physical disks can provide suboptimal performance. Cloudera Enterprise and the majority of the Hadoop platform are optimized to provide high performance by distributing work across a cluster that can utilize data locality and fast local I/O.*

On Sat, Aug 17, 2019 at 2:12 AM Daegyu Han <[email protected]> wrote:
> Hi all,
>
> As far as I know, HDFS is designed to target local file systems like ext4
> or xfs.
>
> Is it a bad approach to use SAN technology as storage for HDFS?
>
> Thank you,
> Daegyu
>
