Not familiar with GPFS, but looking at IBM's website, GPFS has a client that emulates Hadoop RPC https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.1/com.ibm.spectrum.scale.v4r21.doc/bl1adv_Overview.htm
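In practice that should mean an unmodified Hadoop client can talk to the connector over the standard HDFS RPC interface. A minimal core-site.xml sketch of what that might look like (the hostname and port below are placeholders, not taken from the IBM doc):

```
<!-- Point the client's default filesystem at the node running the
     GPFS/Spectrum Scale HDFS-compatible service.
     "transparency-node.example.com:8020" is a made-up example value. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://transparency-node.example.com:8020</value>
</property>
```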
So you can just use GPFS like HDFS. It may be the quickest way to approach this use case, and it is supported. I'm not sure about the performance, though.

Cloudera's storage acceptance criteria doc (https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_stg_dev_accept_criteria.pdf) cautions:

*High-throughput Storage Area Network (SAN) and other shared storage solutions can present remote block devices to virtual machines in a flexible and performant manner that is often indistinguishable from a local disk. An Apache Hadoop workload provides a uniquely challenging IO profile to these storage solutions, and this can have a negative impact on the utility and stability of the Cloudera Enterprise cluster, and to other work that is utilizing the same storage backend.*

*Warning: Running CDH on storage platforms other than direct-attached physical disks can provide suboptimal performance. Cloudera Enterprise and the majority of the Hadoop platform are optimized to provide high performance by distributing work across a cluster that can utilize data locality and fast local I/O.*

On Sat, Aug 17, 2019 at 2:12 AM Daegyu Han <[email protected]> wrote:
> Hi all,
>
> As far as I know, HDFS is designed to target local file systems like ext4
> or xfs.
>
> Is it a bad approach to use SAN technology as storage for HDFS?
>
> Thank you,
> Daegyu
>
