On Tue, Oct 13, 2020 at 1:31 PM Douglas Eadline <deadl...@eadline.org> wrote:
> The reality is almost all Analytics projects require multiple
> tools. For instance, Spark is great, but if you do some
> data munging of CSV files and want to store your results
> at scale, you can't write a single file to your local file
> system. Oftentimes you write it as a Hive table to HDFS
> (e.g. in Parquet format) so it is available for Hive SQL
> queries or for other tools to use.

You can also commit the results to a database (though you typically
can't run one on a traditional HPC cluster). What would be nice is
HDFS running on a traditional cluster. But that would break the whole
parallel-filesystem-exposed-as-a-single-mount-point thing... It is
funny how these things evolved apart from each other, to the point
where they are impossible to marry, no?
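For concreteness, here is a minimal PySpark sketch of the workflow
Doug describes above (munge a CSV, then persist the result as a
Parquet-backed Hive table on HDFS). The paths and table name are
hypothetical, and it assumes a Spark build with Hive support and a
configured metastore:

    from pyspark.sql import SparkSession

    # Hive support requires a metastore to be configured for the cluster.
    spark = (
        SparkSession.builder
        .appName("csv-to-hive")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read raw CSV from HDFS and do some light munging.
    df = spark.read.csv("hdfs:///data/raw/events.csv",
                        header=True, inferSchema=True)
    clean = df.dropna().withColumnRenamed("ts", "timestamp")

    # Persist as a Hive table stored as Parquet on HDFS, so Hive SQL
    # and other tools can query it afterwards.
    clean.write.format("parquet").mode("overwrite") \
         .saveAsTable("analytics.events_clean")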