On Tue, Oct 13, 2020 at 1:31 PM Douglas Eadline <deadl...@eadline.org>
wrote:

>
> The reality is almost all analytics projects require multiple
> tools. For instance, Spark is great, but if you do some
> data munging of CSV files and want to store your results
> at scale, you can't write a single file to your local file
> system. Oftentimes you write it as a Hive table to HDFS
> (e.g. in Parquet format) so it is available for Hive SQL
> queries or for other tools to use.
>
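For the archives, a minimal PySpark sketch of that CSV-to-Hive-table
pattern (the HDFS path, column name, and database/table names are all
made up):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support is what lets saveAsTable() register the result in the
# metastore so Hive SQL and other tools can find it later.
spark = (SparkSession.builder
         .appName("csv-to-hive-parquet")
         .enableHiveSupport()
         .getOrCreate())

# Read the raw CSVs from HDFS (illustrative path; schema is inferred).
df = spark.read.csv("hdfs:///data/raw/events/*.csv",
                    header=True, inferSchema=True)

# A trivial bit of munging: drop incomplete rows, parse a date column.
cleaned = df.dropna().withColumn("event_date", F.to_date("event_date"))

# Write back to HDFS as a Parquet-backed Hive table.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
(cleaned.write
        .mode("overwrite")
        .format("parquet")
        .saveAsTable("analytics.events_cleaned"))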

You can also write your results to a database (though you can't have one
running on a traditional HPC cluster; a rough sketch is below). What would
be nice would be HDFS running on a traditional cluster, but that would
break the whole parallel-filesystem-as-a-single-mount-point model... It is
funny how these things evolved apart from each other to the point where
they are impossible to marry, no?
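And the database route, for comparison: the same DataFrame can go out
through Spark's JDBC writer, assuming a database you can actually reach
from the cluster. The URL, table, and credentials here are placeholders,
and the matching JDBC driver jar has to be on Spark's classpath (e.g.
passed with --jars):

# Reuses "cleaned" from the sketch above.
(cleaned.write
        .mode("append")
        .format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/analytics")
        .option("dbtable", "public.events_cleaned")
        .option("user", "analytics_rw")
        .option("password", "...")
        .save())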
