Hi Jörg,

What kind of data are you dealing with: structured or unstructured?
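On the transfer question below: for a terabyte or more, parallel multipart
uploads are usually the first knob to turn. Here is a rough sketch with
boto3 against S3 (the bucket, key and file path are made up, and the
numbers are starting points to tune rather than recommendations; gsutil on
GCS and azcopy on Azure have equivalent settings):

    # Sketch: parallel multipart upload of one large file to S3 via boto3.
    # All names are illustrative; tune part size and concurrency to your uplink.
    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
        multipart_chunksize=128 * 1024 * 1024,  # upload in 128 MB parts
        max_concurrency=16,                     # parallel part uploads
        use_threads=True,
    )

    s3.upload_file(
        Filename="/gpfs/scratch/dataset.tar",   # hypothetical source path
        Bucket="my-burst-bucket",               # hypothetical bucket
        Key="incoming/dataset.tar",
        Config=config,
    )

Whatever the tool, the wire is the limit: at a sustained 10 Gbit/s, 1 TB
still takes on the order of 15 minutes each way, so the transfers need to
be part of the job scheduling rather than an afterthought.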
Regards,
Jonathan

-----Original Message-----
From: Jörg Saßmannshausen <sassy-w...@sassy.formativ.net>
Sent: Friday, 26 July 2019 02:27
To: beowulf@beowulf.org; Jonathan Aquilina <jaquil...@eagleeyet.net>
Subject: Re: [Beowulf] Lustre on google cloud

Dear all, dear Chris,

thanks for the detailed explanation. We are currently looking into cloud
bursting, so your email was very timely for me, as I am supposed to look
into it.

One of the issues I can see with our workload is simply getting data into
the cloud and back out again. We are not talking about a few gigs here; we
are talking about 1 TB or more. For reference: we have 9 PB of storage
(GPFS), of which we are currently using 7 PB, and there are around 1000+
users connected to the system. So cloud bursting would only be possible in
some cases.

Do you happen to have a feeling for how to handle the issue with the file
sizes sensibly?

Sorry for hijacking the thread here a bit.

All the best from a hot London

Jörg

On Monday, 22 July 2019 at 14:14:13 BST, Chris Dagdigian wrote:
> A lot of production HPC runs on cloud systems.
>
> AWS is big for this via their AWS ParallelCluster stack, which includes
> Lustre support via the FSx for Lustre service, although they are careful
> to caveat it as staging/scratch space, not suitable for persistent
> storage. AWS has some cool node types now with 25-, 50- and 100-gigabit
> network support.
>
> Microsoft Azure is doing amazing things now that they have the Cycle
> Computing folks on board, integrated and able to call shots within the
> product space. They actually offer bare-metal HPC and InfiniBand SKUs
> now and have some interesting parallel filesystem offerings as well.
>
> Can't comment on Google as I've not touched or used it professionally,
> but AWS and Azure for sure are real players now to consider if you have
> an HPC requirement.
>
> That said, however, a sober cost accounting still shows that on-prem or
> "owned" HPC is best from a financial perspective if your workload is
> constant, 24x7x365. Cloud-based HPC is best for capability, bursty
> workloads, temporary workloads, auto-scaling, computing against
> cloud-resident data sets, or the neat new model where, instead of
> on-prem multi-user shared HPC, you go out and deliver individual bespoke
> HPC clusters to each user or team on the cloud.
>
> The big paradigm shift for cloud HPC is that it does not make a lot of
> sense to build a monolithic stack shared by multiple competing users
> and groups. The automated provisioning and elasticity of the cloud make
> it more sensible to build many clusters, so that you can tune each one
> specifically for its user or workload and then blow it up when the work
> is done.
>
> My $.02 of course!
>
> Chris
>
> > Jonathan Aquilina <mailto:jaquil...@eagleeyet.net> July 22, 2019 at
> > 1:48 PM
> >
> > Hi Guys,
> >
> > I am looking at
> > https://cloud.google.com/blog/products/storage-data-transfer/introducing-lustre-file-system-cloud-deployment-manager-scripts
> >
> > This basically allows you to deploy a Lustre cluster on Google Cloud.
> > In your HPC setups, have you considered moving towards cloud-based
> > clusters?
> >
> > Regards,
> >
> > Jonathan

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf