On 7/25/19 8:26 PM, Jörg Saßmannshausen wrote:
> Dear all, dear Chris,
>
> thanks for the detailed explanation. We are currently looking into
> cloud bursting, so your email was very timely for me, as I am
> supposed to look into it.
>
> One of the issues I can see with our workload is simply getting data
> into the cloud and back out again. We are not talking about a few
> gigs here; we are talking about up to 1 TB or more. For reference:
> we have 9 PB of storage (GPFS), of which we are currently using
> 7 PB, and there are around 1,000 users connected to the system. So
> cloud bursting would only be possible in some cases.
>
> Do you happen to have a feel for how to handle the file-size issue
> sensibly?
The issue is bursting with large data sets. You might be able to
pre-stage some portion of the data set in a public cloud, and then burst
jobs from there. Data motion between sites is going to be the hard
problem in the mix. Not technically hard, but hard from a cost/time
perspective.
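To put rough numbers on the cost/time side, here is a minimal sketch.
The link speeds, the 80% sustained-throughput factor, and the ~$0.09/GB
egress price are illustrative assumptions, not quotes from any provider:

    #!/usr/bin/env python3
    # Back-of-envelope time and cost to move a data set into/out of a
    # public cloud. Link speed, efficiency, and egress price are assumed.

    def transfer_hours(size_tb, link_gbps, efficiency=0.8):
        """Hours to move size_tb (decimal TB) over a link_gbps link."""
        bits = size_tb * 1e12 * 8                  # TB -> bits
        usable_bps = link_gbps * 1e9 * efficiency  # sustained bits/s
        return bits / usable_bps / 3600.0

    def egress_cost_usd(size_tb, usd_per_gb=0.09):
        """Cost to pull size_tb back out at an assumed per-GB price."""
        return size_tb * 1000 * usd_per_gb

    if __name__ == "__main__":
        for tb in (1, 5, 10):
            print(f"{tb:>2} TB: ~{transfer_hours(tb, 1.0):4.1f} h at 1 Gb/s, "
                  f"~{transfer_hours(tb, 10.0):4.1f} h at 10 Gb/s, "
                  f"egress ~${egress_cost_usd(tb):,.0f}")

For the 1 TB case that works out to roughly three hours on a sustained
1 Gb/s link and on the order of $90 to pull it back out again, which is
why pre-staging the data set and minimizing round trips tends to matter
more than the compute side of the burst.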
--
Joe Landman
e: joe.land...@gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
https://beowulf.org/cgi-bin/mailman/listinfo/beowulf