> but for now just expecting to get something good without effort is probably premature.
Nothing good ever came easy. Who said that? My Mum. And she was a very wise woman.

On Sun, 9 Dec 2018 at 21:36, INKozin via Beowulf <beowulf@beowulf.org> wrote:

> While I agree with many points made so far, I want to add that one aspect which used to separate a typical HPC setup from other IT infrastructure is complexity. And I don't mean technological complexity (because technologically HPC can be fairly complex) but the diversity of, and the interrelationships between, its various parts. Typically HPC is relatively homogeneous and straightforward. But everything is changing, including HPC, so modularisation is a natural approach to making systems more manageable, and containers, conda, Kubernetes etc. are all solutions for fighting complexity. Yes, these solutions can be fairly complex too, but their impact is generally, and intentionally, restricted. For example, a conda environment can be rather bloated, but trading size for flexibility is a reasonable trade-off.
>
> One of the points Werner Vogels, Amazon's CTO, kept coming back to over and over again in his keynote at the recent re:Invent is modular (cellular) architecture at different levels (Lambdas, Firecracker, containers, VMs and up), because working with redundant, replaceable modules makes services scalable and resilient.
>
> And I'm pretty sure the industry will continue on its path to embrace microVMs as it did containers before that.
>
> This modular approach may work quite well for on-prem IT, cloud or HTC (High Throughput Computing), but may still be a challenge for HPC, because you can argue that a true HPC system must be tightly coupled (e.g. remember OS jitter?).
>
> As for ML, and more specifically deep learning, it depends on what you do. If you are doing inference, i.e. a production setup, i.e. more like HTC, then everything works fine. But if you want to train a model on ImageNet or larger, and do it very quickly (hours), then you will benefit from a tightly coupled setup (although there are tricks, such as asynchronous parameter updates, to alleviate latency).
>
> Two cases in point here: Kubeflow, whose scaling seems somewhat deficient, and the Horovod library, which made many people rather excited because it allows using TensorFlow with MPI.
>
> While Docker and Singularity can be used with MPI, you'd probably want to trim as much as you can if you want to push the scaling limit. But I think we've already discussed the topic of "heroic" HPC vs "democratic" HPC (top vs tail) many times on this list.
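For anyone on the list who hasn't tried Horovod yet, the pattern is roughly the sketch below. This is only a minimal sketch against the TF1-era Horovod API that is current as I write; the toy quadratic "loss" and the learning rate are placeholders standing in for a real model, and the rate scaling is the usual convention rather than anything Horovod enforces:

    # train.py -- minimal Horovod + TensorFlow sketch (TF1-era API).
    # The tiny quadratic "loss" is a placeholder for a real model.
    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()  # one process (MPI rank) per GPU

    # Pin each rank to its own GPU on the node.
    config = tf.ConfigProto()
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    x = tf.get_variable("x", initializer=1.0)
    loss = tf.square(x - 3.0)

    # Common convention: scale the learning rate by the number of workers.
    opt = tf.train.GradientDescentOptimizer(0.01 * hvd.size())
    # The wrapper averages gradients across ranks with an MPI allreduce.
    opt = hvd.DistributedOptimizer(opt)
    train_op = opt.minimize(loss)

    # Broadcast rank 0's initial variables so all workers start identical.
    hooks = [hvd.BroadcastGlobalVariablesHook(0)]

    with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
        for _ in range(100):
            sess.run(train_op)

You launch it like any MPI program, e.g. mpirun -np 4 python train.py, and as far as I know the same mpirun line works in front of a singularity exec if you put the script inside a container, which is the Docker/Singularity-plus-MPI combination mentioned above.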
> Just one last thing, regarding using GPUs in the cloud. Last time I checked, even the spot instances were so expensive that you'd be much better off buying the GPUs yourself, even if only for a month of use. Obviously, that assumes you have a place to host them. And obviously, in your own DC you can use a decent network for faster training.

(Some rough break-even arithmetic on this point at the end of this message.)

> As for the ML services provided by AWS and others, my experience is rather limited. I helped one of our students with an ML service on AWS. Initially he was excited that he could just throw his data set at it and get something out. Alas, he quickly found out that he needed to do quite a bit more, so it was back to our HPC. Perhaps AutoML will be significantly improved in the coming years, but for now just expecting to get something good without effort is probably premature.
>
> On Sun, 9 Dec 2018 at 15:26, Gerald Henriksen <ghenr...@gmail.com> wrote:
>
>> On Fri, 7 Dec 2018 16:19:30 +0100, you wrote:
>>
>> >Perhaps for another thread: Actually I went to the AWS User Group in the UK on Wednesday. Very impressive, and there are the new Lustre filesystems and MPI networking. I guess the HPC world will see the same philosophy of building your setup using the AWS toolkit as Uber etc. do today. Also a lot of noise is being made at the moment about the convergence of HPC and Machine Learning workloads. Are we going to see the Machine Learning folks adapting their workflows to run on HPC on-premise bare metal clusters? Or are we going to see them go off and use AWS (Azure, Google?)
>>
>> I suspect that ML will not go for on-premise for a number of reasons.
>>
>> First, ignoring cost, companies like Google, Amazon and Microsoft are very good at ML because not only are they driving the research, they need it for their business. So they have the in-house expertise not only to implement cloud systems that are ideal for ML, but to implement custom hardware (see Google's Tensor Processing Unit).
>>
>> Second, setting up a new cluster isn't going to be easy. Finding physical space, making sure enough utilities can be supplied to support the hardware, staffing up, etc. are not only going to be difficult but inherently take time, when instead you can simply sign up with a cloud provider and have the project running within 24 hours. Would HPC exist today as we know it if the ability to instantly turn on a cluster had existed at the beginning?
>>
>> Third, albeit this is very speculative: I suspect ML is heading towards using custom hardware. It has had a very good run using GPUs, and a GPU will likely always be the entry point for desktop ML, but unless Nvidia is holding back due to a lack of competition, it does appear the GPU is reaching an end to its development, much like CPUs have. The latest hardware from Nvidia is getting lacklustre reviews, and the bolting on of additional things like raytracing is perhaps an indication that there are limits to how much further the GPU architecture can be pushed. The question then is whether the ML market is big enough to support that custom hardware as an OEM product like a GPU, or whether it will remain restricted to places like Google, who can afford to build it without the overheads of a consumer product.
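Going back to INKozin's point about cloud GPU prices, here is the rough break-even arithmetic I promised above. Both prices are purely my illustrative assumptions, not quotes, so check current AWS and street pricing before believing any of it:

    # Break-even arithmetic only; both prices are illustrative assumptions.
    SPOT_USD_PER_GPU_HOUR = 0.90  # assumed per-GPU spot rate, V100 class

    for label, card_usd in [("consumer card (1080 Ti class)", 700.0),
                            ("datacenter card (V100 class)", 8000.0)]:
        hours = card_usd / SPOT_USD_PER_GPU_HOUR
        print(f"{label}: break-even after ~{hours:.0f} GPU-hours "
              f"(~{hours / 730:.1f} months of 24/7 use)")

With these assumed numbers the break-even is about a month against a consumer card and about a year against a datacenter card, so how quickly buying wins depends almost entirely on whether you can get away with consumer GPUs on-prem, plus hosting, power and the rest of the node, which this sketch deliberately ignores.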
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf