> On Dec 9, 2018, at 7:26 AM, Gerald Henriksen <ghenr...@gmail.com> wrote:
> 
>> On Fri, 7 Dec 2018 16:19:30 +0100, you wrote:
>> 
>> Perhaps for another thread:
>> Actually I went to the AWS User Group in the UK on Wednesday. Very
>> impressive, and there are the new Lustre filesystems and MPI networking.
>> I guess the HPC World will see the same philosophy of building your setup
>> using the AWS toolkit as Uber etc. etc. do today.
>> Also a lot of noise is being made at the moment about the convergence of
>> HPC and Machine Learning workloads.
>> Are we going to see the Machine Learning folks adapting their workflows to
>> run on HPC on-premise bare metal clusters?
>> Or are we going to see them go off and use AWS (Azure, Google ?)
> 
> I suspect that ML will not go for on-premise for a number of reasons.
> 
> First, ignoring cost, companies like Google, Amazon and Microsoft are
> very good at ML because not only are they driving the research but
> they need it for their business.  So they have the in house expertise
> not only to implement cloud systems that are ideal for ML, but to
> implement custom hardware - see Google's Tensor Processing Unit.
> 
> Second, setting up a new cluster isn't going to be easy.  Finding
> physical space, making sure enough utilities can be supplied to
> support the hardware, staffing up, etc. are not only going to be
> difficult but inherently take time, when instead you can simply sign
> up with a cloud provider and have the project running within 24 hours.
> Would HPC exist today as we know it if the ability to instantly turn
> on a cluster existed at the beginning?
> 
> Third, albeit this is very speculative: I suspect ML is heading
> towards custom hardware.  It has had a very good run on GPUs, and a
> GPU will likely always be the entry point for desktop ML, but unless
> Nvidia is holding back due to a lack of competition, it does appear
> the GPU is reaching an end to its development, much like CPUs have.
> The latest hardware from Nvidia is getting lacklustre reviews, and
> the bolting on of additional features like raytracing is perhaps an
> indication that there are limits to how much further the GPU
> architecture can be pushed.  The question then is whether the ML
> market is big enough for that custom hardware to become an OEM
> product like a GPU, or whether it will remain restricted to places
> like Google that can afford to build it without the overheads of a
> consumer product.
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf

My data points are the opposite. 

1. As it progresses from experiment to real use, most AI/ML/DL work takes 
place near where the data is. Since for many organisations that data is 
on-premises, the compute is on-premises too.  For cloud-native services, it 
stays in the cloud.

2. The investment isn’t huge and is incremental, so there isn’t a strong 
barrier to buying the kit. 
Models never get ‘finished’ and require regular retesting on historical and new 
data, so they can keep it busy. The GPUs are plenty good enough because most of 
the frameworks parallelize (scale-out) easily.   There is also a desire to test 
models on other similar data, but that data takes prep and a common data 
source. The cost of dedicated storage at this scale is not prohibitive, but 
moving data to/from the cloud can be.  Most projects start very small to prove 
effectiveness. It isn’t a big tender to get started - unless you are doing 
Autonomous Driving... 
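The scale-out point above can be illustrated with a toy sketch of data-parallel
training (plain Python, not any particular framework's API): each "worker"
computes the gradient on its own shard of the batch, and the results are
averaged. This is, in essence, what frameworks such as Horovod or PyTorch's
DistributedDataParallel do across GPUs or nodes via an all-reduce.

```python
# Toy data-parallel gradient averaging for 1-D linear regression
# (illustrative sketch only; real frameworks perform the same averaging
# with an all-reduce across GPUs/nodes).

def grad(shard, w):
    """Mean-squared-error gradient d/dw of (w*x - y)^2, averaged over a shard."""
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)

data = [(float(x), 3.0 * x) for x in range(1, 9)]  # y = 3x, 8 samples
w = 0.0

# Split the batch across 4 equal-sized "workers" and average their gradients
shards = [data[i:i + 2] for i in range(0, 8, 2)]
g_avg = sum(grad(s, w) for s in shards) / len(shards)

# With equal shard sizes, the averaged gradient matches the full-batch one
assert abs(g_avg - grad(data, w)) < 1e-9
```

Because the averaging is exact for equal shards, adding more workers changes
only wall-clock time, not the optimisation trajectory, which is why these
workloads scale out so readily on commodity GPU nodes.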

3. There will be specialized solutions for inference, but that isn’t the same 
as training. IMHO, the specialized silicon or designs will be driven by using 
the AI near the edge within the constraints of power, footprint, etc. Training 
will still be scale-out & centralized. GPUs will still work for a long time, 
just like CPUs did. 
