On Tue, Oct 13, 2020 at 8:52 AM Guy Coates <guy.coa...@gmail.com> wrote:
>
> Having just spent some time looking at parallelising some ML/AI workloads, it 
> was enlightening to see that as you scratch beneath the various frameworks 
> like pytorch or horovod, you find...MPI. And RDMA. And workloads that can 
> quickly become I/O-bound. Plus ça change...

Yup, we're working on that too. We've been watching these frameworks
for a while now as they mature. We have a lot of models that want to
spread across multiple GPUs on multiple hosts. Some of the frameworks
have been less than pleasant for this, but the ones that took the
MPI/GPUDirect route seem to be making the most headway.
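
For anyone who hasn't peeled back that layer, the PyTorch side of the
MPI/GPUDirect route is surprisingly small. Below is a rough sketch (not
anyone's production code) of a DDP training loop on the NCCL backend --
NCCL is what rides GPUDirect RDMA when the fabric supports it -- launched
under mpirun. The env var names are Open MPI's, so adjust for your
launcher, and the model/hostnames are placeholders:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # rank/size come from the MPI launcher's environment; these names
    # are Open MPI's (srun and other launchers export different ones)
    rank  = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
    world = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))
    local = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))

    # env:// rendezvous needs MASTER_ADDR/MASTER_PORT exported on every rank
    dist.init_process_group("nccl", init_method="env://",
                            rank=rank, world_size=world)
    torch.cuda.set_device(local)

    # stand-in model; one process per GPU, DDP handles the gradient sync
    model = torch.nn.Linear(1024, 1024).to(f"cuda:{local}")
    ddp = DDP(model, device_ids=[local])
    opt = torch.optim.SGD(ddp.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 1024, device=f"cuda:{local}")
        loss = ddp(x).sum()
        opt.zero_grad()
        loss.backward()   # gradients all-reduced over NCCL here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with something like (hostnames hypothetical):

  mpirun -np 16 -H node1:8,node2:8 python train_ddp.py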