Hi,

at a conference last year I asked one of the CRIU developers about IB RDMA 
support for live migrations, and he said it isn't supported and there are no 
plans to add it either.

In a way it makes sense, considering that what CRIU does is basically a dump of 
the process memory + some support for kernel-managed objects like file 
descriptions. With RDMA you're basically mapping the NIC HW buffers into your 
process and spraying away, so how could that be checkpointed (at that level)?

I'd guess it would theoretically be possible to leverage CRIU to handle the 
rest, and then have the MPI library take care of fixing up the RDMA state? 
Though I'm not aware of any effort in this direction.

In addition to the things you listed, there's BLCR, though I have no experience 
with it.

(My (entirely theoretical) interest in this topic is not checkpoint/restart per 
se, but rather using live migrations to reduce job fragmentation, optimize 
CPU/memory layout, etc. But again, I'm not aware of any effort in this 
direction.)

--
Janne Blomqvist

________________________________________
From: Beowulf <beowulf-boun...@beowulf.org> on behalf of Christopher Samuel 
<ch...@csamuel.org>
Sent: Monday, March 4, 2019 9:41:59 PM
To: Beowulf Mailing List
Subject: [Beowulf] Application independent checkpoint/resume?

Hi folks,

Just wondering if folks here have recent experience with
application independent checkpoint/resume mechanisms like DMTCP or CRIU?

Especially interested for MPI uses, and extra bonus points for
experiences on Cray. :-)

From what I can see CRIU doesn't seem to support MPI at all, and DMTCP
only supports it over TCP/IP or (with a supplied plugin) Infiniband. Are
those inferences true?

Any others I've missed?

All the best,
Chris
--
   Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf