Hi, at a conference last year I asked one of the CRIU developers about InfiniBand (IB) RDMA support for live migration. He said it isn't supported, and there are no plans to add it either.
In a way it makes sense, considering that what CRIU does is basically dump the process memory, plus some support for kernel-managed objects like file descriptions. With RDMA you're basically mapping the NIC hardware buffers into your process and spraying away, so how could that be checkpointed (at that level)? I'd guess it could theoretically be possible to leverage CRIU to handle the rest and then have the MPI library take care of fixing up the RDMA state, though I'm not aware of any effort in this direction.

In addition to the things you listed, there's BLCR, though I have no experience with it.

(My entirely theoretical interest in this topic is not checkpoint/restart per se, but rather using live migration to reduce job fragmentation, optimize CPU/memory layout, etc. But again, I'm not aware of any effort in this direction.)

--
Janne Blomqvist

________________________________________
From: Beowulf <beowulf-boun...@beowulf.org> on behalf of Christopher Samuel <ch...@csamuel.org>
Sent: Monday, March 4, 2019 9:41:59 PM
To: Beowulf Mailing List
Subject: [Beowulf] Application independent checkpoint/resume?

Hi folks,

Just wondering if folks here have recent experience with application-independent checkpoint/resume mechanisms like DMTCP or CRIU? I'm especially interested in MPI uses, and extra bonus points for experiences on Cray. :-)

From what I can see, CRIU doesn't seem to support MPI at all, and DMTCP only supports it over TCP/IP or (with a supplied plugin) InfiniBand. Are those inferences true? Any others I've missed?
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf