Re: [Beowulf] Dirty COW fix for RHEL 6.x

2016-10-27 Thread Christopher Samuel
On 28/10/16 15:30, Greg Lindahl wrote:
> This is a good time to remind everyone that CentOS 6.7 isn't the same
> as RHEL 6.7. When you pay Red Hat, they'll provide security patches
> when you stay with non-tip point releases.

To be fair you have to pay Red Hat *extra* if you want patches for the

Re: [Beowulf] Dirty COW fix for RHEL 6.x

2016-10-27 Thread Greg Lindahl
On Thu, Oct 27, 2016 at 04:06:26PM -0700, Kilian Cavalotti wrote:
> 2. the only supported release is the latest one, meaning that if you
> have to stay on say CentOS 6.7 for whatever reason, you don't get a
> kernel with the Dirty COW fix. And that is obviously a problem.

This is a good time to r

[Beowulf] Dirty COW fix for RHEL 6.x

2016-10-27 Thread Kilian Cavalotti
Dear Beowulfers,

There's no doubt everybody is well aware of CVE-2016-5195 by now, and that all your systems have been patched several days ago. I mean, all the systems that run a supported LTS distribution. Because, in case you're like me, you may have to maintain some clusters at specific point

Re: [Beowulf] non-stop computing

2016-10-27 Thread Christopher Samuel
On 28/10/16 00:57, Michael Di Domenico wrote:
> i was intrigued by Joe's suggestion of snapshot'ing kvm instances. i
> might look into that as an academic exercise. i knew you could
> pause/snapshot/resume an instance, but i've never tried to resume a
> saved off snapshot, only restart one. if

Re: [Beowulf] non-stop computing

2016-10-27 Thread Luc Vereecken
Hi Michael,

Keep us informed if you pull that off... I'm interested in that functionality as well, for similar reasons. For what it's worth, on the torque mailing list I remember that somebody had a script for instantiating and destroying a VM on job start/end. Can't remember who or what, bu

Re: [Beowulf] non-stop computing

2016-10-27 Thread Guy Coates
BLCR or DMTCP should both be able to checkpoint a single-node job (single- or multi-threaded) straight out of the box; you won't need to recompile any of your binaries. DMTCP does not require any kernel modules, and so you might find that easier going if you are on a more recent kernel than BLCR su

Re: [Beowulf] non-stop computing

2016-10-27 Thread Michael Di Domenico
On Thu, Oct 27, 2016 at 10:54 AM, Justin Y. Shi wrote:
> Snapshot restart would only work for you if your application leaves
> restarting points on the disk. Otherwise restarting the snapshot is the same
> as restarting the program.

the program does not checkpoint to disk. my cursory look throug

Re: [Beowulf] non-stop computing

2016-10-27 Thread Justin Y. Shi
Snapshot restart would only work for you if your application leaves restarting points on the disk. Otherwise restarting the snapshot is the same as restarting the program.

Justin

On Thu, Oct 27, 2016 at 9:57 AM, Michael Di Domenico wrote:
> thanks for the insights. comedic levity included... :

Re: [Beowulf] non-stop computing

2016-10-27 Thread Michael Di Domenico
thanks for the insights. comedic levity included... :) running the job twice is likely going to be our solution. it's painful when you have multiple people running multiple jobs, in that it wastes resources, but such is life. i was intrigued by Joe's suggestion of snapshot'ing kvm instances. i