> On 22 Mar 2021, at 15:23, Otto Moerbeek <o...@drijf.net> wrote: > On Mon, Mar 22, 2021 at 03:20:37PM +0100, Mischa wrote: >>> On 22 Mar 2021, at 15:18, Otto Moerbeek <o...@drijf.net> wrote: >>> On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote: >>> >>>>> On 22 Mar 2021, at 15:05, Dave Voutila <d...@sisu.io> wrote: >>>>> Otto Moerbeek writes: >>>>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote: >>>>>>> Otto Moerbeek writes: >>>>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote: >>>>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson <s...@spacehopper.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it. >>>>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and >>>>>>>>>>>> waiting 240 seconds after each cycle. >>>>>>>>>>>> Similar to the staggered start based on the amount of CPUs. >>>>>>>>>> >>>>>>>>>>> For me this is not enough info to even try to reproduce, I know >>>>>>>>>>> little >>>>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context. >>>>>>>>>> >>>>>>>>>> This is a big bit of information that was missing from the original >>>>>>>>> >>>>>>>>> Well.. could have been better described indeed. :)) >>>>>>>>> " I created 41 additional VMs based on a single qcow2 base image.” >>>>>>>>> >>>>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing >>>>>>>>>> file') which can be shared between VMs, with writes diverted to a >>>>>>>>>> separate image ('derived image'). >>>>>>>>>> >>>>>>>>>> So e.g. you can create a base image, do a simple OS install for a >>>>>>>>>> particular OS version to that base image, then you stop using that >>>>>>>>>> for a VM and just use it as a base to create derived images from. >>>>>>>>>> You then run VMs using the derived image and make whatever config >>>>>>>>>> changes. If you have a bunch of VMs using the same OS release then >>>>>>>>>> you save some disk space for the common files. >>>>>>>>>> >>>>>>>>>> Mischa did you leave a VM running which is working on the base >>>>>>>>>> image directly? That would certainly cause problems. >>>>>>>>> >>>>>>>>> I did indeed. Let me try that again without keeping the base image >>>>>>>>> running. >>>>>>>> >>>>>>>> Right. As a safeguard, I would change the base image to be r/o. >>>>>>> >>>>>>> vmd(8) should treating it r/o...the config process is responsible for >>>>>>> opening the disk files and passing the fd's to the vm process. In >>>>>>> config.c, the call to open(2) for the base images should be using the >>>>>>> flags O_RDONLY | O_NONBLOCK. >>>>>>> >>>>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new >>>>>>> disk image I based off the "alpine.qcow2" image: >>>>>>> >>>>>>> 20862 vmd CALL >>>>>>> open(0x7f7ffffd4370,0x26<O_RDWR|O_NONBLOCK|O_EXLOCK>) >>>>>>> 20862 vmd NAMI "/home/dave/vm/new.qcow2" >>>>>>> 20862 vmd RET open 10/0xa >>>>>>> 20862 vmd CALL fstat(10,0x7f7ffffd42b8) >>>>>>> 20862 vmd STRU struct stat { dev=1051, ino=19531847, >>>>>>> mode=-rw------- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, >>>>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, >>>>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, >>>>>>> ctime=1616420697<"Mar 22 09:44:57 2021">.189185158, size=262144, >>>>>>> blocks=256, blksize=32768, flags=0x0, gen=0xb64d5d98 } >>>>>>> 20862 vmd RET fstat 0 >>>>>>> 20862 vmd CALL kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c) >>>>>>> 20862 vmd RET kbind 0 >>>>>>> 20862 vmd CALL pread(10,0x7f7ffffd42a8,0x68,0) >>>>>>> 20862 vmd GIO fd 10 read 104 bytes >>>>>>> >>>>>>> "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\ >>>>>>> >>>>>>> \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\ >>>>>>> >>>>>>> \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\ >>>>>>> \0\0h" >>>>>>> 20862 vmd RET pread 104/0x68 >>>>>>> 20862 vmd CALL pread(10,0x7f7ffffd4770,0xc,0x68) >>>>>>> 20862 vmd GIO fd 10 read 12 bytes >>>>>>> "alpine.qcow2" >>>>>>> 20862 vmd RET pread 12/0xc >>>>>>> 20862 vmd CALL kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c) >>>>>>> 20862 vmd RET kbind 0 >>>>>>> 20862 vmd CALL kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c) >>>>>>> 20862 vmd RET kbind 0 >>>>>>> 20862 vmd CALL __realpath(0x7f7ffffd3ea0,0x7f7ffffd3680) >>>>>>> 20862 vmd NAMI "/home/dave/vm/alpine.qcow2" >>>>>>> 20862 vmd NAMI "/home/dave/vm/alpine.qcow2" >>>>>>> 20862 vmd RET __realpath 0 >>>>>>> 20862 vmd CALL open(0x7f7ffffd4370,0x4<O_RDONLY|O_NONBLOCK>) >>>>>>> 20862 vmd NAMI "/home/dave/vm/alpine.qcow2" >>>>>>> 20862 vmd RET open 11/0xb >>>>>>> 20862 vmd CALL fstat(11,0x7f7ffffd42b8) >>>>>>> >>>>>>> >>>>>>> I'm more familiar with the vmd(8) codebase than any ffs stuff, but I >>>>>>> don't think the issue is the base image being r/w. >>>>>>> >>>>>>> -Dave >>>>>> >>>>>> AFAIKS, the issue is that if you start a vm modifying the base because it >>>>>> uses it as a regular image, that r/o open for the other vms does not >>>>>> matter a lot, >>>>>> >>>>>> -OPtto >>>>> >>>>> Good point. I'm going to look into the feasibility of having the >>>>> control[1] process track what disks it's opened and in what mode to see >>>>> if there's a way to build in some protection against this from >>>>> happening. >>>>> >>>>> [1] I mistakenly called it the "config" process earlier. >>>> >>>> I guess that would help a lot of poor souls like myself to not make that >>>> mistake again. :) >>>> >>>> Mischa >>>> >>> >>> BTW, is was testign 40 1G VMs on a host with 24G, but some of the VMs >>> died on me when the machine started hitting swap. Is this known? >> >> Yes… been there done that got the t-shirt. :) >> >> Also there is a TLB flush patch in -current which Mike added, which means >> you shouldn’t oversubscribe memory at all. >> >> Mischa >> > > Ugh, I'll get back to using real metal hw....
:)) Mischa