> On 22 Mar 2021, at 15:18, Otto Moerbeek <o...@drijf.net> wrote:
> 
> On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote:
> 
>>> On 22 Mar 2021, at 15:05, Dave Voutila <d...@sisu.io> wrote:
>>> Otto Moerbeek writes:
>>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
>>>>> Otto Moerbeek writes:
>>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson <s...@spacehopper.org> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle,
>>>>>>>>>> waiting 240 seconds after each cycle.
>>>>>>>>>> Similar to the staggered start based on the number of CPUs.
>>>>>>>> 
>>>>>>>>> For me this is not enough info to even try to reproduce, I know little
>>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>>>>>>> 
>>>>>>>> This is a big bit of information that was missing from the original
>>>>>>> 
>>>>>>> Well.. could have been better described indeed. :))
>>>>>>> "I created 41 additional VMs based on a single qcow2 base image."
>>>>>>> 
>>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>>>>>>> file') which can be shared between VMs, with writes diverted to a
>>>>>>>> separate image ('derived image').
>>>>>>>> 
>>>>>>>> So e.g. you can create a base image, do a simple OS install for a
>>>>>>>> particular OS version to that base image, then you stop using that
>>>>>>>> for a VM and just use it as a base to create derived images from.
>>>>>>>> You then run VMs using the derived image and make whatever config
>>>>>>>> changes. If you have a bunch of VMs using the same OS release then
>>>>>>>> you save some disk space for the common files.
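>>>>>>>> 
>>>>>>>> Roughly like this, if memory serves (check vmctl(8) for the exact
>>>>>>>> syntax; file names are just examples):
>>>>>>>> 
>>>>>>>>   # one-off: create the base image (then install the OS into it from a VM)
>>>>>>>>   vmctl create -s 10G base.qcow2
>>>>>>>> 
>>>>>>>>   # per VM: create a derived image backed by the (now untouched) base
>>>>>>>>   vmctl create -b base.qcow2 vm1.qcow2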
>>>>>>>> 
>>>>>>>> Mischa, did you leave a VM running which is working on the base
>>>>>>>> image directly? That would certainly cause problems.
>>>>>>> 
>>>>>>> I did indeed. Let me try that again without keeping the base image 
>>>>>>> running.
>>>>>> 
>>>>>> Right. As a safeguard, I would change the base image to be r/o.
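>>>>>> 
>>>>>> E.g. something like (path is just an example):
>>>>>> 
>>>>>>   chmod a-w /var/vmm/base.qcow2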
>>>>> 
>>>>> vmd(8) should be treating it r/o... the config process is responsible
>>>>> for opening the disk files and passing the fds to the vm process. In
>>>>> config.c, the call to open(2) for the base images should be using the
>>>>> flags O_RDONLY | O_NONBLOCK.
>>>>> 
>>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
>>>>> disk image I based off the "alpine.qcow2" image:
>>>>> 
>>>>> 20862 vmd      CALL  open(0x7f7ffffd4370,0x26<O_RDWR|O_NONBLOCK|O_EXLOCK>)
>>>>> 20862 vmd      NAMI  "/home/dave/vm/new.qcow2"
>>>>> 20862 vmd      RET   open 10/0xa
>>>>> 20862 vmd      CALL  fstat(10,0x7f7ffffd42b8)
>>>>> 20862 vmd      STRU  struct stat { dev=1051, ino=19531847, 
>>>>> mode=-rw------- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, 
>>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, 
>>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, ctime=1616420697<"Mar 
>>>>> 22 09:44:57 2021">.189185158, size=262144, blocks=256, blksize=32768, 
>>>>> flags=0x0, gen=0xb64d5d98 }
>>>>> 20862 vmd      RET   fstat 0
>>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
>>>>> 20862 vmd      RET   kbind 0
>>>>> 20862 vmd      CALL  pread(10,0x7f7ffffd42a8,0x68,0)
>>>>> 20862 vmd      GIO   fd 10 read 104 bytes
>>>>>       "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\
>>>>>       \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\
>>>>>       \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\
>>>>>       \0\0h"
>>>>> 20862 vmd      RET   pread 104/0x68
>>>>> 20862 vmd      CALL  pread(10,0x7f7ffffd4770,0xc,0x68)
>>>>> 20862 vmd      GIO   fd 10 read 12 bytes
>>>>>      "alpine.qcow2"
>>>>> 20862 vmd      RET   pread 12/0xc
>>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
>>>>> 20862 vmd      RET   kbind 0
>>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
>>>>> 20862 vmd      RET   kbind 0
>>>>> 20862 vmd      CALL  __realpath(0x7f7ffffd3ea0,0x7f7ffffd3680)
>>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
>>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
>>>>> 20862 vmd      RET   __realpath 0
>>>>> 20862 vmd      CALL  open(0x7f7ffffd4370,0x4<O_RDONLY|O_NONBLOCK>)
>>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
>>>>> 20862 vmd      RET   open 11/0xb
>>>>> 20862 vmd      CALL  fstat(11,0x7f7ffffd42b8)
>>>>> 
>>>>> 
>>>>> I'm more familiar with the vmd(8) codebase than any ffs stuff, but I
>>>>> don't think the issue is the base image being r/w.
>>>>> 
>>>>> -Dave
>>>> 
>>>> AFAICS, the issue is that if you start a VM that uses the base as its
>>>> regular image and modifies it, the r/o open done by the other VMs does
>>>> not matter much.
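>>>> 
>>>> I.e. the thing to avoid is a VM definition that still points at the base
>>>> directly, something like (names are made up):
>>>> 
>>>>    vm "basevm" {
>>>>            disk "/var/vmm/base.qcow2"
>>>>    }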
>>>> 
>>>>    -Otto
>>> 
>>> Good point. I'm going to look into the feasibility of having the
>>> control[1] process track which disks it has opened and in what mode, to
>>> see if there's a way to build in some protection against this happening.
>>> 
>>> [1] I mistakenly called it the "config" process earlier.
>> 
>> I guess that would help a lot of poor souls like myself to not make that 
>> mistake again. :)
>> 
>> Mischa
>> 
> 
> BTW, I was testing 40 1G VMs on a host with 24G, but some of the VMs
> died on me when the machine started hitting swap. Is this known?

Yes… been there, done that, got the t-shirt. :)
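With 40 VMs at 1G each that's up to 40G of guest memory on a 24G host, so
the machine is bound to end up in swap once the guests touch enough of their
memory.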

Also, there is a TLB flush patch from Mike in -current, which means you
shouldn't oversubscribe memory at all.

Mischa
