> On 22 Mar 2021, at 15:23, Otto Moerbeek <o...@drijf.net> wrote:
> On Mon, Mar 22, 2021 at 03:20:37PM +0100, Mischa wrote:
>>> On 22 Mar 2021, at 15:18, Otto Moerbeek <o...@drijf.net> wrote:
>>> On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote:
>>> 
>>>>> On 22 Mar 2021, at 15:05, Dave Voutila <d...@sisu.io> wrote:
>>>>> Otto Moerbeek writes:
>>>>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
>>>>>>> Otto Moerbeek writes:
>>>>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
>>>>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson <s...@spacehopper.org> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>>>>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle,
>>>>>>>>>>>> waiting 240 seconds after each cycle, similar to the staggered
>>>>>>>>>>>> start based on the number of CPUs.
>>>>>>>>>> 
>>>>>>>>>>> For me this is not enough info to even try to reproduce, I know 
>>>>>>>>>>> little
>>>>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context.
>>>>>>>>>> 
>>>>>>>>>> This is a key piece of information that was missing from the original
>>>>>>>>> 
>>>>>>>>> Well.. could have been better described indeed. :))
>>>>>>>>> "I created 41 additional VMs based on a single qcow2 base image."
>>>>>>>>> 
>>>>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
>>>>>>>>>> file') which can be shared between VMs, with writes diverted to a
>>>>>>>>>> separate image ('derived image').
>>>>>>>>>> 
>>>>>>>>>> So e.g. you can create a base image, do a simple OS install for a
>>>>>>>>>> particular OS version to that base image, then you stop using that
>>>>>>>>>> for a VM and just use it as a base to create derived images from.
>>>>>>>>>> You then run VMs using the derived image and make whatever config
>>>>>>>>>> changes. If you have a bunch of VMs using the same OS release then
>>>>>>>>>> you save some disk space for the common files.
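>>>>>>>>>>
>>>>>>>>>> E.g., with hypothetical names and size (vmctl(8)'s -b takes the
>>>>>>>>>> backing file):
>>>>>>>>>>
>>>>>>>>>>    $ vmctl create -s 10G base.qcow2         # install the OS into this
>>>>>>>>>>    $ vmctl create -b base.qcow2 vm1.qcow2   # derived image for vm1
>>>>>>>>>>    $ vmctl create -b base.qcow2 vm2.qcow2   # derived image for vm2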
>>>>>>>>>> 
>>>>>>>>>> Mischa did you leave a VM running which is working on the base
>>>>>>>>>> image directly? That would certainly cause problems.
>>>>>>>>> 
>>>>>>>>> I did indeed. Let me try that again without keeping the base image 
>>>>>>>>> running.
>>>>>>>> 
>>>>>>>> Right. As a safeguard, I would change the base image to be r/o.
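>>>>>>>>
>>>>>>>> E.g. something like (base.qcow2 standing in for the shared base):
>>>>>>>>
>>>>>>>>    $ chmod 0400 base.qcow2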
>>>>>>> 
>>>>>>> vmd(8) should be treating it r/o... the config process is responsible
>>>>>>> for opening the disk files and passing the fds to the vm process. In
>>>>>>> config.c, the call to open(2) for the base images should be using the
>>>>>>> flags O_RDONLY | O_NONBLOCK.
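>>>>>>>
>>>>>>> In rough shape (a sketch with placeholder names, not the literal
>>>>>>> config.c code):
>>>>>>>
>>>>>>>    /* the VM's own (derived) image: read-write, exclusively locked */
>>>>>>>    int disk_fd = open(disk_path, O_RDWR | O_NONBLOCK | O_EXLOCK);
>>>>>>>
>>>>>>>    /* backing file named in the qcow2 header: read-only */
>>>>>>>    int base_fd = open(base_path, O_RDONLY | O_NONBLOCK);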
>>>>>>> 
>>>>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
>>>>>>> disk image I based off the "alpine.qcow2" image:
>>>>>>> 
>>>>>>> 20862 vmd      CALL  open(0x7f7ffffd4370,0x26<O_RDWR|O_NONBLOCK|O_EXLOCK>)
>>>>>>> 20862 vmd      NAMI  "/home/dave/vm/new.qcow2"
>>>>>>> 20862 vmd      RET   open 10/0xa
>>>>>>> 20862 vmd      CALL  fstat(10,0x7f7ffffd42b8)
>>>>>>> 20862 vmd      STRU  struct stat { dev=1051, ino=19531847, mode=-rw------- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, ctime=1616420697<"Mar 22 09:44:57 2021">.189185158, size=262144, blocks=256, blksize=32768, flags=0x0, gen=0xb64d5d98 }
>>>>>>> 20862 vmd      RET   fstat 0
>>>>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
>>>>>>> 20862 vmd      RET   kbind 0
>>>>>>> 20862 vmd      CALL  pread(10,0x7f7ffffd42a8,0x68,0)
>>>>>>> 20862 vmd      GIO   fd 10 read 104 bytes
>>>>>>>        "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\
>>>>>>>         \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\
>>>>>>>         \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\
>>>>>>>         \0\0h"
>>>>>>> 20862 vmd      RET   pread 104/0x68
>>>>>>> 20862 vmd      CALL  pread(10,0x7f7ffffd4770,0xc,0x68)
>>>>>>> 20862 vmd      GIO   fd 10 read 12 bytes
>>>>>>>     "alpine.qcow2"
>>>>>>> 20862 vmd      RET   pread 12/0xc
>>>>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
>>>>>>> 20862 vmd      RET   kbind 0
>>>>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
>>>>>>> 20862 vmd      RET   kbind 0
>>>>>>> 20862 vmd      CALL  __realpath(0x7f7ffffd3ea0,0x7f7ffffd3680)
>>>>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
>>>>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
>>>>>>> 20862 vmd      RET   __realpath 0
>>>>>>> 20862 vmd      CALL  open(0x7f7ffffd4370,0x4<O_RDONLY|O_NONBLOCK>)
>>>>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
>>>>>>> 20862 vmd      RET   open 11/0xb
>>>>>>> 20862 vmd      CALL  fstat(11,0x7f7ffffd42b8)
>>>>>>> 
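>>>>>>> (For reference, a trace like this can be captured roughly as follows,
>>>>>>> with <vmd pid> and <vm> filled in by hand; see ktrace(1) and kdump(1):
>>>>>>>
>>>>>>>    # ktrace -di -p <vmd pid>
>>>>>>>    # vmctl start <vm>
>>>>>>>    # kdump | less
>>>>>>>
>>>>>>> -d traces current descendants, -i future children.)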
>>>>>>> 
>>>>>>> I'm more familiar with the vmd(8) codebase than any ffs stuff, but I
>>>>>>> don't think the issue is the base image being r/w.
>>>>>>> 
>>>>>>> -Dave
>>>>>> 
>>>>>> AFAICS, the issue is that if you start a VM that uses the base as its
>>>>>> regular image and so modifies it, the r/o open by the other VMs does
>>>>>> not matter much.
>>>>>> 
>>>>>>  -Otto
>>>>> 
>>>>> Good point. I'm going to look into the feasibility of having the
>>>>> control[1] process track which disks it has opened, and in what mode,
>>>>> to see if there's a way to build in some protection against this
>>>>> happening.
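>>>>>
>>>>> (One possible shape, just a sketch: since the r/w open above already
>>>>> uses O_EXLOCK, also taking a shared lock when opening base images
>>>>> would let the kernel arbitrate:
>>>>>
>>>>>    /* base image: shared lock; a conflicting O_RDWR|O_EXLOCK open of
>>>>>     * the same file by another VM then fails with EWOULDBLOCK, given
>>>>>     * O_NONBLOCK -- error handling omitted */
>>>>>    int base_fd = open(base_path, O_RDONLY | O_NONBLOCK | O_SHLOCK);
>>>>>
>>>>> with base_path a placeholder as before; explicit dev/ino bookkeeping
>>>>> in the control process would be the alternative.)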
>>>>> 
>>>>> [1] I mistakenly called it the "config" process earlier.
>>>> 
>>>> I guess that would help a lot of poor souls like myself avoid making
>>>> that mistake again. :)
>>>> 
>>>> Mischa
>>>> 
>>> 
>>> BTW, I was testing 40 1G VMs on a host with 24G, but some of the VMs
>>> died on me when the machine started hitting swap. Is this known?
>> 
>> Yes… been there, done that, got the t-shirt. :)
>> 
>> Also, there is a TLB flush patch from Mike in -current, which means you
>> shouldn't oversubscribe memory at all.
>> 
>> Mischa
>> 
> 
> Ugh, I'll get back to using real metal hw....

:))

Mischa
