On Mon, Mar 22, 2021 at 03:20:37PM +0100, Mischa wrote:

> 
> 
> > On 22 Mar 2021, at 15:18, Otto Moerbeek <o...@drijf.net> wrote:
> > 
> > On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote:
> > 
> >>> On 22 Mar 2021, at 15:05, Dave Voutila <d...@sisu.io> wrote:
> >>> Otto Moerbeek writes:
> >>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
> >>>>> Otto Moerbeek writes:
> >>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
> >>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson <s...@spacehopper.org> 
> >>>>>>>> wrote:
> >>>>>>>> 
> >>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
> >>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle,
> >>>>>>>>>> waiting 240 seconds after each cycle.
> >>>>>>>>>> Similar to the staggered start based on the number of CPUs.
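For reference, that staggered start can be scripted along these lines --
a sketch, assuming the VMs are defined in vm.conf(5) under the made-up
names vm1..vm35:

    #!/bin/sh
    # start the derived VMs in batches of 10, pausing between batches
    for i in $(jot 35); do
        vmctl start "vm$i"
        [ $((i % 10)) -eq 0 ] && sleep 240
    done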
> >>>>>>>> 
> >>>>>>>>> For me this is not enough info to even try to reproduce; I know
> >>>>>>>>> little of vmm or vmd and have no idea what "derive" means in this
> >>>>>>>>> context.
> >>>>>>>> 
> >>>>>>>> This is a key bit of information that was missing from the original
> >>>>>>> 
> >>>>>>> Well.. could have been better described indeed. :))
> >>>>>>> " I created 41 additional VMs based on a single qcow2 base image.”
> >>>>>>> 
> >>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
> >>>>>>>> file') which can be shared between VMs, with writes diverted to a
> >>>>>>>> separate image ('derived image').
> >>>>>>>> 
> >>>>>>>> So e.g. you can create a base image, do a simple OS install for a
> >>>>>>>> particular OS version to that base image, then you stop using that
> >>>>>>>> for a VM and just use it as a base to create derived images from.
> >>>>>>>> You then run VMs using the derived image and make whatever config
> >>>>>>>> changes. If you have a bunch of VMs using the same OS release then
> >>>>>>>> you save some disk space for the common files.
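With vmctl(8) that workflow looks roughly like this -- a sketch, with
made-up sizes and file names:

    # create the base image and do the OS install into it once
    vmctl create -s 20G base.qcow2
    # (run the install VM against base.qcow2, then stop using it directly)
    # derive a per-VM image; writes land in derived.qcow2 only
    vmctl create -b base.qcow2 derived.qcow2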
> >>>>>>>> 
> >>>>>>>> Mischa did you leave a VM running which is working on the base
> >>>>>>>> image directly? That would certainly cause problems.
> >>>>>>> 
> >>>>>>> I did indeed. Let me try that again without keeping a VM running on
> >>>>>>> the base image.
> >>>>>> 
> >>>>>> Right. As a safeguard, I would change the base image to be r/o.
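On the host that could be e.g. (assuming the base image is a plain file at
./base.qcow2; note vmd opens disks as root, so plain permission bits are
only a reminder-level safeguard):

    chmod a-w base.qcow2       # plain permissions; root can still bypass
    chflags uchg base.qcow2    # stronger: writes are denied while the
                               # immutable flag is set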
> >>>>> 
> >>>>> vmd(8) should be treating it r/o... the config process is responsible for
> >>>>> opening the disk files and passing the fd's to the vm process. In
> >>>>> config.c, the call to open(2) for the base images should be using the
> >>>>> flags O_RDONLY | O_NONBLOCK.
> >>>>> 
> >>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
> >>>>> disk image I based off the "alpine.qcow2" image:
> >>>>> 
> >>>>> 20862 vmd      CALL  open(0x7f7ffffd4370,0x26<O_RDWR|O_NONBLOCK|O_EXLOCK>)
> >>>>> 20862 vmd      NAMI  "/home/dave/vm/new.qcow2"
> >>>>> 20862 vmd      RET   open 10/0xa
> >>>>> 20862 vmd      CALL  fstat(10,0x7f7ffffd42b8)
> >>>>> 20862 vmd      STRU  struct stat { dev=1051, ino=19531847, 
> >>>>> mode=-rw------- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, 
> >>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, 
> >>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, 
> >>>>> ctime=1616420697<"Mar 22 09:44:57 2021">.189185158, size=262144, 
> >>>>> blocks=256, blksize=32768, flags=0x0, gen=0xb64d5d98 }
> >>>>> 20862 vmd      RET   fstat 0
> >>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
> >>>>> 20862 vmd      RET   kbind 0
> >>>>> 20862 vmd      CALL  pread(10,0x7f7ffffd42a8,0x68,0)
> >>>>> 20862 vmd      GIO   fd 10 read 104 bytes
> >>>>>      
> >>>>> "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\
> >>>>>       
> >>>>> \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\
> >>>>>       
> >>>>> \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\
> >>>>>       \0\0h"
> >>>>> 20862 vmd      RET   pread 104/0x68
> >>>>> 20862 vmd      CALL  pread(10,0x7f7ffffd4770,0xc,0x68)
> >>>>> 20862 vmd      GIO   fd 10 read 12 bytes
> >>>>>      "alpine.qcow2"
> >>>>> 20862 vmd      RET   pread 12/0xc
> >>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
> >>>>> 20862 vmd      RET   kbind 0
> >>>>> 20862 vmd      CALL  kbind(0x7f7ffffd39d8,24,0x2a9349e63ae9950c)
> >>>>> 20862 vmd      RET   kbind 0
> >>>>> 20862 vmd      CALL  __realpath(0x7f7ffffd3ea0,0x7f7ffffd3680)
> >>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
> >>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
> >>>>> 20862 vmd      RET   __realpath 0
> >>>>> 20862 vmd      CALL  open(0x7f7ffffd4370,0x4<O_RDONLY|O_NONBLOCK>)
> >>>>> 20862 vmd      NAMI  "/home/dave/vm/alpine.qcow2"
> >>>>> 20862 vmd      RET   open 11/0xb
> >>>>> 20862 vmd      CALL  fstat(11,0x7f7ffffd42b8)
> >>>>> 
> >>>>> 
> >>>>> I'm more familiar with the vmd(8) codebase than any ffs stuff, but I
> >>>>> don't think the issue is the base image being r/w.
> >>>>> 
> >>>>> -Dave
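To capture the same kind of trace, attach ktrace(1) to the running vmd
parent before starting the VM and read it back with kdump(1). A sketch,
assuming a single running vmd and a VM named "new":

    ktrace -i -p $(pgrep -xo vmd)   # trace vmd and future children (root)
    vmctl start new
    ktrace -C                       # switch all tracing back off
    kdump | less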
> >>>> 
> >>>> AFAICS, the issue is that if you start a VM that modifies the base
> >>>> because it uses it as a regular image, the r/o open by the other VMs
> >>>> does not matter much.
> >>>> 
> >>>>  -Otto
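Concretely, the dangerous combination is something like this -- a sketch
with invented VM and file names:

    # one VM boots from the base directly and writes to it
    vmctl start -d base.qcow2 baseVM
    # while derived VMs keep reading that same, now-changing, base
    vmctl start -d derived.qcow2 vm1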
> >>> 
> >>> Good point. I'm going to look into the feasibility of having the
> >>> control[1] process track which disks it has opened and in what mode, to
> >>> see if there's a way to build in some protection against this happening.
> >>> 
> >>> [1] I mistakenly called it the "config" process earlier.
> >> 
> >> I guess that would help a lot of poor souls like myself not to make that
> >> mistake again. :)
> >> 
> >> Mischa
> >> 
> > 
> > BTW, I was testing 40 1G VMs on a host with 24G, but some of the VMs
> > died on me when the machine started hitting swap. Is this known? 
> 
> Yes… been there, done that, got the t-shirt. :)
> 
> Also there is a TLB flush patch in -current, added by Mike, which means you
> shouldn't oversubscribe memory at all.
> 
> Mischa
> 
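Indeed: 40 VMs at 1G each commit about 40G of guest memory against 24G of
physical RAM, so that host was oversubscribed by a wide margin.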

Ugh, I'll get back to using real metal hw....

        -otto
