> On 22 Mar 2021, at 13:08, Otto Moerbeek <o...@drijf.net> wrote:
>
> On Mon, Mar 22, 2021 at 11:34:25AM +0100, Mischa wrote:
>
>>> On 21 Mar 2021, at 02:31, Theo de Raadt <dera...@openbsd.org> wrote:
>>> Otto Moerbeek <o...@drijf.net> wrote:
>>>> On Fri, Mar 19, 2021 at 04:15:31PM +0000, Stuart Henderson wrote:
>>>>
>>>>> On 2021/03/19 17:05, Jan Klemkow wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I had the same issue a few days ago on server hardware of mine. I just
>>>>>> ran 'cvs up'. So, it looks like a generic bug in FFS and not related to
>>>>>> vmm.
>>>>>
>>>>> This panic generally relates to filesystem corruption. If fsck doesn't
>>>>> help then recreating whichever filesystem is triggering it is usually needed.
>>>>
>>>> Yeah, once in a while we see reports of it. It seems to be some nasty
>>>> conspiracy between the generic filesystem code, ffs and fsck_ffs.
>>>> Maybe even the device (driver) itself is involved. A possible
>>>> underlying issue may be that some operations are re-ordered while they
>>>> should not be.
>>>
>>> Yes, it does hint at a reordering.
>>>
>>>> Now the strange thing is, fsck_ffs *should* be able to repair the
>>>> inconsistency, but it appears in some cases it is not, and some bits
>>>> on the disk remain to trigger it again.
>>>
>>> fsck_ffs can only repair one inconsistency. There are a number of lockstep
>>> operations, I suppose we can call them acid-in-lowercase, which allow fsck
>>> to determine at which point the crashed system gave up the ghost. fsck then
>>> removes the partial operations, leaving a viable filesystem. But if the disk
>>> layer lands later writes but not earlier writes, fsck cannot handle it.
>>
>> I managed to re-create the issue.
>>
>> Created a fresh install qcow2 image and derived 35 new VMs from it.
>> Then I started all the VMs in four cycles, 10 VMs per cycle, waiting 240
>> seconds after each cycle.
>> Similar to the staggered start based on the number of CPUs.
>>
>> This time it was “only” one VM that was affected by this: the fourth VM
>> that got started.
>>
>> ddb> show panic
>> ffs_valloc: dup alloc
>> ddb> trace
>> db_enter() at db_enter+0x10
>> panic(ffffffff81dc21b2) at panic+0x12a
>> ffs_inode_alloc(fffffd803c94ef00,81a4,fffffd803f7bbf00,ffff800014d728b8) at ffs_inode_alloc+0x442
>> ufs_makeinode(81a4,fffffd803c930908,ffff800014d72bb0,ffff800014d72c00) at ufs_makeinode+0x7f
>> ufs_create(ffff800014d72960) at ufs_create+0x3c
>> VOP_CREATE(fffffd803c930908,ffff800014d72bb0,ffff800014d72c00,ffff800014d729c0) at VOP_CREATE+0x4a
>> vn_open(ffff800014d72b80,602,1a4) at vn_open+0x182
>> doopenat(ffff8000ffffc778,ffffff9c,f8fc28f00f4,601,1b6,ffff800014d72d80) at doopenat+0x1d0
>> syscall(ffff800014d72df0) at syscall+0x315
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7f7ffffbe450, count: -10
>>
>> dmesg of the host below.
>
> For me this is not enough info to even try to reproduce, I know little
> of vmm or vmd and have no idea what "derive" means in this context.
>
> Would it be possible for you to show the exact steps (preferably a
> script) to reproduce the issue?
Hopefully the below helps.

If you have vmd running, create a VM (qcow2 format) with the normal
installation process. I created the base image with:

vmctl create -s 50G /var/vmm/vm01.qcow2

I have DHCP set up, so all the subsequent images will be able to pick up a
different IP address.

Once that is done, replicate the vm.conf config for all the other VMs.
The config I used for the VMs is something like:

vm "vm01" {
        disable
        owner runbsd
        memory 1G
        disk "/var/vmm/vm01.qcow2" format qcow2
        interface tap {
                switch "uplink_veb911"
                lladdr fe:e1:bb:d4:d4:01
        }
}

I replicate the disks by running something like:

for i in $(jot 39 2); do vmctl create -b /var/vmm/vm01.qcow2 /var/vmm/vm${i}.qcow2; done

This uses the vm01.qcow2 image as the base and creates a derived image from
it, which means only changes are written to the new image.

I start the VMs with the following script:

#!/bin/sh
SLEEP=240
CPU=$(($(sysctl -n hw.ncpuonline)-2))
COUNTER=0
for i in $(vmctl show | sort | awk '/ - / {print $9}' | xargs); do
        VMS[${COUNTER}]=${i}
        COUNTER=$((${COUNTER}+1))
done
CYCLES=$((${#VMS[*]}/${CPU}+1))

echo "Starting ${#VMS[*]} VMs on ${CPU} CPUs in ${CYCLES} cycle(s), waiting ${SLEEP} seconds after each cycle."

COUNTER=0
for i in ${VMS[*]}; do
        COUNTER=$((${COUNTER}+1))
        vmctl start ${i}
        if [ $COUNTER -eq $CPU ]; then
                sleep ${SLEEP}
                COUNTER=0
        fi
done

The sleep is there to make sure the VMs have “settled” and all processes are
properly started before the next batch of VMs is started.

> Though the specific hardware might play a role as well…

I can also provide you access to the host itself.

Mischa
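In case it helps, this is roughly how the vm.conf entries for the derived
VMs could be generated instead of copying them by hand. It is only a sketch
based on the vm01 example and the replication loop above; the VM names,
owner/switch and the lladdr numbering are assumptions, not necessarily the
exact configuration I used:

#!/bin/sh
# Sketch: print a vm.conf block for each derived VM created by the
# replication loop above (vm2..vm40). The owner/switch names are taken
# from the vm01 example; the last lladdr byte simply reuses the
# zero-padded VM number.
for i in $(jot 39 2); do
        n=$(printf "%02d" ${i})
        cat <<EOF
vm "vm${i}" {
        disable
        owner runbsd
        memory 1G
        disk "/var/vmm/vm${i}.qcow2" format qcow2
        interface tap {
                switch "uplink_veb911"
                lladdr fe:e1:bb:d4:d4:${n}
        }
}
EOF
done

The output can be appended to /etc/vm.conf and picked up with vmctl reload.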