------- Comment From leona...@ibm.com 2019-01-04 14:29 EDT-------
Test: Verify all memory after migration

################### Host: ###################

# uname -a
Linux host 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

# cat /proc/cpuinfo
[...]
processor : 159
cpu       : POWER9, altivec supported
clock     : 2300.000000MHz
revision  : 2.2 (pvr 004e 1202)

timebase  : 512000000
platform  : PowerNV
model     : 8375-42A
machine   : PowerNV 8375-42A
firmware  : OPAL
MMU       : Radix

As before, I built QEMU 3.1.0 and made sure the patch that enables THP was included:

# ../configure --target-list=ppc-linux-user,ppc64-linux-user,ppc64le-linux-user,ppc-softmmu,ppc64-softmmu --enable-debug-info --enable-trace-backends=log --python=/usr/bin/python3 && make -j $(nproc)
# ./ppc-softmmu/qemu-system-ppc -version
QEMU emulator version 3.1.0 (v3.1.0-dirty)

################### Guest: ###################

### CLI 1: Migrating from:
MALLOC_PERTURB_=1 /home/leonardo/qemu/build/ppc64-softmmu/qemu-system-ppc64 \
    -nographic \
    -serial mon:stdio \
    -name 'avocado-vt-vm1' \
    -machine pseries \
    -nodefaults \
    -vga std \
    -device pci-bridge,id=pci_bridge,bus=pci.0,addr=0x3,chassis_nr=1 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=0x4 \
    -object rng-random,filename=/dev/random,id=passthrough-RHq4nIpF \
    -device virtio-rng-pci,id=virtio-rng-pci-aXCni2OX,rng=passthrough-RHq4nIpF,bus=pci.0,addr=0x5 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x6 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x7 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/leonardo/images/ubuntu-18.04-ppc64le.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -m 8192 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -watchdog i6300esb \
    -watchdog-action reset \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 \
    -initrd /boot/initrd.img-4.15.0-20-generic \
    -kernel /boot/vmlinux-4.15.0-20-generic \
    -append "root=UUID=b4ef9412-06d6-4947-9969-f15c7cc2c986 ro quiet splash"

### CLI 2: Migrating to:
Copy of CLI 1, changing:
-    -name 'avocado-vt-vm1' \
+    -name 'avocado-vt-vm2' \
+    -S
-    -vnc :0 \
+    -vnc :1 \
+    -incoming tcp:0:5801 \

### Inside Guest:
# uname -a
Linux localhost 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

# cat /proc/cpuinfo
processor : 3
cpu       : POWER9 (architected), altivec supported
clock     : 2900.000000MHz
revision  : 2.2 (pvr 004e 1202)

timebase  : 512000000
platform  : pSeries
model     : IBM pSeries (emulated by qemu)
machine   : CHRP IBM pSeries (emulated by qemu)
MMU       : Radix

################### Test Software: ###################

I created a simple C file to:
- allocate 2MB blocks,
- write urandom data to them,
- md5sum all the blocks together,
- stop, allowing migration,
- re-md5sum everything,
- free the blocks.

The attached source file is copied to the guest, then compiled:
# gcc -o memtest memtest.c -lcrypto

################### Procedure ###################

Use the CLI commands above to bring up the guests "Migrate from" and "Migrate to".

On "Migrate from":

root@localhost:~# ./memtest
Block 0
Block 128
[...]
Block 3968
Allocated 4075 blocks of 2097152 size.
Md5 = 209a63b9c1f9acd13fa32236229daa9b <Will change each run>
Press enter key to check memory integrity
<ctrl + z>
[1]+ Stopped ./memtest
root@localhost:~# free -h
              total        used        free      shared  buff/cache   available
Mem:           8.0G        7.7G        246M         64K         21M         37M
Swap:          758M        758M          0B

- Enter QEMU Monitor: <ctrl + a, c>

QEMU 3.1.0 monitor - type 'help' for more information
(qemu) migrate -d tcp:0:5801
<Wait till completed>
(qemu) info status
VM status: paused (postmigrate)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off
Migration status: completed
total time: 248950 milliseconds
downtime: 112 milliseconds
setup: 18 milliseconds
transferred ram: 9847781 kbytes
throughput: 269.52 mbps
remaining ram: 0 kbytes
total ram: 8405056 kbytes
duplicate: 143398 pages
skipped: 0 pages
normal: 2456826 pages
normal bytes: 9827304 kbytes
dirty sync count: 7
page size: 4 kbytes
multifd bytes: 0 kbytes

On "Migrate to":

- Enter QEMU Monitor: <ctrl + a, c>

(qemu) info status
VM status: paused
(qemu) cont
(qemu)

- Exit QEMU Monitor: <ctrl + a, c>

root@localhost:~# fg
./teste
<press enter>
Block 0
Block 128
[...]
Block 3968
Freed 4075 blocks of 2097152 size.
Md5 = 209a63b9c1f9acd13fa32236229daa9b
MD5 match!

################### Results ###################

- The test allocates (almost) all guest memory, migrates the guest, then verifies all of that memory.
- All memory seems to be intact after migration.
- I ran this test at least 5 times; the MD5 matched every time.

################### NEEDINFO ###################

I still could not reproduce the bug.
Is there any suggestion on how to reproduce it? Am I missing something?

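For readers without access to the attachment, the steps listed under "Test Software" can be sketched roughly as below. This is a Python stand-in, not the attached memtest.c: the names allocate_blocks, digest_blocks and main, and the default of 16 blocks, are assumptions made for illustration. Python gives no guarantee that the buffers are 2MB-aligned or THP-backed, so the sketch only shows the checksum-before / checksum-after logic, not the exact memory layout the C test exercises.

```python
import hashlib
import os

BLOCK_SIZE = 2 * 1024 * 1024  # 2 MiB, the block size used in the test above


def allocate_blocks(count, block_size=BLOCK_SIZE):
    """Allocate `count` buffers and fill each one with random bytes."""
    return [bytearray(os.urandom(block_size)) for _ in range(count)]


def digest_blocks(blocks):
    """Return a single MD5 hex digest computed over all blocks, in order."""
    md5 = hashlib.md5()
    for block in blocks:
        md5.update(block)
    return md5.hexdigest()


def main(count=16):  # hypothetical default; the real test filled ~8 GB
    blocks = allocate_blocks(count)
    before = digest_blocks(blocks)
    print(f"Allocated {len(blocks)} blocks of {BLOCK_SIZE} size.")
    print(f"Md5 = {before}")
    try:
        # Pause here; the guest is migrated while the process waits.
        input("Press enter key to check memory integrity")
    except EOFError:
        pass  # non-interactive run: continue straight to the re-check
    after = digest_blocks(blocks)
    print(f"Md5 = {after}")
    print("MD5 match!" if after == before else "MD5 MISMATCH!")
    blocks.clear()  # drop the buffers


if __name__ == "__main__":
    main()
```

Raising count until the guest is close to out of memory would approximate the "allocate (almost) all memory" condition of the original run.
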
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1788098

Title:
  Avoid migration issues with aligned 2MB THB

Status in The Ubuntu-power-systems project:
  Incomplete
Status in linux package in Ubuntu:
  In Progress
Status in qemu package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  In Progress

Bug description:
  FYI: This blocks bug 1781526 - once this one here is resolved we can
  go on with SRU considerations for 1781526

  ------- Comment From jhop...@us.ibm.com 2018-08-20 17:12 EDT-------
  Hi, in some environments it was observed that this qemu patch to
  enable THP made it more likely to hit guest migration issues, however
  the following kernel patch resolves those migration issues:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git/commit/?h=kvm-ppc-next&id=c066fafc595eef5ae3c83ae3a8305956b8c3ef15
  KVM: PPC: Book3S HV: Use correct pagesize in kvm_unmap_radix()

  Once merged upstream, it would be good to include that change as well
  to avoid potential migration problems. Should I open a new bug for
  that or is it better to track here?

  Note Paelzer: I have not seen related migration issues myself, but it
  seems reasonable and confirmed by IBM.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1788098/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp