Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-03-15 Thread Huang, Kai
Thanks!

Thanks,
-Kai

On 3/14/2017 5:57 AM, Paolo Bonzini wrote:
> On 13/03/2017 15:58, fangying wrote:
>> Hi, Huang Kai
>>
>> After weeks of intensive testing, we think the problem is solved and
>> this issue can be closed.
>
> Thanks for the update. We got to the same conclusion.
>
> Paolo

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-03-13 Thread Paolo Bonzini
On 13/03/2017 15:58, fangying wrote:
> Hi, Huang Kai
>
> After weeks of intensive testing, we think the problem is solved and
> this issue can be closed.

Thanks for the update. We got to the same conclusion.

Paolo

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-03-13 Thread fangying
Hi, Huang Kai After weeks of intensive testing, we think the problem is solved and this issue can be closed. On 2017/2/27 15:38, Huang, Kai wrote: On 2/25/2017 2:44 PM, Herongguang (Stephen) wrote: On 2017/2/24 23:14, Paolo Bonzini wrote: On 24/02/2017 16:10, Chris Friesen wrote: On

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-26 Thread Huang, Kai
On 2/25/2017 2:44 PM, Herongguang (Stephen) wrote: On 2017/2/24 23:14, Paolo Bonzini wrote: On 24/02/2017 16:10, Chris Friesen wrote: On 02/23/2017 08:23 PM, Herongguang (Stephen) wrote: On 2017/2/22 22:43, Paolo Bonzini wrote: Hopefully Gaohuai and Rongguang can help with this too.

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-24 Thread Herongguang (Stephen)
On 2017/2/24 23:14, Paolo Bonzini wrote: On 24/02/2017 16:10, Chris Friesen wrote: On 02/23/2017 08:23 PM, Herongguang (Stephen) wrote: On 2017/2/22 22:43, Paolo Bonzini wrote: Hopefully Gaohuai and Rongguang can help with this too. Paolo Yes, we are looking into and testing this. I

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-24 Thread Paolo Bonzini
On 24/02/2017 16:10, Chris Friesen wrote:
> On 02/23/2017 08:23 PM, Herongguang (Stephen) wrote:
>> On 2017/2/22 22:43, Paolo Bonzini wrote:
>>> Hopefully Gaohuai and Rongguang can help with this too.
>>>
>>> Paolo
>>
>> Yes, we are looking into and testing this.
>>
>> I think this can resu

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-24 Thread Chris Friesen
On 02/23/2017 08:23 PM, Herongguang (Stephen) wrote: On 2017/2/22 22:43, Paolo Bonzini wrote: Hopefully Gaohuai and Rongguang can help with this too. Paolo . Yes, we are looking into and testing this. I think this can result in any memory corruption, if VM1 writes its PML buffer into VM2

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-24 Thread Greg KH
On Fri, Feb 24, 2017 at 11:00:32AM +0100, Paolo Bonzini wrote:
> On 24/02/2017 10:59, Greg KH wrote:
> > On Fri, Feb 24, 2017 at 05:35:17PM +0800, Herongguang (Stephen) wrote:
> >> On 2017/2/24 10:23, Herongguang (Stephen) wrote:
> >>> On 2017/2/22 22:43, Paolo Bonzini w

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-24 Thread Paolo Bonzini
On 24/02/2017 10:59, Greg KH wrote:
> On Fri, Feb 24, 2017 at 05:35:17PM +0800, Herongguang (Stephen) wrote:
>> On 2017/2/24 10:23, Herongguang (Stephen) wrote:
>>> On 2017/2/22 22:43, Paolo Bonzini wrote:
>>>> On 22/02/2017 14:31, Chris Friesen wrote:

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-24 Thread Greg KH
On Fri, Feb 24, 2017 at 05:35:17PM +0800, Herongguang (Stephen) wrote:
> On 2017/2/24 10:23, Herongguang (Stephen) wrote:
> > On 2017/2/22 22:43, Paolo Bonzini wrote:
> > > On 22/02/2017 14:31, Chris Friesen wrote:
> > > > > Can you reproduce it

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-24 Thread Herongguang (Stephen)
On 2017/2/24 10:23, Herongguang (Stephen) wrote: On 2017/2/22 22:43, Paolo Bonzini wrote: On 22/02/2017 14:31, Chris Friesen wrote: Can you reproduce it with kernel 4.8+? I'm suspecting commit 4e59516a12a6 ("kvm: vmx: ensure VMCS is current while enabling PML", 2016-07-14) to be the

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-23 Thread Herongguang (Stephen)
On 2017/2/22 22:43, Paolo Bonzini wrote: On 22/02/2017 14:31, Chris Friesen wrote: Can you reproduce it with kernel 4.8+? I'm suspecting commit 4e59516a12a6 ("kvm: vmx: ensure VMCS is current while enabling PML", 2016-07-14) to be the fix. I can't easily try with a newer kernel, the s

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-22 Thread Chris Friesen
On 02/22/2017 05:15 AM, Paolo Bonzini wrote: On 22/02/2017 04:08, Chris Friesen wrote: On 02/19/2017 10:38 PM, Han, Huaitong wrote: Hi, Gaohuai I tried to debug the problem, and I found the indirect cause may be that the rmap value is not cleared when KVM mmu page is freed. I have read code

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-22 Thread Paolo Bonzini
On 22/02/2017 14:31, Chris Friesen wrote:
>> Can you reproduce it with kernel 4.8+? I'm suspecting commit
>> 4e59516a12a6 ("kvm: vmx: ensure VMCS is current while enabling PML",
>> 2016-07-14) to be the fix.
>
> I can't easily try with a newer kernel, the software package we're using

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-22 Thread Paolo Bonzini
On 22/02/2017 04:08, Chris Friesen wrote:
> On 02/19/2017 10:38 PM, Han, Huaitong wrote:
>> Hi, Gaohuai
>>
>> I tried to debug the problem, and I found the indirect cause may be that
>> the rmap value is not cleared when KVM mmu page is freed. I have read
>> code without the root cause. Can you s

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-21 Thread Chris Friesen
On 02/19/2017 10:38 PM, Han, Huaitong wrote: Hi, Gaohuai I tried to debug the problem, and I found the indirect cause may be that the rmap value is not cleared when KVM mmu page is freed. I have read code without the root cause. Can you stably reproduce the issue? Many guesses need to be ver

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-19 Thread Han, Huaitong
Hi, Gaohuai I tried to debug the problem, and I found the indirect cause may be that the rmap value is not cleared when KVM mmu page is freed. I have read code without the root cause. Can you stably reproduce the issue? Many guesses need to be verified. On Mon, 2017-02-20 at 10:17 +0800, han

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-19 Thread hangaohuai
Hi, Kai Huang and Xiao Guangrong. For the problem mentioned above, there may be a bug related to PML and probably on Broadwell CPUs. I've been reading the code for PML for days, but I haven't found any clues. Do you have any idea about this BUG? Hope you can help! On 2017/2/10 23:28, Chris F

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-13 Thread hangaohuai
Hi, Chris Friesen. We notice that you can reliably trigger the BUG during the live-migration stress test; however, we can't. Could you describe your test steps so that we can re-trigger the BUG and get more information about it?

On 2017/2/10 23:28, Chris Friesen wrote:
> Well, not so much sol

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-10 Thread Paolo Bonzini
> Well, not so much solved as worked around it.
>
> It seems that the problem only showed up on Broadwell, which made us wonder
> about something hardware specific.

Yes, the bug seems to be related to PML. I am looking at the code, but I haven't found anything yet.

Paolo

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-10 Thread Chris Friesen
Well, not so much solved as worked around it. It seems that the problem only showed up on Broadwell, which made us wonder about something hardware specific. Setting "kvm-intel.eptad=0" in the kernel boot args seems to mask the problem for us. Chris On 02/10/2017 03:11 AM, Herongguang (Ste
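
A minimal sketch, not taken from the thread, of how one might check whether the "kvm-intel.eptad=0" workaround is present on a host's kernel command line. The helper name `kernel_param` is hypothetical, and the sketch assumes a Linux host where /proc/cmdline is readable. It treats '-' and '_' in parameter names as interchangeable, as the kernel does, and ignores quoted values containing spaces.

```python
def kernel_param(cmdline, name):
    """Return the value of the last `name=value` token in `cmdline`, or None.

    Hypothetical helper for illustration only: it splits on whitespace,
    so quoted values that contain spaces are not handled.
    """
    # Normalise dashes to underscores so that "kvm-intel.eptad" and
    # "kvm_intel.eptad" compare equal.
    wanted = name.replace("-", "_")
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key.replace("-", "_") == wanted:
            value = val  # keep the last occurrence (a choice of this sketch)
    return value


if __name__ == "__main__":
    # Assumes a Linux host; /proc/cmdline holds the boot arguments.
    with open("/proc/cmdline") as f:
        setting = kernel_param(f.read(), "kvm-intel.eptad")
    if setting is None:
        print("kvm-intel.eptad is not set on the kernel command line")
    else:
        print("kvm-intel.eptad=" + setting)
```

A value of "0" here would only show that the boot argument was passed; it says nothing about whether the underlying bug is fixed in the running kernel.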

Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-10 Thread Herongguang (Stephen)
Hi, Chris Friesen, did you solve the problem? On 2017/2/9 22:37, Herongguang (Stephen) wrote: Hi. I hit a problem while repeatedly live-migrating a VM between two compute nodes: the KVM module crashed and then the host rebooted. However, I cannot reliably trigger

[Qemu-devel] kvm bug in __rmap_clear_dirty during live migration

2017-02-09 Thread Herongguang (Stephen)
Hi. I hit a problem while repeatedly live-migrating a VM between two compute nodes: the KVM module crashed and then the host rebooted. However, I cannot reliably trigger this BUG. The backtrace is the same as http://www.spinics.net/lists/kvm/msg138475.html. The cr