Re: [Qemu-devel] Crashing in tcp_close

2016-11-14 Thread Stefan Hajnoczi
On Sun, Nov 13, 2016 at 11:55:16AM +0000, Brian Candler wrote: > On 12/11/2016 10:44, Samuel Thibault wrote: > > Oops, sorry, my patch was completely bogus, here is a proper one. > > Excellent. > > I've run the original build process 18 times (each run takes about 25 > minutes) without valgrind,

Re: [Qemu-devel] Crashing in tcp_close

2016-11-13 Thread Brian Candler
On 12/11/2016 10:44, Samuel Thibault wrote: Oops, sorry, my patch was completely bogus, here is a proper one. Excellent. I've run the original build process 18 times (each run takes about 25 minutes) without valgrind, and it hasn't crashed once. So this looks good. Thank you! Regards, Bri

Re: [Qemu-devel] Crashing in tcp_close

2016-11-12 Thread Samuel Thibault
Hello, Brian Candler, on Sat 12 Nov 2016 09:33:55 +0000, wrote: > On 11/11/2016 22:09, Samuel Thibault wrote: > >Ooh, I see. Now it's obvious, now that it's not coming from the tcb > >loop:) Could you try the attached patch? > > It looks like it now goes into an infinite loop when a connection

Re: [Qemu-devel] Crashing in tcp_close

2016-11-12 Thread Brian Candler
On 12/11/2016 09:33, Brian Candler wrote: So I sent a SIGABRT, here is the backtrace: And here is some state from the core dump: (gdb) print so $1 = (struct socket *) 0x564b181fc940 (gdb) print *so $2 = {so_next = 0x564b18258c60, so_prev = 0x564b181fcb00, canary1 = -559038737, s = 28, pollf

Re: [Qemu-devel] Crashing in tcp_close

2016-11-12 Thread Brian Candler
On 11/11/2016 22:09, Samuel Thibault wrote: Ooh, I see. Now it's obvious, now that it's not coming from the tcb loop:) Could you try the attached patch? It looks like it now goes into an infinite loop when a connection is closed. Packer output stopped here: ... 2016/11/12 09:29:04 ui:

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Samuel Thibault
Brian Candler, on Fri 11 Nov 2016 20:53:12 +0000, wrote: > On 11/11/2016 16:17, Samuel Thibault wrote: > >Could you increase the value given to valgrind's --num-callers= so we > >can make sure the context of this call? > > OK: re-run with --num-callers=250. It took a few iterations, but I captured

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Brian Candler
On 11/11/2016 16:17, Samuel Thibault wrote: Could you increase the value given to valgrind's --num-callers= so we can make sure the context of this call? OK: re-run with --num-callers=250. It took a few iterations, but I captured it again. (I have grepped out all the "invalid file descriptor"

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Samuel Thibault
Hello, Brian Candler, on Fri 11 Nov 2016 16:02:44 +0000, wrote: > Aha!! Looking carefully at valgrind output, I see some definite cases of > use-after-free in tcp_output. Does the info below help? Ok, that's interesting. I however still don't see how that could happen :) > ==18350== Invalid read

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Brian Candler
On 11/11/2016 15:02, Brian Candler wrote: But over more than 10 runs (some with MALLOC_xxx_ and some without) it did not crash once :-( Aha!! Looking carefully at valgrind output, I see some definite cases of use-after-free in tcp_output. Does the info below help? Regards, Brian. ==18350==

Re: [Qemu-devel] Crashing in tcp_close

2016-11-11 Thread Brian Candler
On 09/11/2016 11:27, Stefan Hajnoczi wrote: Heap corruption. Valgrind's memcheck tool could be fruitful here: http://valgrind.org/docs/manual/quick-start.html#quick-start.mcrun This is really frustrating. I have been running with the following script instead of invoking qemu directly: $ ca

Re: [Qemu-devel] Crashing in tcp_close

2016-11-09 Thread Stefan Hajnoczi
On Tue, Nov 08, 2016 at 09:22:25PM +0000, Brian Candler wrote: > On 07/11/2016 10:42, Stefan Hajnoczi wrote: > > On Mon, Nov 07, 2016 at 08:42:17AM +0000, Brian Candler wrote: > > > >On 06/11/2016 18:04, Samuel Thibault wrote: > > > > > >Brian, could you run it with > > > > > > > > > > > >export MA

Re: [Qemu-devel] Crashing in tcp_close

2016-11-08 Thread Brian Candler
On 07/11/2016 10:42, Stefan Hajnoczi wrote: On Mon, Nov 07, 2016 at 08:42:17AM +0000, Brian Candler wrote: >On 06/11/2016 18:04, Samuel Thibault wrote: > >Brian, could you run it with > > > >export MALLOC_CHECK_=2 > > > >and also this could be useful: > > > >export MALLOC_PERTURB_=1234 > > > >A

Re: [Qemu-devel] Crashing in tcp_close

2016-11-08 Thread Brian Candler
On 07/11/2016 20:52, Brian Candler wrote: So either this means that using tap networking instead of user networking is fixing all the problems; or it is some other option which is different. Really I now need to run qemu with exactly the same settings as before, except with tap instead of user

Re: [Qemu-devel] Crashing in tcp_close

2016-11-08 Thread Stefan Hajnoczi
On Mon, Nov 07, 2016 at 08:52:20PM +0000, Brian Candler wrote: > Question: is "-enable-kvm" the same as "-machine pc-x,accel=kvm", or do > both need to be specified? I notice that packer wasn't giving both options, > but libvirt is. No, -enable-kvm is not the same as -machine pc-x,accel=kv

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 07/11/2016 11:09, Brian Candler wrote: On 07/11/2016 10:42, Stefan Hajnoczi wrote: Let's try to isolate the cause of this crash: Are you able to switch -netdev user to -netdev tap so we can rule out the slirp user network stack as the source of memory corruption? Let me try to set that up. U

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Stefan Hajnoczi
On Mon, Nov 07, 2016 at 11:09:10AM +0000, Brian Candler wrote: > On 07/11/2016 10:42, Stefan Hajnoczi wrote: > > Let's try to isolate the cause of this crash: > > > > Are you able to switch -netdev user to -netdev tap so we can rule out > > the slirp user network stack as the source of memory corr

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 07/11/2016 10:42, Stefan Hajnoczi wrote: Let's try to isolate the cause of this crash: Are you able to switch -netdev user to -netdev tap so we can rule out the slirp user network stack as the source of memory corruption? Let me try to set that up. Using packer.io, I will have to start a VM b

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Stefan Hajnoczi
On Mon, Nov 07, 2016 at 08:42:17AM +0000, Brian Candler wrote: > On 06/11/2016 18:04, Samuel Thibault wrote: > > Brian, could you run it with > > > > export MALLOC_CHECK_=2 > > > > and also this could be useful: > > > > export MALLOC_PERTURB_=1234 > > > > Also, to rule out the double-free scena

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 07/11/2016 08:42, Brian Candler wrote: The following crashes occurred when running with a single vcpu. Normally I have been running with -smp 8,sockets=1,cores=4,threads=2 as it seems to crash less with those settings; however I'm trying it again like that in a loop to see if I can get a cra

Re: [Qemu-devel] Crashing in tcp_close

2016-11-07 Thread Brian Candler
On 06/11/2016 18:04, Samuel Thibault wrote: Brian, could you run it with export MALLOC_CHECK_=2 and also this could be useful: export MALLOC_PERTURB_=1234 Also, to rule out the double-free scenario, and try to catch a buffer overflow coming from the socket structure itself, I have attached a

Re: [Qemu-devel] Crashing in tcp_close

2016-11-06 Thread Samuel Thibault
Hello, Stefan Hajnoczi, on Fri 04 Nov 2016 11:14:19 +0000, wrote: > CCing slirp maintainers to get attention on this bug Thanks! > > Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. > > 0x76a1bb5b in _int_free (av=0x76d5fb20 , > > p=, have_lock=0) at malloc.c:4

Re: [Qemu-devel] Crashing in tcp_close

2016-11-04 Thread Stefan Hajnoczi
On Thu, Oct 20, 2016 at 10:53:50PM +0100, Brian Candler wrote: CCing slirp maintainers to get attention on this bug > I have some reproducible-ish segfaults in qemu 2.7.0 (built from source) > running under ubuntu 16.04, on a quad-core i7 Mac Mini Server. > > I can reproduce these problems on a

[Qemu-devel] Crashing in tcp_close

2016-10-20 Thread Brian Candler
I have some reproducible-ish segfaults in qemu 2.7.0 (built from source) running under ubuntu 16.04, on a quad-core i7 Mac Mini Server. I can reproduce these problems on a different Mac Mini, and I also replaced the RAM on mine, so I'm sure it's not hardware related. It's somewhat painful to