On 07/18/2015 11:01 AM, Johan Schuijt wrote:
> Yes, we already found these and they are included in our kernel, but even
> with these patches we still receive the panic.
>
> - Johan
>
>
>> On 18 Jul 2015, at 10:56, Eric Dumazet <eric.duma...@gmail.com> wrote:
>>
>> On Fri, 2015-07-17 at 21:18 +0000, Johan Schuijt wrote:
>>> Hey guys,
>>>
>>> We’re currently running into a reproducible panic in the eviction work
>>> queue code when we pin all our eth* IRQs to different CPU cores (in
>>> order to scale our networking performance for our virtual servers).
>>> This only occurs in kernels >= 3.17 and is a result of the following
>>> change:
>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24
>>>
>>> The race/panic we see seems to be the same as, or similar to:
>>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29
>>>
>>> We can confirm that this is directly exposed by the IRQ pinning since
>>> disabling this stops us from being able to reproduce this case :)
>>>
>>> How to reproduce: in our test setup we have 4 machines generating UDP
>>> packets which are sent to the vulnerable host. These all have an MTU of
>>> 100 (for test purposes) and send UDP packets of a size of 256 bytes.
>>> Within half an hour you will see the following panic:
>>>
>>> crash> bt
>>> PID: 56  TASK: ffff885f3d9fc210  CPU: 9  COMMAND: "kworker/9:0"
>>>  #0 [ffff885f3da03b60] machine_kexec at ffffffff8104a1f7
>>>  #1 [ffff885f3da03bb0] crash_kexec at ffffffff810db187
>>>  #2 [ffff885f3da03c80] oops_end at ffffffff81015140
>>>  #3 [ffff885f3da03ca0] general_protection at ffffffff814f6c88
>>>     [exception RIP: inet_evict_bucket+281]
>>>     RIP: ffffffff81480699  RSP: ffff885f3da03d58  RFLAGS: 00010292
>>>     RAX: ffff885f3da03d08  RBX: dead0000001000a8  RCX: ffff885f3da03d08
>>>     RDX: 0000000000000006  RSI: ffff885f3da03ce8  RDI: dead0000001000a8
>>>     RBP: 0000000000000002   R8: 0000000000000286   R9: ffff88302f401640
>>>     R10: 0000000080000000  R11: ffff88602ec0c138  R12: ffffffff81a8d8c0
>>>     R13: ffff885f3da03d70  R14: 0000000000000000  R15: ffff881d6efe1a00
>>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>  #4 [ffff885f3da03db0] inet_frag_worker at ffffffff8148075a
>>>  #5 [ffff885f3da03e10] process_one_work at ffffffff8107be19
>>>  #6 [ffff885f3da03e60] worker_thread at ffffffff8107c6e3
>>>  #7 [ffff885f3da03ed0] kthread at ffffffff8108103e
>>>  #8 [ffff885f3da03f50] ret_from_fork at ffffffff814f4d7c
>>>
>>> We would love to receive your input on this matter.
>>>
>>> Thx in advance,
>>>
>>> - Johan
>>
>> Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 &
>> d70127e8a942364de8dd140fe73893efda363293
>>
>> Also please send your mails in text format, not html, and CC netdev (I
>> did here)
>
Thank you for the report, I will try to reproduce this locally.
Could you please post the full crash log?
Also, could you test with a clean current kernel from Linus' tree or
Dave's -net? These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores do you
actively use for receive?
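
For anyone else trying to reproduce this, the traffic described in the
report could be approximated roughly as below. This is only a sketch in
Python, not the reporter's actual generator: the function names are
hypothetical, and it assumes IPv4 with a 20-byte header (no options) and
that the 256 bytes refer to the UDP payload. The sending interface's MTU
has to be lowered to 100 separately so that the kernel fragments each
datagram.

```python
import socket

IPV4_HEADER = 20  # bytes; assumption: no IP options
UDP_HEADER = 8    # bytes


def fragments_per_datagram(payload_len, mtu, ip_header=IPV4_HEADER):
    """Estimate how many IPv4 fragments one UDP datagram is split into."""
    ip_payload = payload_len + UDP_HEADER
    if ip_payload + ip_header <= mtu:
        return 1  # fits in a single packet, no fragmentation
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return -(-ip_payload // per_fragment)  # ceiling division


def send_datagrams(host, port, count, payload_len=256):
    """Send `count` UDP datagrams of `payload_len` bytes to (host, port).

    Fragmentation is left to the kernel and depends on the MTU of the
    outgoing interface; this function does not set the MTU itself.
    Returns the total number of payload bytes handed to the socket.
    """
    payload = b"\x00" * payload_len
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sent += sock.sendto(payload, (host, port))
    return sent
```

With an MTU of 100, fragments_per_datagram(256, 100) gives 4, so each
datagram should arrive at the receiver as four fragments that have to be
reassembled. Running send_datagrams() in a loop from several machines
against the test host (address and port are up to the tester) keeps the
fragment queues busy.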