On Thu, Mar 09, 2017 at 10:13:10AM +0800, Ye Xiaolong wrote:
On 03/02, Borislav Petkov wrote:
Hi,

On Thu, Mar 02, 2017 at 09:09:34AM +0800, kernel test robot wrote:

FYI, we noticed the following commit:

commit: ed3ce2a9172457ef7dbaa9f964e63dfde2bdcb5f ("x86: Optimize clear_page()")
url: https://github.com/0day-ci/linux/commits/Borislav-Petkov/x86-Optimize-clear_page/20170215-193441


in testcase: will-it-scale
with following parameters:

        test: poll2
        cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to 
n parallel copies to see if the testcase will scale. It builds both a process 
and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

thanks for the report, I was able to reproduce.

BUT(!) this report is misleading because it talks about will-it-scale
but your splat happens when you kexec the kernel:

 [  336.340747] LKP: kexec loading...
 [  336.340852]
 [  336.343323] kexec --noefi -l 
/tmp/cache/pkg/linux/x86_64-rhel-7.2/gcc-6/ed3ce2a9172457ef7dbaa9f964e63dfde2bdcb5f/vmlinuz-4.9.0-rc6-00134-ged3ce2a
 --initrd=/tmp/cache/initrd-concatenated
 [  336.343758]
 [  337.893471] --append=ip=::::lkp-ivb-d01::dhcp root=/dev/ram0 user=lkp 
job=/lkp/scheduled/lkp-ivb-d01/will-it-scale-poll2-performance-debian-x86_64-2016-08-31.cgz-ed3ce2a9172457ef7dbaa9f964e63dfde2bdcb5f-20170301-28072-1dqjyhl-11.yaml
 ARCH=x86_64 kconfig=x86_64-rhel-7.2 branch=linux-devel/devel-hourly-2017022612 
commit=ed3ce2a9172457ef7dbaa9f964e63dfde2bdcb5f 
BOOT_IMAGE=/pkg/linux/x86_64-rhel-7.2/gcc-6/ed3ce2a9172457ef7dbaa9f964e63dfde2bdcb5f/vmlinuz-4.9.0-rc6-00134-ged3ce2a
 max_uptime=1500 
RESULT_ROOT=/result/will-it-scale/poll2-performance/lkp-ivb-d01/debian-x86_64-2016-08-31.cgz/x86_64-rhel-7.2/gcc-6/ed3ce2a9172457ef7dbaa9f964e63dfde2bdcb5f/11
 LKP_SERVER=inn debug apic=debug sysrq_always_enabled 
rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 
softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 
prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_
 [  337.895521]
 [  339.467661] BUG: unable to handle kernel paging request at ffff8803cf2e2008
 [  339.468000] IP: [<ffffffff81061e71>] native_set_pmd+0x1/0x10
 ...


Maybe Fengguang has an idea what to do here, maybe something like adding
markers to the log to denote where the test environment is prepared and
where the actual test starts. Then grep for those and generate the report
based on that...

Thanks for the suggestions, we'll keep improving the reports so that they
are not confusing or misleading.

One possible improvement is to provide "lkp qemu" reproduce steps for
kernel oopses -- they would be far more convenient and safer to follow than
"lkp run", since the latter risks hanging the physical machine.

As for the test description, the dmesg carries markers for the user
space test start/stop points, so the robot can easily tell whether the
oops happened during the test or before/after it -- the latter may
well (but not always) indicate that the oops is not related to the testcase,
but to the regular kernel boot/reboot/kexec process.
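
A rough sketch of the marker-based check described above, in Python. The
marker strings TEST_START and TEST_END are hypothetical placeholders (the
actual strings the robot writes to dmesg are not shown in this thread), and
the oops patterns are only a small illustrative subset:

```python
import re

# Hypothetical marker strings; the real LKP markers may differ.
TEST_START = "LKP: test start"
TEST_END = "LKP: test end"

# A few common oops signatures; illustrative, not exhaustive.
OOPS_RE = re.compile(
    r"BUG: unable to handle kernel paging request"
    r"|general protection fault"
    r"|Oops:"
)


def classify_oops(dmesg_lines):
    """Return the phase in which the first oops appears.

    The result is 'before-test', 'during-test' or 'after-test',
    or None when no oops signature is found in the log.
    """
    phase = "before-test"
    for line in dmesg_lines:
        if TEST_START in line:
            phase = "during-test"
        elif TEST_END in line:
            phase = "after-test"
        elif OOPS_RE.search(line):
            return phase
    return None
```

With a log shaped like the one in this report (test markers first, then
"LKP: kexec loading...", then the paging request BUG), the function would
return 'after-test', which the report generator could use to avoid
attributing the oops to will-it-scale.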

Thanks,
Fengguang
