[CI-NOTIFY]: TCWG Bisect tcwg_kernel/gnu-master-arm-next-allmodconfig - Build # 36 - Successful!

2021-08-12 Thread ci_notify
Successfully identified regression in *linux* in CI configuration 
tcwg_kernel/gnu-master-arm-next-allmodconfig.  So far, this commit has 
regressed CI configurations:
 - tcwg_kernel/gnu-master-arm-next-allmodconfig

Culprit:

commit 4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15
Author: Eric Dumazet 
Date:   Tue Aug 10 02:45:47 2021 -0700

net: igmp: fix data-race in igmp_ifc_timer_expire()

Fix the data-race reported by syzbot [1]
Issue here is that igmp_ifc_timer_expire() can update in_dev->mr_ifc_count
while another change just occurred from another context.

in_dev->mr_ifc_count is only 8bit wide, so the race had little
consequences.

[1]
BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire

write to 0x8881051e3062 of 1 bytes by task 12547 on cpu 0:
 igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821
 igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356
 ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461
 __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199
 ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218
 do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline]
 ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423
 tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657
 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362
 __sys_setsockopt+0x18f/0x200 net/socket.c:2159
 __do_sys_setsockopt net/socket.c:2170 [inline]
 __se_sys_setsockopt net/socket.c:2167 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0x8881051e3062 of 1 bytes by interrupt on cpu 1:
 igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808
 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419
 expire_timers+0x135/0x250 kernel/time/timer.c:1464
 __run_timers+0x358/0x420 kernel/time/timer.c:1732
 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745
 __do_softirq+0x12c/0x26e kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636
 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
 console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646
 vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174
 vprintk_default+0x22/0x30 kernel/printk/printk.c:2185
 vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392
 printk+0x62/0x87 kernel/printk/printk.c:2216
 selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041
 security_netlink_send+0x42/0x90 security/security.c:2070
 netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:703 [inline]
 sock_sendmsg net/socket.c:723 [inline]
 sys_sendmsg+0x360/0x4d0 net/socket.c:2392
 ___sys_sendmsg net/socket.c:2446 [inline]
 __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
 __do_sys_sendmsg net/socket.c:2484 [inline]
 __se_sys_sendmsg net/socket.c:2482 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x01 -> 0x02

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Signed-off-by: David S. Miller 
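
The link error reported below suggests that the fix relies on a compare-and-exchange of the 8-bit counter. A fix of this kind typically replaces a plain read-modify-write with a retry loop that only commits the new value if nobody else changed the counter in the meantime. The following is a minimal userspace sketch in C11, not the kernel code: dec_if_nonzero and its parameter are hypothetical names, and <stdatomic.h> stands in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decrement an 8-bit counter only if it is non-zero, retrying when
 * another context changed it between the load and the exchange.
 * Returns true if a decrement happened, false if the counter was zero.
 * (Hypothetical helper; the counter models in_dev->mr_ifc_count.)
 */
static bool dec_if_nonzero(_Atomic unsigned char *count)
{
    unsigned char old;

    assert(count != NULL);
    old = atomic_load(count);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(count, &old,
                                         (unsigned char)(old - 1)))
            return true;
    }
    return false;
}
```

Called repeatedly on a counter initialised to 2, the helper succeeds twice and then returns false, which is the point at which a timer handler would stop rearming itself.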


Results regressed to (for first_bad == 4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21598
# First few build errors in logs:
# 00:32:42 igmp.c:(.text+0xa734): undefined reference to `__bad_cmpxchg'
# 00:32:42 make: *** [Makefile:1176: vmlinux] Error 1
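
The undefined reference to `__bad_cmpxchg` points at a compare-and-exchange on an operand size that this target configuration does not support; the commit message notes that in_dev->mr_ifc_count is only 8 bits wide. A common way to catch this is to dispatch on the operand size and route unsupported sizes to a symbol that is deliberately left undefined, so the mistake surfaces at link time. The sketch below models that dispatch in userspace and reports an unsupported size through a return code instead of a link error. The name cmpxchg_model, its parameters, and the choice that only 4-byte operands are supported are assumptions for illustration; __atomic_compare_exchange_n is the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: only 4-byte operands are treated as supported.
 * Returns 0 and stores the previous value in *prev for a supported size,
 * or -1 without touching memory when the size is unsupported.
 */
static int cmpxchg_model(volatile void *ptr, unsigned long old_val,
                         unsigned long new_val, size_t size,
                         unsigned long *prev)
{
    assert(ptr != NULL && prev != NULL);

    switch (size) {
    case 4: {
        uint32_t expected = (uint32_t)old_val;

        /* Strong CAS; on failure 'expected' receives the current value. */
        __atomic_compare_exchange_n((volatile uint32_t *)ptr, &expected,
                                    (uint32_t)new_val, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        *prev = expected;
        return 0;
    }
    default:
        /* A kernel-style build would reference an undefined symbol here. */
        return -1;
    }
}
```

With this model, a call on a uint32_t succeeds and returns the previous value, while a call with sizeof(uint8_t) is rejected, mirroring the 1-byte case that appears to break the ARM builds.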

from (for last_good == 37c86c4a0bfc2faaf0ed959db9de814c85797f09)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
29650
# linux build successful:
all

Artifacts of last_good build: 
https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodconfig/36/artifact/artifacts/build-37c86c4a0bfc2faaf0ed959db9de814c85797f09/
Artifacts of first_bad build: 
https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodconfig/36/artifact/artifacts/build-4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15/
Build top page/logs: 
https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodconfig/36/

Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#9e723c5380c6e14fb91a8b6950563d040674afdb";

Reprodu

[CI-NOTIFY]: TCWG Bisect tcwg_bmk_tx1/llvm-master-aarch64-spec2k6-O3 - Build # 18 - Successful!

2021-08-12 Thread ci_notify
Successfully identified regression in *llvm* in CI configuration 
tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3.  So far, this commit has 
regressed CI configurations:
 - tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

Culprit:

commit b4c0307d598004cfd96c770d2a4a84a37c838ba9
Author: Jon Roelofs 
Date:   Thu Aug 5 09:35:02 2021 -0700

Fix clang-interpreter build after 2487db1f286222e2501c2fa8e8244eda13f6afc3


Results regressed to (for first_bad == b4c0307d598004cfd96c770d2a4a84a37c838ba9)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-b4c0307d598004cfd96c770d2a4a84a37c838ba9/results_id:
1
# 470.lbm,lbm_base.default  regressed by 109

from (for last_good == bd17ced1db9a674fc8aa6632899e245672c7aa35)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1 -- --set gcc_override_configure=--disable-libsanitizer:
-8
# build_abe linux:
-7
# build_abe glibc:
-6
# build_abe stage2 -- --set gcc_override_configure=--disable-libsanitizer:
-5
# build_llvm true:
-3
# true:
0
# benchmark -- -O3 artifacts/build-bd17ced1db9a674fc8aa6632899e245672c7aa35/results_id:
1

Artifacts of last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/artifact/artifacts/build-bd17ced1db9a674fc8aa6632899e245672c7aa35/
Results ID of last_good: 
tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/3351
Artifacts of first_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/artifact/artifacts/build-b4c0307d598004cfd96c770d2a4a84a37c838ba9/
Results ID of first_bad: 
tx1_64/tcwg_bmk_llvm_tx1/bisect-llvm-master-aarch64-spec2k6-O3/3314
Build top page/logs: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/

Configuration details:


Reproduce builds:

mkdir investigate-llvm-b4c0307d598004cfd96c770d2a4a84a37c838ba9
cd investigate-llvm-b4c0307d598004cfd96c770d2a4a84a37c838ba9

git clone https://git.linaro.org/toolchain/jenkins-scripts

mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/artifact/artifacts/manifests/build-baseline.sh --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/artifact/artifacts/manifests/build-parameters.sh --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/artifact/artifacts/test.sh --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /llvm/ ./ ./bisect/baseline/

cd llvm

# Reproduce first_bad build
git checkout --detach b4c0307d598004cfd96c770d2a4a84a37c838ba9
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach bd17ced1db9a674fc8aa6632899e245672c7aa35
../artifacts/test.sh

cd ..


History of pending regressions and results: 
https://git.linaro.org/toolchain/ci/base-artifacts.git/log/?h=linaro-local/ci/tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3

Artifacts: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/artifact/artifacts/
Build log: 
https://ci.linaro.org/job/tcwg_bmk_ci_llvm-bisect-tcwg_bmk_tx1-llvm-master-aarch64-spec2k6-O3/18/consoleText

Full commit (up to 1000 lines):

commit b4c0307d598004cfd96c770d2a4a84a37c838ba9
Author: Jon Roelofs 
Date:   Thu Aug 5 09:35:02 2021 -0700

Fix clang-interpreter build after 2487db1f286222e2501c2fa8e8244eda13f6afc3
---
 clang/examples/clang-interpreter/main.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/clang/examples/clang-interpreter/main.cpp b/clang/examples/clang-interpreter/main.cpp
index 342d42089472..a2c50167f6b1 100644
--- a/clang/examples/clang-interpreter/main.cpp
+++ b/clang/examples/clang-interpreter/main.cpp
@@ -66,7 +66,8 @@ private:
   SimpleJIT(
   std::unique_ptr TM, DataLayout DL,
   std::unique_ptr ProcessSymbolsGenerator)
-  : TM(std::move(TM)), DL(std::move(DL)) {
+  : ES(cantFail(SelfExecutorProcessControl::Create())), TM(std::move(TM)),
+DL(std::move(DL)) {
 llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
 MainJD.addGenerator(std::move(ProcessSymbolsGenerator));
   }

___
linaro-toolchain ma

Re: GDB aarch64 malfunctions w/Linaro / ARM gcc 10.3 compiler

2021-08-12 Thread Luis Machado

Hi,

On 8/9/21 1:44 PM, Dietmar May wrote:
I'm compiling and running a bare metal AArch64 bootloader using 3 
different compilers: the Linaro / ARM GCC 10.3.1 compiler, the Linaro / 
ARM GCC 10.2.1 compiler, and an in-house built GCC 10.2.0 compiler.


GDB will single step using either of the GCC 10.2 compilers; but 
runs without halting when step is requested - or perhaps steps multiple 
instructions - when built using the Linaro / ARM-supplied GCC 10.3.1.


Could you please capture a remote log (I'm assuming GDB is connected 
remotely to OpenOCD) for the sessions with binaries built with 10.2 and 
10.3?


You can do so by entering "set remotelogfile <filename>" in 
GDB's console and then proceeding with the debugging session.


Enabling standard logging with "set logging on" would also be helpful.

Based on the logs, we can have a better idea of what's going on.

Regards,
Luis



Eclipse CDT (v4.20 aka 2021-06) is able to correlate debugging 
information from binaries built with either of the gcc 10.2 toolchains, 
and to single step correctly through the program. Breakpoints work as 
expected. Registers display fine.


Eclipse CDT is not able to correlate current PC location to source code 
using the binary built with Linaro / ARM 10.3, instead bringing up a 
disassembly window. Breakpoints placed at assembly instructions in the 
editor do not work.


I've tried three different GDB versions - ARM's supplied 10.2 and 10.3 
GDB, and the in-house built GDB. Results are the same.


The same makefile is used to create the binaries, with just a few macro 
definitions to switch. The only compiler flag of interest is 
-march=armv8.2-a (and of course -g -O0). -mtune=cortex-a53 doesn't help.


The board is connected via JTAG using OpenOCD 0.11.0+ and an Olimex 
ARM-USB-OCD-H adapter.


I'm building in a cygwin shell on Windows 10 version 21H1 using the 
compilers:


gcc-arm-10.3-2021.07-mingw-w64-i686-aarch64-none-elf.tar.xz
gcc-arm-10.2-2020.11-mingw-w64-i686-aarch64-none-elf.tar.xz

downloaded from:

https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-a/downloads 



Differences in compiler configuration (gcc -v) are:

Failing - Linaro / ARM GCC 10.3(.1):

--enable-checking=release
--target=aarch64-none-elf
--with-libiconv-prefix=/data/jenkins/workspace/GNU-toolchain/arm-10-4/build-mingw-aarch64-none-elf/host-tools 



Working - in-house GCC 10.2.0:

--build=x86_64-w64-mingw32
--disable-libffi
--disable-libgomp
--disable-libmudflap
--disable-libssp
--disable-libstdcxx-pch
--disable-lto
--disable-win32-registry
--enable-multilib
--target=aarch64-elf
--with-gcc
--with-gnu-as
--with-gnu-ld
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm'
--with-multilib-list=lp64,ilp32
--with-stabs
--with-sysroot=/build/aarch64-elf_10.2.0/cross-gcc/aarch64-elf
--with-zstd=/build/aarch64-elf_10.2.0/host

Has anyone been able to perform hardware debugging of binaries built 
with the latest 10.3 builds using GDB (and maybe even Eclipse CDT)?


Any suggestions as to other steps to try?

Thanks.
___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain


[CI-NOTIFY]: TCWG Bisect tcwg_kernel/gnu-master-arm-next-allyesconfig - Build # 34 - Successful!

2021-08-12 Thread ci_notify
Successfully identified regression in *linux* in CI configuration 
tcwg_kernel/gnu-master-arm-next-allyesconfig.  So far, this commit has 
regressed CI configurations:
 - tcwg_kernel/gnu-master-arm-next-allyesconfig

Culprit:

commit 4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15
Author: Eric Dumazet 
Date:   Tue Aug 10 02:45:47 2021 -0700

net: igmp: fix data-race in igmp_ifc_timer_expire()

Fix the data-race reported by syzbot [1]
Issue here is that igmp_ifc_timer_expire() can update in_dev->mr_ifc_count
while another change just occurred from another context.

in_dev->mr_ifc_count is only 8bit wide, so the race had little
consequences.

[1]
BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire

write to 0x8881051e3062 of 1 bytes by task 12547 on cpu 0:
 igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821
 igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356
 ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461
 __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199
 ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218
 do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline]
 ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423
 tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657
 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362
 __sys_setsockopt+0x18f/0x200 net/socket.c:2159
 __do_sys_setsockopt net/socket.c:2170 [inline]
 __se_sys_setsockopt net/socket.c:2167 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0x8881051e3062 of 1 bytes by interrupt on cpu 1:
 igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808
 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419
 expire_timers+0x135/0x250 kernel/time/timer.c:1464
 __run_timers+0x358/0x420 kernel/time/timer.c:1732
 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745
 __do_softirq+0x12c/0x26e kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636
 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
 console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646
 vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174
 vprintk_default+0x22/0x30 kernel/printk/printk.c:2185
 vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392
 printk+0x62/0x87 kernel/printk/printk.c:2216
 selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041
 security_netlink_send+0x42/0x90 security/security.c:2070
 netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:703 [inline]
 sock_sendmsg net/socket.c:723 [inline]
 sys_sendmsg+0x360/0x4d0 net/socket.c:2392
 ___sys_sendmsg net/socket.c:2446 [inline]
 __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
 __do_sys_sendmsg net/socket.c:2484 [inline]
 __se_sys_sendmsg net/socket.c:2482 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x01 -> 0x02

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Signed-off-by: David S. Miller 


Results regressed to (for first_bad == 4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
19624
# First few build errors in logs:
# 00:49:46 igmp.c:(.text+0xa6f4): undefined reference to `__bad_cmpxchg'
# 00:49:48 make: *** [Makefile:1176: vmlinux] Error 1

from (for last_good == 37c86c4a0bfc2faaf0ed959db9de814c85797f09)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
19709
# linux build successful:
all

Artifacts of last_good build: 
https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allyesconfig/34/artifact/artifacts/build-37c86c4a0bfc2faaf0ed959db9de814c85797f09/
Artifacts of first_bad build: 
https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allyesconfig/34/artifact/artifacts/build-4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15/
Build top page/logs: 
https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allyesconfig/34/

Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#9e723c5380c6e14fb91a8b6950563d040674afdb";

Reprodu

[CI-NOTIFY]: TCWG Bisect tcwg_kernel/llvm-master-arm-next-allmodconfig - Build # 20 - Successful!

2021-08-12 Thread ci_notify
Successfully identified regression in *linux* in CI configuration 
tcwg_kernel/llvm-master-arm-next-allmodconfig.  So far, this commit has 
regressed CI configurations:
 - tcwg_kernel/llvm-master-arm-next-allmodconfig

Culprit:

commit 4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15
Author: Eric Dumazet 
Date:   Tue Aug 10 02:45:47 2021 -0700

net: igmp: fix data-race in igmp_ifc_timer_expire()

Fix the data-race reported by syzbot [1]
Issue here is that igmp_ifc_timer_expire() can update in_dev->mr_ifc_count
while another change just occurred from another context.

in_dev->mr_ifc_count is only 8bit wide, so the race had little
consequences.

[1]
BUG: KCSAN: data-race in igmp_ifc_event / igmp_ifc_timer_expire

write to 0x8881051e3062 of 1 bytes by task 12547 on cpu 0:
 igmp_ifc_event+0x1d5/0x290 net/ipv4/igmp.c:821
 igmp_group_added+0x462/0x490 net/ipv4/igmp.c:1356
 ip_mc_inc_group+0x3ff/0x500 net/ipv4/igmp.c:1461
 __ip_mc_join_group+0x24d/0x2c0 net/ipv4/igmp.c:2199
 ip_mc_join_group_ssm+0x20/0x30 net/ipv4/igmp.c:2218
 do_ip_setsockopt net/ipv4/ip_sockglue.c:1285 [inline]
 ip_setsockopt+0x1827/0x2a80 net/ipv4/ip_sockglue.c:1423
 tcp_setsockopt+0x8c/0xa0 net/ipv4/tcp.c:3657
 sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3362
 __sys_setsockopt+0x18f/0x200 net/socket.c:2159
 __do_sys_setsockopt net/socket.c:2170 [inline]
 __se_sys_setsockopt net/socket.c:2167 [inline]
 __x64_sys_setsockopt+0x62/0x70 net/socket.c:2167
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0x8881051e3062 of 1 bytes by interrupt on cpu 1:
 igmp_ifc_timer_expire+0x706/0xa30 net/ipv4/igmp.c:808
 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1419
 expire_timers+0x135/0x250 kernel/time/timer.c:1464
 __run_timers+0x358/0x420 kernel/time/timer.c:1732
 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1745
 __do_softirq+0x12c/0x26e kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x9a/0xb0 kernel/softirq.c:636
 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1100
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
 console_unlock+0x8e8/0xb30 kernel/printk/printk.c:2646
 vprintk_emit+0x125/0x3d0 kernel/printk/printk.c:2174
 vprintk_default+0x22/0x30 kernel/printk/printk.c:2185
 vprintk+0x15a/0x170 kernel/printk/printk_safe.c:392
 printk+0x62/0x87 kernel/printk/printk.c:2216
 selinux_netlink_send+0x399/0x400 security/selinux/hooks.c:6041
 security_netlink_send+0x42/0x90 security/security.c:2070
 netlink_sendmsg+0x59e/0x7c0 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:703 [inline]
 sock_sendmsg net/socket.c:723 [inline]
 sys_sendmsg+0x360/0x4d0 net/socket.c:2392
 ___sys_sendmsg net/socket.c:2446 [inline]
 __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
 __do_sys_sendmsg net/socket.c:2484 [inline]
 __se_sys_sendmsg net/socket.c:2482 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x01 -> 0x02

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 12539 Comm: syz-executor.1 Not tainted 5.14.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Signed-off-by: David S. Miller 


Results regressed to (for first_bad == 4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
21692
# First few build errors in logs:
# 00:03:56 ld.lld: error: undefined symbol: __bad_cmpxchg
# 00:03:56 make: *** [Makefile:1176: vmlinux] Error 1

from (for last_good == 37c86c4a0bfc2faaf0ed959db9de814c85797f09)
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_llvm:
-5
# build_abe qemu:
-2
# linux_n_obj:
29753
# linux build successful:
all

Artifacts of last_good build: 
https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-next-allmodconfig/20/artifact/artifacts/build-37c86c4a0bfc2faaf0ed959db9de814c85797f09/
Artifacts of first_bad build: 
https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-next-allmodconfig/20/artifact/artifacts/build-4a2b285e7e103d4d6c6ed3e5052a0ff74a5d7f15/
Build top page/logs: 
https://ci.linaro.org/job/tcwg_kernel-llvm-bisect-llvm-master-arm-next-allmodconfig/20/

Configuration details:
rr[linux_git]="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git#761c6d7ec820f123b931e7b8ef7ec7c8564e450f";

Reproduce builds:

mkdir i