Public bug reported:

We see "UBSAN: Undefined behaviour in ./include/linux/net_dim.h:243:6".
The following trace appeared during traffic in the regression:

[12885.292500] UBSAN: Undefined behaviour in ./include/linux/net_dim.h:243:6
[12885.296358] signed integer overflow:
[12885.300100] 358869104 * 100 cannot be represented in type 'int'
[12885.304001] CPU: 2 PID: 19630 Comm: sock_stream_tes Tainted: G OE 4.15.0-rc8-for-upstream-dbg-2018-01-25_19-31-23-61 #1
[12885.311856] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[12885.316091] Call Trace:
[12885.320234] <IRQ>
[12885.324366] dump_stack+0xd1/0x159
[12885.328586] ? dma_virt_map_sg+0x147/0x147
[12885.332804] ? val_to_string.constprop.4+0x88/0xd1
[12885.337055] ubsan_epilogue+0x9/0x49
[12885.341345] handle_overflow+0x15e/0x189
[12885.345636] ? __ubsan_handle_negate_overflow+0x108/0x108
[12885.349891] ? kvm_clock_read+0x1f/0x30
[12885.354230] ? ktime_get+0x18d/0x280
[12885.358654] ? getrawmonotonic64+0x320/0x320
[12885.363116] ? mark_lock+0x1cf/0xc50
[12885.367624] ? inet_recvmsg+0x121/0x4a0
[12885.372114] mlx5e_napi_poll+0x1199/0x15c0 [mlx5_core]
[12885.376774] ? mlx5e_rx_dim_work+0x160/0x160 [mlx5_core]
[12885.381406] ? print_irqtrace_events+0x120/0x120
[12885.385907] ? mark_held_locks+0x93/0x100
[12885.392099] ? print_irqtrace_events+0x120/0x120
[12885.396589] ? trace_hardirqs_on_caller+0x206/0x390
[12885.401278] ? kasan_slab_free+0x87/0xc0
[12885.406000] ? pvclock_clocksource_read+0x146/0x280
[12885.410608] ? mark_held_locks+0x71/0x100
[12885.415251] net_rx_action+0x58c/0x10a0
[12885.419873] ? napi_complete_done+0x3d0/0x3d0
[12885.424385] ? check_chain_key+0x150/0x260
[12885.428784] ? debug_check_no_locks_freed+0x200/0x200
[12885.433041] ? match_held_lock+0x8a/0x4f0
[12885.437215] ? match_held_lock+0x8a/0x4f0
[12885.441249] ? lock_downgrade+0x3e0/0x3e0
[12885.445151] ? do_raw_spin_unlock+0x14d/0x230
[12885.448970] ? save_trace+0x1f0/0x1f0
[12885.452664] ? save_trace+0x1f0/0x1f0
[12885.456224] ? match_held_lock+0xa2/0x4f0
[12885.459668] ? pvclock_clocksource_read+0x146/0x280
[12885.463085] ? save_trace+0x1f0/0x1f0
[12885.466361] ? preempt_count_sub+0x14/0xd0
[12885.469566] ? __lock_is_held+0x5d/0x110
[12885.472665] ? preempt_count_sub+0x14/0xd0
[12885.475653] ? __lock_is_held+0x5d/0x110
[12885.478529] ? mark_lock+0x1cf/0xc50
[12885.481276] ? match_held_lock+0xa2/0x4f0
[12885.483984] ? print_irqtrace_events+0x120/0x120
[12885.486679] ? save_trace+0x1f0/0x1f0
[12885.490891] ? irq_exit+0x150/0x150
[12885.493454] ? __napi_schedule+0x1ae/0x220
[12885.495936] ? netdev_master_upper_dev_link+0x20/0x20
[12885.498402] ? check_chain_key+0x150/0x260
[12885.500774] ? __tasklet_schedule+0x22/0xf0
[12885.503086] ? match_held_lock+0xa2/0x4f0
[12885.505431] ? mlx5_eq_int+0x821/0xb50 [mlx5_core]
[12885.507775] ? save_trace+0x1f0/0x1f0
[12885.510082] ? pvclock_clocksource_read+0x146/0x280
[12885.512416] ? pvclock_read_flags+0x80/0x80
[12885.514705] ? save_trace+0x1f0/0x1f0
[12885.516995] ? __handle_irq_event_percpu+0x1b0/0x800
[12885.519305] ? __lock_is_held+0x5d/0x110
[12885.521630] __do_softirq+0x248/0xba9
[12885.523913] ? __irqentry_text_end+0x1f8a70/0x1f8a70
[12885.526234] ? pvclock_clocksource_read+0x146/0x280
[12885.528563] ? pvclock_read_flags+0x80/0x80
[12885.530843] ? do_raw_spin_trylock+0x120/0x120
[12885.533178] ? kvm_clock_read+0x1f/0x30
[12885.535432] ? kvm_sched_clock_read+0x5/0x10
[12885.537702] ? sched_clock_cpu+0x14/0x1f0
[12885.539968] irq_exit+0xf4/0x150
[12885.542186] do_IRQ+0xe8/0x1e0
[12885.544390] common_interrupt+0xa2/0xa2
[12885.546607] </IRQ>

There is an int overflow in include/linux/net_dim.h:

#define IS_SIGNIFICANT_DIFF(val, ref) \
	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

The include/linux/net_dim.h library is new in kernel 4.16; in the 4.15
kernel this code was in
drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c.

The upstream fix for this issue is:

commit f97c3dc3c0e8d23a5c4357d182afeef4c67f5c33
Author: Tal Gilboa <ta...@mellanox.com>
Date:   Thu Mar 29 13:53:52 2018 +0300

    net/dim: Fix int overflow

    When calculating difference between samples, the values
    are multiplied by 100. Large values may cause int overflow
    when multiplied (usually on first iteration).
    Fixed by forcing 100 to be of type unsigned long.

    Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
    Signed-off-by: Tal Gilboa <ta...@mellanox.com>
    Reviewed-by: Andy Gospodarek <go...@broadcom.com>
    Signed-off-by: David S. Miller <da...@davemloft.net>

diff --git a/include/linux/net_dim.h b/include/linux/net_dim.h
index bebeaad..29ed8fd 100644
--- a/include/linux/net_dim.h
+++ b/include/linux/net_dim.h
@@ -231,7 +231,7 @@ static inline void net_dim_exit_parking(struct net_dim *dim)
 }

 #define IS_SIGNIFICANT_DIFF(val, ref) \
-	(((100 * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */
+	(((100UL * abs((val) - (ref))) / (ref)) > 10) /* more than 10% difference */

 static inline int net_dim_stats_compare(struct net_dim_stats *curr,
 					struct net_dim_stats *prev)

I will send a patch to the Ubuntu kernel mailing list with the fix
backported to the old location.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

** Tags: bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1763269

Title:
  Mellanox [mlx5] [bionic] UBSAN: Undefined behaviour in ./include/linux/net_dim.h

Status in linux package in Ubuntu:
  Incomplete
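
For reference, the arithmetic behind the UBSAN message can be checked in user space. The sketch below is not kernel code: the function names product_overflows_int(), is_significant_diff() and self_check() are hypothetical, the sample values are chosen only so that the difference equals the 358869104 from the trace, and it assumes ref > 0 and a 64-bit unsigned long (as on x86_64). It avoids performing the overflowing int multiplication itself, since that would be undefined behaviour, and instead does the product in a wider type.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Returns 1 when 100 * |val - ref| does not fit in an int, which is the
 * condition UBSAN reported. The product is computed in long long so the
 * check itself cannot overflow.
 */
int product_overflows_int(int val, int ref)
{
	long long diff = llabs((long long)val - (long long)ref);

	return diff * 100 > INT_MAX;
}

/*
 * Same "more than 10% difference" comparison as the macro, with the
 * multiplication done in unsigned long, mirroring the 100UL used by the
 * upstream fix. Assumes ref > 0.
 */
int is_significant_diff(int val, int ref)
{
	unsigned long diff =
		(unsigned long)llabs((long long)val - (long long)ref);

	return (100UL * diff) / (unsigned long)ref > 10;
}

/* Hypothetical sample values; the first pair differs by 358869104. */
void self_check(void)
{
	/* 358869104 * 100 = 35886910400, well above INT_MAX (2147483647). */
	assert(product_overflows_int(358869204, 100));
	assert(!product_overflows_int(110, 100));

	/* With the widened multiplication the comparison is well defined. */
	assert(is_significant_diff(358869204, 100));
	assert(is_significant_diff(120, 100));   /* 20% difference */
	assert(!is_significant_diff(105, 100));  /* 5% difference */
}
```

Calling self_check() from a small main() is enough to exercise the sketch; all assertions are expected to pass on a platform where unsigned long is 64 bits wide.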