From: Di Zhu <zhud...@huawei.com>

I use the similar test method described in link below with KASAN enabled:
https://lore.kernel.org/netdev/4c5e467e07fb410ab4135b391d663...@huawei.com/
soon after, KASAN reports:
[ 9041.977110] 
==================================================================
[ 9041.977151] BUG: KASAN: use-after-free in 
bond_3ad_state_machine_handler+0x1c34/0x20b0 [bonding]
[ 9041.977156] Read of size 2 at addr ffff80394b8d70b0 by task 
kworker/u192:2/78492

[ 9041.977187] Workqueue: bond0 bond_3ad_state_machine_handler [bonding]
[ 9041.977190] Call trace:
[ 9041.977197]  dump_backtrace+0x0/0x310
[ 9041.977201]  show_stack+0x28/0x38
[ 9041.977207]  dump_stack+0xec/0x15c
[ 9041.977213]  print_address_description+0x68/0x2d0
[ 9041.977217]  kasan_report+0x130/0x2f0
[ 9041.977221]  __asan_load2+0x80/0xa8
[ 9041.977238]  bond_3ad_state_machine_handler+0x1c34/0x20b0 [bonding]

[ 9041.977261] Allocated by task 138336:
[ 9041.977266]  kasan_kmalloc+0xe0/0x190
[ 9041.977271]  kmem_cache_alloc_trace+0x1d8/0x468
[ 9041.977288]  bond_enslave+0x514/0x2160 [bonding]
[ 9041.977305]  bond_option_slaves_set+0x188/0x2c8 [bonding]
[ 9041.977323]  __bond_opt_set+0x1b0/0x740 [bonding]

[ 9041.977420] Freed by task 105873:
[ 9041.977425]  __kasan_slab_free+0x120/0x228
[ 9041.977429]  kasan_slab_free+0x10/0x18
[ 9041.977432]  kfree+0x90/0x468
[ 9041.977448]  slave_kobj_release+0x7c/0x98 [bonding]
[ 9041.977452]  kobject_put+0x118/0x328
[ 9041.977468]  __bond_release_one+0x688/0xa08 [bonding]
[ 9041.977660]  pci_device_remove+0x80/0x198

The root cause is that in bond_3ad_unbind_slave() the last step is
detach the port from aggregator including it. if find this aggregator
and it has not any active ports, it will call ad_clear_agg() to do clear
things, especially set aggregator->lag_ports = NULL.

But ports in aggregator->lag_ports list which is set to NULL previously
still has pointer to this aggregator through  port->aggregator, event after
this aggregator has released.

The use-after-free problem will cause some puzzling situactions,
i am not sure whether fix this problem can solve all the problems mentioned
by the link described earlier, but it did solve all problems i encountered.

Signed-off-by: Di Zhu <zhud...@huawei.com>
---
 drivers/net/bonding/bond_3ad.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 6908822d9773..5d5a903e899c 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1793,6 +1793,8 @@ static void ad_agg_selection_logic(struct aggregator *agg,
 static void ad_clear_agg(struct aggregator *aggregator)
 {
        if (aggregator) {
+               struct port *port;
+
                aggregator->is_individual = false;
                aggregator->actor_admin_aggregator_key = 0;
                aggregator->actor_oper_aggregator_key = 0;
@@ -1801,6 +1803,10 @@ static void ad_clear_agg(struct aggregator *aggregator)
                aggregator->partner_oper_aggregator_key = 0;
                aggregator->receive_state = 0;
                aggregator->transmit_state = 0;
+               for (port = aggregator->lag_ports; port;
+                               port = port->next_port_in_aggregator)
+                       if (port->aggregator == aggregator)
+                               port->aggregator = NULL;
                aggregator->lag_ports = NULL;
                aggregator->is_active = 0;
                aggregator->num_of_ports = 0;
-- 
2.23.0

Reply via email to