Hi Ilia,

On 3/3/26 5:08 PM, Ilia Baikov wrote:
>> Just to be sure I understand. Is traffic from your workloads still
>> affected?
> 
> Unfortunately, it is still affected. Because ARP is handled differently
> (the ARP flow is dropped at the OvS level), some of the instances have no
> public connectivity.
> 
> 
>> Double checking, are you sure you upgraded ovn-northd to use the version
>> from my branch?
> 
> 100% sure. I've built the packages from source, packed them into deb
> packages, and then built kolla images using a self-hosted repo with the
> compiled deb packages.
> 

Thanks for checking!

> ansible all -i multinode -m shell -a "docker exec ovn_northd ovn-northd
> --version" --limit control
> 
> us-east-standard-1 | CHANGED | rc=0 >>
> ovn-northd 24.03.8
> Open vSwitch Library 3.3.8
> 
> us-east-4 | CHANGED | rc=0 >>
> ovn-northd 24.03.8
> Open vSwitch Library 3.3.8
> 

I have a hunch about what might be happening.  Is it
possible that your logical switches don't have
other_config:broadcast-arps-to-all-routers=false?
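If it helps, this is roughly how I'd check that across all logical switches
(just a sketch; the exact invocation, e.g. wrapping it in "docker exec
ovn_northd" for a kolla deployment, depends on your setup):

```shell
# List logical switches whose broadcast-arps-to-all-routers option is
# not explicitly "false" (sketch; adjust for your deployment).
for ls in $(ovn-nbctl --bare --columns=name list Logical_Switch); do
    val=$(ovn-nbctl --if-exists get Logical_Switch "$ls" \
              other_config:broadcast-arps-to-all-routers)
    [ "$val" = '"false"' ] || echo "$ls: broadcast-arps-to-all-routers=$val"
done

# To set it on a given switch:
#   ovn-nbctl set Logical_Switch <switch-name> \
#       other_config:broadcast-arps-to-all-routers=false
```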

Also, we're currently flooding all ARP requests coming from the fabric
(entering OVN through the localnet port) to all logical switch ports, which
means the 4K resubmit limit gets hit.
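To double check the fanout on your side, comparing the sizes of the two
Southbound multicast groups should show the difference (a sketch; the group
names "_MC_flood_l2" / "_MC_unknown" are from memory, please verify with
"ovn-sbctl list Multicast_Group"):

```shell
# Rough comparison of the fanout of the flood vs unknown multicast
# groups; counts ports across all datapaths (sketch, names may differ).
for grp in _MC_flood_l2 _MC_unknown; do
    n=$(ovn-sbctl --bare --columns=ports find Multicast_Group \
            name="$grp" | wc -w)
    echo "$grp: $n ports total"
done
```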

I'll update my test patch to cover this last part too while waiting for
your reply on the LS config question above.

Regards,
Dumitru

> 
> Regards,
> 
> Ilia Baikov
> [email protected]
> 
> On 03.03.2026 18:54, Dumitru Ceara wrote:
>> On 3/3/26 3:54 PM, Ilia Baikov wrote:
>>> Hi,
>>>
>> Hi,
>>
>>
>>> Done, OVN components are now built from your branch and deployed into
>>> the production region where the issue persists. To be more precise, let's
>>> focus on a cluster that runs an L2 network setup, as this is the best
>>> field for testing and reproducing this case without breaking the other
>>> stabilised regions, which run L3.
>>>
>>> ovn-controller 24.03.8
>>> Open vSwitch Library 3.3.8
>>> OpenFlow versions 0x6:0x6
>>> SB DB Schema 20.33.0
>>>
>>> OVN components were deployed at ~14:11, and then the resubmit exceptions
>>> started appearing (24.03.2 just shows unrecognized op code (27) and so on).
>>>
>> Just to be sure I understand.  Is traffic from your workloads still
>> affected?
>>
>>> I've also enabled rconn/vconn debug logging for OVN and OVS (later than
>>> 14:11, but it seems vconn/rconn shows something useful for debugging).
>>>
>>> ovn logs starting from 14:11 - https://gist.githubusercontent.com/
>>> frct1/5f99221e1519d1552c8ef16a7ec8ee52/
>>> raw/147e9a171e538f9cd837008181272437b1c7ed37/ovn.log
>>> ovs logs starting from 14:11 - https://gist.githubusercontent.com/
>>> frct1/5f99221e1519d1552c8ef16a7ec8ee52/
>>> raw/147e9a171e538f9cd837008181272437b1c7ed37/ovs.log
>>>
>> Double checking, are you sure you upgraded ovn-northd to use the version
>> from my branch?
>>
>> I'm asking because this packet should not hit the mc_flood_l2 group
>> anymore:
>>
>> 2026-03-03T14:22:51.358Z|00012|ofproto_dpif_xlate(handler250)|WARN|
>> Dropped
>> 2244 log messages in last 60 seconds (most recently, 0 seconds ago) due
>> to excessive rate
>> 2026-03-03T14:22:51.358Z|00013|ofproto_dpif_xlate(handler250)|WARN|over
>> 4096 resubmit actions on bridge br-int while processing
>> arp,in_port=2409,vlan_tci=0x0000,dl_src=fa:16:3e:ba:70:84,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=166.1.160.225,arp_tpa=166.1.160.1,arp_op=1,arp_sha=fa:16:3e:ba:70:84,arp_tha=00:00:00:00:00:00
>>
>> The mc_flood_l2 group has 2k ports (all the VM ports), but my change
>> should make it hit the mc_unknown group instead, which only has a handful
>> of ports, as you said in your previous email.
>>
>>> I will keep it running on 24.03.8 for easier debugging.
>>>
>> Thanks,
>> Dumitru
>>
>>> Regards,
>>>
>>> Ilia Baikov
>>> [email protected]
>>>
>>> On 03.03.2026 15:55, Dumitru Ceara wrote:
>>>> On 3/2/26 7:24 PM, Ilia Baikov wrote:
>>>>> Hi Dumitru!
>>>>>
>>>> Hi Ilia,
>>>>
>>>>> ovn-nbctl --no-leader-only list Logical_switch_port | grep unknown |
>>>>> wc -l
>>>>> 10
>>>>>
>>>> OK, that's just a few (and I see the same on your other deployment too),
>>>> so that's great.
>>>>
>>>> Mind trying out these WIP patches for now and seeing if they work for you?
>>>>
>>>> https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-26.03
>>>> https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-25.09
>>>> https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-25.03
>>>> https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-24.09
>>>> https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-24.03
>>>>
>>>> They're all the same, just based on different stable branches; I wasn't
>>>> sure which one you'd need.
>>>>
>>>> Looking forward to hearing your results.
>>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>
>>>>> More output for lsp list (no filtering with ls uuid):
>>>>>
>>>>> ovn-nbctl --no-leader-only list Logical_switch_port | grep unknown
>>>>> -A 5
>>>>> addresses           : [unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="10.10.3.243/24",
>>>>> "neutron:device_id"="ba1a43e2-5496-4ced-8b8c-9b42c5ddd6f1",
>>>>> "neutron:device_owner"="network:floatingip_agent_gateway",
>>>>> "neutron:host_id"=us-east-standard-2, "neutron:mtu"="",
>>>>> "neutron:network_name"=neutron-bb8d0ef6-9b45-4398-86f3-51323a0db2cd,
>>>>> "neutron:port_capabilities"="", "neutron:port_name"="",
>>>>> "neutron:project_id"="", "neutron:revision_number"="5",
>>>>> "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : ["fa:16:3e:62:e5:5f 193.32.177.44", unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="193.32.177.44/24",
>>>>> "neutron:device_id"="544cddd2-7a53-492a-8933-91fb97fd0546",
>>>>> "neutron:device_owner"="network:floatingip_agent_gateway",
>>>>> "neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
>>>>> "neutron:network_name"=neutron-7dce255f-4824-4a21-a550-f8d03a25c285,
>>>>> "neutron:port_capabilities"="", "neutron:port_name"="",
>>>>> "neutron:project_id"="", "neutron:revision_number"="3",
>>>>> "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : [unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="12.26.0.2/16",
>>>>> "neutron:device_id"=dhcp8b62a377-0e4b-5497-b096-c08bf79b6c42-
>>>>> c5db4fec-9c10-4022-835d-7281506d8a7e,
>>>>> "neutron:device_owner"="network:dhcp", "neutron:host_id"=us-east-
>>>>> standard-1, "neutron:mtu"="", "neutron:network_name"=neutron-
>>>>> c5db4fec-9c10-4022-835d-7281506d8a7e, "neutron:port_capabilities"="",
>>>>> "neutron:port_name"="",
>>>>> "neutron:project_id"=a3b7099e62ac4fb9b3d548dfaff7aeaf,
>>>>> "neutron:revision_number"="5", "neutron:security_group_ids"="",
>>>>> "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : [unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="10.10.3.242/24",
>>>>> "neutron:device_id"="544cddd2-7a53-492a-8933-91fb97fd0546",
>>>>> "neutron:device_owner"="network:floatingip_agent_gateway",
>>>>> "neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
>>>>> "neutron:network_name"=neutron-bb8d0ef6-9b45-4398-86f3-51323a0db2cd,
>>>>> "neutron:port_capabilities"="", "neutron:port_name"="",
>>>>> "neutron:project_id"="", "neutron:revision_number"="5",
>>>>> "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : ["fa:16:3e:6e:27:09 12.26.0.109", unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="12.26.0.109/16",
>>>>> "neutron:device_id"="6e2d75ce-1503-4e40-bc72-ef3adc59d45f",
>>>>> "neutron:device_owner"="network:router_centralized_snat",
>>>>> "neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
>>>>> "neutron:network_name"=neutron-c5db4fec-9c10-4022-835d-7281506d8a7e,
>>>>> "neutron:port_capabilities"="", "neutron:port_name"="",
>>>>> "neutron:project_id"="", "neutron:revision_number"="6",
>>>>> "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : ["fa:16:3e:0c:ac:01 12.26.1.76", unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="12.26.1.76/16",
>>>>> "neutron:device_id"="4d3f7d3d-a637-4e40-8bc3-fda4712a1ada",
>>>>> "neutron:device_owner"="network:router_centralized_snat",
>>>>> "neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
>>>>> "neutron:network_name"=neutron-c5db4fec-9c10-4022-835d-7281506d8a7e,
>>>>> "neutron:port_capabilities"="", "neutron:port_name"="",
>>>>> "neutron:project_id"="", "neutron:revision_number"="6",
>>>>> "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : [unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="10.10.3.240/24",
>>>>> "neutron:device_id"=dhcp8b62a377-0e4b-5497-b096-c08bf79b6c42-
>>>>> bb8d0ef6-9b45-4398-86f3-51323a0db2cd,
>>>>> "neutron:device_owner"="network:dhcp", "neutron:host_id"=us-east-
>>>>> standard-1, "neutron:mtu"="", "neutron:network_name"=neutron-
>>>>> bb8d0ef6-9b45-4398-86f3-51323a0db2cd, "neutron:port_capabilities"="",
>>>>> "neutron:port_name"="",
>>>>> "neutron:project_id"="03d31c9de2ec41c787add9b44aacd3a8",
>>>>> "neutron:revision_number"="6", "neutron:security_group_ids"="",
>>>>> "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : [unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : []
>>>>> external_ids        : {}
>>>>> -- 
>>>>> addresses           : [unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : true
>>>>> external_ids        : {"neutron:cidrs"="193.32.177.174/24",
>>>>> "neutron:device_id"="ba1a43e2-5496-4ced-8b8c-9b42c5ddd6f1",
>>>>> "neutron:device_owner"="network:floatingip_agent_gateway",
>>>>> "neutron:host_id"=us-east-standard-2, "neutron:mtu"="",
>>>>> "neutron:network_name"=neutron-7dce255f-4824-4a21-a550-f8d03a25c285,
>>>>> "neutron:port_capabilities"="", "neutron:port_name"="",
>>>>> "neutron:project_id"="", "neutron:revision_number"="5",
>>>>> "neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
>>>>> "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
>>>>> -- 
>>>>> addresses           : [unknown]
>>>>> dhcpv4_options      : []
>>>>> dhcpv6_options      : []
>>>>> dynamic_addresses   : []
>>>>> enabled             : []
>>>>> external_ids        : {}
>>>>>
>>>>> Regards,
>>>>>
>>>>> Ilia Baikov
>>>>> [email protected]
>>>>>
>>>>> On 02.03.2026 18:23, Dumitru Ceara wrote:
>>>>>> On 3/2/26 1:18 PM, Ilia Baikov wrote:
>>>>>>> To keep the region stable, we decided to roll back to 25.09, which has
>>>>>>> no split buf merged, and migrated to an L3 topology with /32 advertised
>>>>>>> via BGP.
>>>>>>> Just a guess: reaching 2k ports (VMs) in a single logical_switch is the
>>>>>>> reason ARP flows are being dropped/discarded because of the resubmit
>>>>>>> limit. What do you think?
>>>>>>>
>>>>>> Hi Ilia,
>>>>>>
>>>>>> Right, the very high number of logical switch ports that are part of
>>>>>> the
>>>>>> MC_FLOOD_L2 OVN multicast group (in this case all your VM ports) is
>>>>>> what's causing issues with broadcast ARP requests:
>>>>>> a. generated by the logical router port
>>>>>> b. generated by VMs attached to the logical switch
>>>>>>
>>>>>> I'll try to prepare a test/rfc patch in the next few days to see if
>>>>>> changing
>>>>>> the action for some logical flows from flooding on the "MC_FLOOD_L2"
>>>>>> group to flooding on the "MC_UNKNOWN" group makes things work in your
>>>>>> setup.
>>>>>>
>>>>>> Before that, can you please share how many of those 2k VM ports have
>>>>>> LSP.addresses configured to include "unknown"?
>>>>>>
>>>>>> Thanks,
>>>>>> Dumitru
>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Ilia Baikov
>>>>>>> [email protected]
>>>>>>>
>>>>>>> On 26.02.2026 16:51, Ilia Baikov wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> These patches seem to fix the DHCP issues, but there are cases when an
>>>>>>>> instance boots, receives its configuration from the metadata service,
>>>>>>>> but doesn't have public connectivity (done through L2 networking).
>>>>>>>>
>>>>>>>>> Which, if the logical switch has a reasonably high number of ports
>>>>>>>>> (maybe around 200) will probably cause the resubmit limit to be
>>>>>>>>> hit
>>>>>>>> This is the case: a public L2 network with ~2000 running instances
>>>>>>>> (or ports, in LSP terms).
>>>>>>>>
>>>>>>>>> Are these OVN router port IPs?  Or are they OVN workload IPs?  Or
>>>>>>>>> are
>>>>>>>>> they just IPs owned by some fabric hosts, outside of OVN?
>>>>>>>> The .1 IP of each subnet is run by the border gateway. So an instance
>>>>>>>> asks for .1 to learn the GW MAC address, but due to hitting the limit
>>>>>>>> the instance receives no response because the ARP flow is dropped.
>>>>>>>>
>>>>>>>>> Also, aside from the logs, do you actually see any traffic being
>>>>>>>>> impacted?  I.e., are your workloads able to come up and properly
>>>>>>>>> communicate?
>>>>>>>> Nope, there is connectivity loss, since some instances have no public
>>>>>>>> connectivity due to ARP issues.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Ilia Baikov
>>>>>>>> [email protected]
>>>>>>>> On 26.02.2026 14:31, Dumitru Ceara wrote:
>>>>>>>>> Hi Ilia,
>>>>>>>>>
>>>>>>>>> On 2/24/26 3:29 PM, Ilia Baikov wrote:
>>>>>>>>> Just checked the openvswitch logs. The 4096 resubmit limit actually
>>>>>>>>> occurs even on 25.09.2.
>>>>>>>>> v25.09.2 includes:
>>>>>>>>> https://github.com/ovn-org/ovn/commit/0bb60da
>>>>>>>>>
>>>>>>>>> Which should fix the "self-DoS" issues introduced by:
>>>>>>>>> https://github.com/ovn-org/ovn/commit/325c7b2
>>>>>>>>>
>>>>>>>>> But that means that in some cases, e.g., for real BUM traffic
>>>>>>>>> or for
>>>>>>>>> GARPs originated by OVN router ports we will try to "flood" the
>>>>>>>>> packet
>>>>>>>>> in the L2 broadcast domain.
>>>>>>>>>
>>>>>>>>> Which, if the logical switch has a reasonably high number of ports
>>>>>>>>> (maybe around 200) will probably cause the resubmit limit to be
>>>>>>>>> hit.
>>>>>>>>>
>>>>>>>>> In the examples below, I see the packets that cause this are ARP
>>>>>>>>> requests requesting the MAC address of:
>>>>>>>>> - 138.124.72.1
>>>>>>>>> - 83.219.248.109
>>>>>>>>> - 138.124.72.1
>>>>>>>>> - 91.92.46.1
>>>>>>>>>
>>>>>>>>> Are these OVN router port IPs?  Or are they OVN workload IPs?  Or
>>>>>>>>> are
>>>>>>>>> they just IPs owned by some fabric hosts, outside of OVN?
>>>>>>>>>
>>>>>>>>> Also, aside from the logs, do you actually see any traffic being
>>>>>>>>> impacted?  I.e., are your workloads able to come up and properly
>>>>>>>>> communicate?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dumitru
>>>>>>>>>
>>>>>>>>>> Final flow: unchanged
>>>>>>>>>> Megaflow:
>>>>>>>>>> recirc_id=0,eth,arp,in_port=346,dl_src=fa:16:3e:63:aa:d0
>>>>>>>>>> Datapath actions: drop
>>>>>>>>>> 2026-02-24T14:23:17.457Z|04071|connmgr|INFO|br-int<->unix#4346: 1
>>>>>>>>>> flow_mods in the last 0 s (1 adds)
>>>>>>>>>> 2026-02-24T14:23:34.821Z|00076|ofproto_dpif_xlate(handler24)|
>>>>>>>>>> WARN|
>>>>>>>>>> Dropped 854 log messages in last 60 seconds (most recently, 0
>>>>>>>>>> seconds
>>>>>>>>>> ago) due to excessive rate
>>>>>>>>>> 2026-02-24T14:23:34.821Z|00077|ofproto_dpif_xlate(handler24)|
>>>>>>>>>> WARN|
>>>>>>>>>> over
>>>>>>>>>> 4096 resubmit actions on bridge br-int while processing
>>>>>>>>>> arp,in_port=4715,vlan_tci=0x0000,dl_src=fa:16:3e:97:65:15,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.142,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:97:65:15,arp_tha=00:00:00:00:00:00
>>>>>>>>>> 2026-02-24T14:23:45.464Z|00091|dpif(handler28)|WARN|system@ovs-
>>>>>>>>>> system:
>>>>>>>>>> execute
>>>>>>>>>> ct(commit,zone=163,mark=0/0x41,label=0/0xffff00000000000000000000,nat(src)),154
>>>>>>>>>>  failed (Invalid argument) on packet 
>>>>>>>>>> tcp,vlan_tci=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=fa:16:3e:69:22:89,nw_src=31.44.82.94,nw_dst=31.169.126.149,nw_tos=32,nw_ecn=0,nw_ttl=57,nw_frag=no,tp_src=51064,tp_dst=443,tcp_flags=syn
>>>>>>>>>>  tcp_csum:d7b0
>>>>>>>>>>     with metadata
>>>>>>>>>> skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xa3),ct_tuple4(src=31.44.82.94,dst=31.169.126.149,proto=6,tp_src=51064,tp_dst=443),in_port(2)
>>>>>>>>>>  mtu 0
>>>>>>>>>> 2026-02-24T14:23:56.702Z|00072|ofproto_dpif_upcall(handler30)|
>>>>>>>>>> WARN|
>>>>>>>>>> Dropped 697 log messages in last 60 seconds (most recently, 0
>>>>>>>>>> seconds
>>>>>>>>>> ago) due to excessive rate
>>>>>>>>>> 2026-02-24T14:23:56.702Z|00073|ofproto_dpif_upcall(handler30)|
>>>>>>>>>> WARN|Flow:
>>>>>>>>>> arp,in_port=409,vlan_tci=0x0000,dl_src=fa:16:3e:22:f2:f7,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.145.28.207,arp_tpa=192.145.28.1,arp_op=1,arp_sha=fa:16:3e:22:f2:f7,arp_tha=00:00:00:00:00:00
>>>>>>>>>>
>>>>>>>>>> bridge("br-int")
>>>>>>>>>> ----------------
>>>>>>>>>>     0. priority 0
>>>>>>>>>>        drop
>>>>>>>>>>
>>>>>>>>>> Final flow: unchanged
>>>>>>>>>> Megaflow:
>>>>>>>>>> recirc_id=0,eth,arp,in_port=409,dl_src=fa:16:3e:22:f2:f7
>>>>>>>>>> Datapath actions: drop
>>>>>>>>>> 2026-02-24T14:24:34.891Z|02715|ofproto_dpif_xlate(handler2)|WARN|
>>>>>>>>>> Dropped
>>>>>>>>>> 1059 log messages in last 60 seconds (most recently, 1 seconds
>>>>>>>>>> ago) due
>>>>>>>>>> to excessive rate
>>>>>>>>>> 2026-02-24T14:24:34.891Z|02716|ofproto_dpif_xlate(handler2)|
>>>>>>>>>> WARN|over
>>>>>>>>>> 4096 resubmit actions on bridge br-int while processing
>>>>>>>>>> arp,in_port=1,vlan_tci=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=83.219.248.1,arp_tpa=83.219.248.109,arp_op=1,arp_sha=0c:86:10:b7:9e:e0,arp_tha=00:00:00:00:00:00
>>>>>>>>>> 2026-02-24T14:24:46.042Z|04072|connmgr|INFO|br-int<->unix#4353: 1
>>>>>>>>>> flow_mods in the last 0 s (1 adds)
>>>>>>>>>> 2026-02-24T14:24:59.041Z|00066|ofproto_dpif_upcall(handler78)|
>>>>>>>>>> WARN|
>>>>>>>>>> Dropped 662 log messages in last 63 seconds (most recently, 3
>>>>>>>>>> seconds
>>>>>>>>>> ago) due to excessive rate
>>>>>>>>>> 2026-02-24T14:24:59.041Z|00067|ofproto_dpif_upcall(handler78)|
>>>>>>>>>> WARN|Flow:
>>>>>>>>>> arp,in_port=339,vlan_tci=0x0000,dl_src=fa:16:3e:39:60:bb,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=91.92.46.85,arp_tpa=91.92.46.1,arp_op=1,arp_sha=fa:16:3e:39:60:bb,arp_tha=00:00:00:00:00:00
>>>>>>>>>>
>>>>>>>>>> bridge("br-int")
>>>>>>>>>> ----------------
>>>>>>>>>>     0. priority 0
>>>>>>>>>>        drop
>>>>>>>>>>
>>>>>>>>>> Final flow: unchanged
>>>>>>>>>> Megaflow:
>>>>>>>>>> recirc_id=0,eth,arp,in_port=339,dl_src=fa:16:3e:39:60:bb
>>>>>>>>>> Datapath actions: drop
>>>>>>>>>> 2026-02-24T14:25:34.783Z|00067|ofproto_dpif_xlate(handler7)|WARN|
>>>>>>>>>> Dropped
>>>>>>>>>> 952 log messages in last 60 seconds (most recently, 0 seconds
>>>>>>>>>> ago)
>>>>>>>>>> due
>>>>>>>>>> to excessive rate
>>>>>>>>>> 2026-02-24T14:25:34.783Z|00068|ofproto_dpif_xlate(handler7)|
>>>>>>>>>> WARN|over
>>>>>>>>>> 4096 resubmit actions on bridge br-int while processing
>>>>>>>>>> arp,in_port=4812,vlan_tci=0x0000,dl_src=fa:16:3e:68:f7:1b,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.245,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:68:f7:1b,arp_tha=00:00:00:00:00:00
>>>>>>>>>> 2026-02-24T14:25:59.094Z|00067|ofproto_dpif_upcall(handler11)|
>>>>>>>>>> WARN|
>>>>>>>>>> Dropped 720 log messages in last 60 seconds (most recently, 0
>>>>>>>>>> seconds
>>>>>>>>>> ago) due to excessive rate
>>>>>>>>>> 2026-02-24T14:25:59.095Z|00068|ofproto_dpif_upcall(handler11)|
>>>>>>>>>> WARN|Flow:
>>>>>>>>>> arp,in_port=305,vlan_tci=0x0000,dl_src=fa:16:3e:d9:8d:f3,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=91.92.46.188,arp_tpa=91.92.46.1,arp_op=1,arp_sha=fa:16:3e:d9:8d:f3,arp_tha=00:00:00:00:00:00
>>>>>>>>>>
>>>>>>>>>> bridge("br-int")
>>>>>>>>>> ----------------
>>>>>>>>>>     0. priority 0
>>>>>>>>>>        drop
>>>>>>>>>>
>>>>>>>>>> Final flow: unchanged
>>>>>>>>>> Megaflow:
>>>>>>>>>> recirc_id=0,eth,arp,in_port=305,dl_src=fa:16:3e:d9:8d:f3
>>>>>>>>>> Datapath actions: drop
>>>>>>>>>> 2026-02-24T14:26:35.024Z|02717|ofproto_dpif_xlate(handler2)|WARN|
>>>>>>>>>> Dropped
>>>>>>>>>> 937 log messages in last 61 seconds (most recently, 1 seconds
>>>>>>>>>> ago)
>>>>>>>>>> due
>>>>>>>>>> to excessive rate
>>>>>>>>>> 2026-02-24T14:26:35.024Z|02718|ofproto_dpif_xlate(handler2)|
>>>>>>>>>> WARN|over
>>>>>>>>>> 4096 resubmit actions on bridge br-int while processing
>>>>>>>>>> arp,in_port=1,vlan_tci=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=104.165.244.1,arp_tpa=104.165.244.146,arp_op=1,arp_sha=0c:86:10:b7:9e:e0,arp_tha=00:00:00:00:00:00
>>>>>>>>>> 2026-02-24T14:26:59.151Z|00067|ofproto_dpif_upcall(handler67)|
>>>>>>>>>> WARN|
>>>>>>>>>> Dropped 884 log messages in last 60 seconds (most recently, 0
>>>>>>>>>> seconds
>>>>>>>>>> ago) due to excessive rate
>>>>>>>>>> 2026-02-24T14:26:59.151Z|00068|ofproto_dpif_upcall(handler67)|
>>>>>>>>>> WARN|Flow:
>>>>>>>>>> arp,in_port=380,vlan_tci=0x0000,dl_src=fa:16:3e:f1:5b:e7,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.215,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:f1:5b:e7,arp_tha=00:00:00:00:00:00
>>>>>>>>>>
>>>>>>>>>> bridge("br-int")
>>>>>>>>>> ----------------
>>>>>>>>>>     0. in_port=380, priority 100, cookie 0x2cfc9def
>>>>>>>>>>        set_field:0x90/0xffff->reg13
>>>>>>>>>>        set_field:0x3->reg11
>>>>>>>>>>        set_field:0x1->reg12
>>>>>>>>>>        set_field:0x1->metadata
>>>>>>>>>>        set_field:0x1d2->reg14
>>>>>>>>>>        set_field:0/0xffff0000->reg13
>>>>>>>>>>        resubmit(,8)
>>>>>>>>>>     8. metadata=0x1, priority 50, cookie 0x43f4e129
>>>>>>>>>>        set_field:0/0x1000->reg10
>>>>>>>>>>        resubmit(,73)
>>>>>>>>>>        73. arp,reg14=0x1d2,metadata=0x1, priority 95, cookie
>>>>>>>>>> 0x2cfc9def
>>>>>>>>>>                resubmit(,74)
>>>>>>>>>>            74. arp,reg14=0x1d2,metadata=0x1, priority 80, cookie
>>>>>>>>>> 0x2cfc9def
>>>>>>>>>>                set_field:0x1000/0x1000->reg10
>>>>>>>>>>        move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
>>>>>>>>>>         -> NXM_NX_XXREG0[111] is now 0x1
>>>>>>>>>>        resubmit(,9)
>>>>>>>>>>     9. reg0=0x8000/0x8000,metadata=0x1, priority 50, cookie
>>>>>>>>>> 0xf4bfe3b3
>>>>>>>>>>        drop
>>>>>>>>>>
>>>>>>>>>> Final flow:
>>>>>>>>>> arp,reg0=0x8000,reg10=0x1000,reg11=0x3,reg12=0x1,reg13=0x90,reg14=0x1d2,metadata=0x1,in_port=380,vlan_tci=0x0000,dl_src=fa:16:3e:f1:5b:e7,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.215,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:f1:5b:e7,arp_tha=00:00:00:00:00:00
>>>>>>>>>> Megaflow:
>>>>>>>>>> recirc_id=0,eth,arp,in_port=380,dl_src=fa:16:3e:f1:5b:e7
>>>>>>>>>> Datapath actions: drop
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Broadcast arps to all routers is set to false.
>>>>>>>>>> _uuid               : 1841d88f-3fbf-427f-8d6c-c3edaba47a0a
>>>>>>>>>> acls                : []
>>>>>>>>>> copp                : []
>>>>>>>>>> dns_records         : []
>>>>>>>>>> external_ids        : {"neutron:availability_zone_hints"="",
>>>>>>>>>> "neutron:mtu"="1500", "neutron:network_name"=poland-public,
>>>>>>>>>> "neutron:provnet-network-type"=vlan,
>>>>>>>>>> "neutron:revision_number"="12"}
>>>>>>>>>> forwarding_groups   : []
>>>>>>>>>> load_balancer       : []
>>>>>>>>>> load_balancer_group : []
>>>>>>>>>> name                : neutron-da85395e-c326-489d-b4e6-
>>>>>>>>>> dfb62aad360d
>>>>>>>>>> other_config        : {broadcast-arps-to-all-routers="false",
>>>>>>>>>> fdb_age_threshold="0", mcast_flood_unregistered="false",
>>>>>>>>>> mcast_snoop="false", vlan-passthru="false"}
>>>>>>>>>> ports               : [00288a04-90a4-4e8e-bada-8213747c92e4,
>>>>>>>>>> 0047d609-
>>>>>>>>>> ebff-4c43-8f1d-32d83d70c9e6, 00b6c585-ae29-4e88-a52a-3a16e1d91112
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Ilia Baikov
>>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>> On 24.02.2026 17:16, Ilia Baikov wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>> After upgrading to OpenStack 2025.2 with OVN 25.09.2 (which
>>>>>>>>>>> contains the split buf fix), there seem to be no issues with DHCP,
>>>>>>>>>>> but I see a lot of missed ARPs, VMs unable to reach the GW, and no
>>>>>>>>>>> ARP broadcast to some of the VMs. Debugging shows that OVN installs
>>>>>>>>>>> drop ARP flows for some reason.
>>>>>>>>>>>
>>>>>>>>>>> ovs-appctl ofproto/trace br-int \
>>>>>>>>>>> "in_port=2,dl_vlan=1000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_op=1,arp_spa=192.145.28.1,arp_tpa=192.145.28.113"
>>>>>>>>>>>  2>&1 | tail -80
>>>>>>>>>>> Flow:
>>>>>>>>>>> arp,in_port=2,dl_vlan=1000,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.145.28.1,arp_tpa=192.145.28.113,arp_op=1,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00
>>>>>>>>>>>
>>>>>>>>>>> bridge("br-int")
>>>>>>>>>>> ----------------
>>>>>>>>>>>     0. in_port=2, priority 100
>>>>>>>>>>>        move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
>>>>>>>>>>>         -> OXM_OF_METADATA[0..23] is now 0
>>>>>>>>>>>        move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14]
>>>>>>>>>>>         -> NXM_NX_REG14[0..14] is now 0
>>>>>>>>>>>        move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15]
>>>>>>>>>>>         -> NXM_NX_REG15[0..15] is now 0
>>>>>>>>>>>        resubmit(,45)
>>>>>>>>>>> 45. priority 0
>>>>>>>>>>>        drop
>>>>>>>>>>>
>>>>>>>>>>> Final flow: unchanged
>>>>>>>>>>> Megaflow: recirc_id=0,eth,arp,in_port=2,dl_src=0c:86:10:b7:9e:e0
>>>>>>>>>>> Datapath actions: drop
>>>>>>>>>>>
>>>>>>>>>>> docker exec ovn_controller ovn-controller --version
>>>>>>>>>>> ovn-controller 25.09.2
>>>>>>>>>>> Open vSwitch Library 3.6.2
>>>>>>>>>>> OpenFlow versions 0x6:0x6
>>>>>>>>>>> SB DB Schema 21.5.0
>>>>>>>>>>>
>>>>>>>>>>> The ovn-controller logs clearly show no errors:
>>>>>>>>>>> 2026-02-24T14:06:39.403Z|00001|vlog|INFO|opened log file /
>>>>>>>>>>> var/log/
>>>>>>>>>>> kolla/openvswitch/ovn-controller.log
>>>>>>>>>>> 2026-02-24T14:06:39.406Z|00002|reconnect|INFO|
>>>>>>>>>>> tcp:127.0.0.1:6640:
>>>>>>>>>>> connecting...
>>>>>>>>>>> 2026-02-24T14:06:39.406Z|00003|reconnect|INFO|
>>>>>>>>>>> tcp:127.0.0.1:6640:
>>>>>>>>>>> connected
>>>>>>>>>>> 2026-02-24T14:06:39.463Z|00004|main|INFO|OVN internal version
>>>>>>>>>>> is :
>>>>>>>>>>> [25.09.2-21.5.0-81.10]
>>>>>>>>>>> 2026-02-24T14:06:39.463Z|00005|main|INFO|OVS IDL reconnected,
>>>>>>>>>>> force
>>>>>>>>>>> recompute.
>>>>>>>>>>> 2026-02-24T14:06:39.464Z|00006|reconnect|INFO|
>>>>>>>>>>> tcp:10.11.0.4:16641:
>>>>>>>>>>> connecting...
>>>>>>>>>>> 2026-02-24T14:06:39.464Z|00007|main|INFO|OVNSB IDL reconnected,
>>>>>>>>>>> force
>>>>>>>>>>> recompute.
>>>>>>>>>>> 2026-02-24T14:06:39.464Z|00008|reconnect|INFO|
>>>>>>>>>>> tcp:10.11.0.4:16641:
>>>>>>>>>>> connected
>>>>>>>>>>> 2026-02-24T14:06:39.464Z|00001|rconn(ovn_statctrl3)|INFO|unix:/
>>>>>>>>>>> var/
>>>>>>>>>>> run/openvswitch/br-int.mgmt: connected
>>>>>>>>>>> 2026-02-24T14:06:39.464Z|00001|rconn(ovn_pinctrl0)|INFO|unix:/
>>>>>>>>>>> var/run/
>>>>>>>>>>> openvswitch/br-int.mgmt: connected
>>>>>>>>>>> 2026-02-24T14:06:39.529Z|00009|main|INFO|OVS feature set
>>>>>>>>>>> changed,
>>>>>>>>>>> force recompute.
>>>>>>>>>>> 2026-02-24T14:06:39.532Z|00010|rconn|INFO|unix:/var/run/
>>>>>>>>>>> openvswitch/
>>>>>>>>>>> br-int.mgmt: connected
>>>>>>>>>>> 2026-02-24T14:06:39.532Z|00011|main|INFO|OVS OpenFlow connection
>>>>>>>>>>> reconnected,force recompute.
>>>>>>>>>>> 2026-02-24T14:06:39.536Z|00012|main|INFO|OVS feature set
>>>>>>>>>>> changed,
>>>>>>>>>>> force recompute.
>>>>>>>>>>> 2026-02-24T14:06:40.564Z|00013|main|INFO|OVS feature set
>>>>>>>>>>> changed,
>>>>>>>>>>> force recompute.
>>>>>>>>>>> 2026-02-24T14:06:45.920Z|00014|binding|INFO|Releasing lport
>>>>>>>>>>> bcd3ecfa-
>>>>>>>>>>> f43c-4e72-8978-73bbad07ed75 from this chassis (sb_readonly=1)
>>>>>>>>>>> 2026-02-24T14:06:45.924Z|00015|binding|INFO|Releasing lport
>>>>>>>>>>> 4f1f45b0-726c-4fea-b462-06dcbf559c25 from this chassis
>>>>>>>>>>> (sb_readonly=1)
>>>>>>>>>>> 2026-02-24T14:06:46.927Z|00016|timeval|WARN|Unreasonably long
>>>>>>>>>>> 1413ms
>>>>>>>>>>> poll interval (1294ms user, 117ms system)
>>>>>>>>>>> 2026-02-24T14:06:46.927Z|00017|timeval|WARN|faults: 38131 minor,
>>>>>>>>>>> 0 major
>>>>>>>>>>> 2026-02-24T14:06:46.927Z|00018|timeval|WARN|disk: 0 reads, 8
>>>>>>>>>>> writes
>>>>>>>>>>> 2026-02-24T14:06:46.927Z|00019|timeval|WARN|context switches: 0
>>>>>>>>>>> voluntary, 65 involuntary
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00020|coverage|INFO|Event coverage, avg
>>>>>>>>>>> rate
>>>>>>>>>>> over last: 5 seconds, last minute, last hour,  hash=1a815819:
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00021|coverage|INFO|physical_run
>>>>>>>>>>>  0.2/sec
>>>>>>>>>>>     0.017/sec        0.0003/sec   total: 1
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00022|coverage|INFO|lflow_conj_alloc
>>>>>>>>>>>   0.0/
>>>>>>>>>>> sec     0.000/sec        0.0000/sec   total: 407
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00023|coverage|INFO|lflow_cache_miss  0.0/sec  0.000/sec  0.0000/sec  total: 13470
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00024|coverage|INFO|lflow_cache_hit  0.0/sec  0.000/sec  0.0000/sec  total: 394
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00025|coverage|INFO|lflow_cache_add  0.0/sec  0.000/sec  0.0000/sec  total: 12956
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00026|coverage|INFO|lflow_cache_add_matches  0.0/sec  0.000/sec  0.0000/sec  total: 2412
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00027|coverage|INFO|lflow_cache_add_expr  0.0/sec  0.000/sec  0.0000/sec  total: 10544
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00028|coverage|INFO|consider_logical_flow  0.0/sec  0.000/sec  0.0000/sec  total: 20680
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00029|coverage|INFO|lflow_run  0.2/sec  0.017/sec  0.0003/sec  total: 1
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00030|coverage|INFO|miniflow_malloc  16.6/sec  1.383/sec  0.0231/sec  total: 28561
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00031|coverage|INFO|hmap_pathological  11.2/sec  0.933/sec  0.0156/sec  total: 257
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00032|coverage|INFO|hmap_expand  837.2/sec  69.767/sec  1.1628/sec  total: 30358
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00033|coverage|INFO|hmap_reserve  0.4/sec  0.033/sec  0.0006/sec  total: 21733
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00034|coverage|INFO|txn_unchanged  2.4/sec  0.200/sec  0.0033/sec  total: 65
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00035|coverage|INFO|txn_incomplete  1.4/sec  0.117/sec  0.0019/sec  total: 60
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00036|coverage|INFO|txn_success  0.6/sec  0.050/sec  0.0008/sec  total: 3
>>>>>>>>>>> 2026-02-24T14:06:46.936Z|00037|coverage|INFO|poll_create_node  24.0/sec  2.000/sec  0.0333/sec  total: 1304
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00038|coverage|INFO|poll_zero_timeout  0.0/sec  0.000/sec  0.0000/sec  total: 1
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00039|coverage|INFO|rconn_queued  0.8/sec  0.067/sec  0.0011/sec  total: 4
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00040|coverage|INFO|rconn_sent  0.8/sec  0.067/sec  0.0011/sec  total: 4
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00041|coverage|INFO|seq_change  9.2/sec  0.767/sec  0.0128/sec  total: 532
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00042|coverage|INFO|pstream_open  0.2/sec  0.017/sec  0.0003/sec  total: 1
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00043|coverage|INFO|stream_open  1.2/sec  0.100/sec  0.0017/sec  total: 6
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00044|coverage|INFO|util_xalloc  29035.4/sec  2419.617/sec  40.3269/sec  total: 2277081
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00045|coverage|INFO|vconn_received  0.8/sec  0.067/sec  0.0011/sec  total: 4
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00046|coverage|INFO|vconn_sent  1.2/sec  0.100/sec  0.0017/sec  total: 6
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00047|coverage|INFO|jsonrpc_recv_incomplete  0.6/sec  0.050/sec  0.0008/sec  total: 52
>>>>>>>>>>> 2026-02-24T14:06:46.937Z|00048|coverage|INFO|138 events never hit
>>>>>>>>>>> 2026-02-24T14:06:46.976Z|00049|binding|INFO|Releasing lport 4f1f45b0-726c-4fea-b462-06dcbf559c25 from this chassis (sb_readonly=0)
>>>>>>>>>>> 2026-02-24T14:06:46.977Z|00050|binding|INFO|Releasing lport bcd3ecfa-f43c-4e72-8978-73bbad07ed75 from this chassis (sb_readonly=0)
>>>>>>>>>>> 2026-02-24T14:06:48.054Z|00051|timeval|WARN|Unreasonably long 1117ms poll interval (1108ms user, 8ms system)
>>>>>>>>>>> 2026-02-24T14:06:48.054Z|00052|timeval|WARN|faults: 2581 minor, 0 major
>>>>>>>>>>> 2026-02-24T14:06:48.054Z|00053|timeval|WARN|context switches: 0 voluntary, 8 involuntary
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00054|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=0878340f:
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00055|coverage|INFO|physical_run  0.2/sec  0.017/sec  0.0003/sec  total: 2
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00056|coverage|INFO|lflow_conj_alloc  0.0/sec  0.000/sec  0.0000/sec  total: 814
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00057|coverage|INFO|lflow_cache_miss  0.0/sec  0.000/sec  0.0000/sec  total: 13979
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00058|coverage|INFO|lflow_cache_hit  0.0/sec  0.000/sec  0.0000/sec  total: 13671
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00059|coverage|INFO|lflow_cache_add  0.0/sec  0.000/sec  0.0000/sec  total: 12956
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00060|coverage|INFO|lflow_cache_add_matches  0.0/sec  0.000/sec  0.0000/sec  total: 2412
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00061|coverage|INFO|lflow_cache_add_expr  0.0/sec  0.000/sec  0.0000/sec  total: 10544
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00062|coverage|INFO|consider_logical_flow  0.0/sec  0.000/sec  0.0000/sec  total: 41360
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00063|coverage|INFO|lflow_run  0.2/sec  0.017/sec  0.0003/sec  total: 2
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00064|coverage|INFO|cmap_expand  0.0/sec  0.000/sec  0.0000/sec  total: 7
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00065|coverage|INFO|miniflow_malloc  16.6/sec  1.383/sec  0.0231/sec  total: 63156
>>>>>>>>>>> 2026-02-24T14:06:48.055Z|00066|coverage|INFO|hmap_pathological  11.2/sec  0.933/sec  0.0156/sec  total: 311
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00067|coverage|INFO|hmap_expand  837.2/sec  69.767/sec  1.1628/sec  total: 30539
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00068|coverage|INFO|hmap_reserve  0.4/sec  0.033/sec  0.0006/sec  total: 22553
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00069|coverage|INFO|txn_unchanged  2.4/sec  0.200/sec  0.0033/sec  total: 67
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00070|coverage|INFO|txn_incomplete  1.4/sec  0.117/sec  0.0019/sec  total: 60
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00071|coverage|INFO|txn_success  0.6/sec  0.050/sec  0.0008/sec  total: 4
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00072|coverage|INFO|poll_create_node  24.0/sec  2.000/sec  0.0333/sec  total: 1335
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00073|coverage|INFO|poll_zero_timeout  0.0/sec  0.000/sec  0.0000/sec  total: 1
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00074|coverage|INFO|rconn_queued  0.8/sec  0.067/sec  0.0011/sec  total: 4
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00075|coverage|INFO|rconn_sent  0.8/sec  0.067/sec  0.0011/sec  total: 4
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00076|coverage|INFO|seq_change  9.2/sec  0.767/sec  0.0128/sec  total: 546
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00077|coverage|INFO|pstream_open  0.2/sec  0.017/sec  0.0003/sec  total: 1
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00078|coverage|INFO|stream_open  1.2/sec  0.100/sec  0.0017/sec  total: 6
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00079|coverage|INFO|long_poll_interval  0.0/sec  0.000/sec  0.0000/sec  total: 1
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00080|coverage|INFO|util_xalloc  29035.4/sec  2419.617/sec  40.3269/sec  total: 2477649
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00081|coverage|INFO|vconn_received  0.8/sec  0.067/sec  0.0011/sec  total: 4
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00082|coverage|INFO|vconn_sent  1.2/sec  0.100/sec  0.0017/sec  total: 6
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00083|coverage|INFO|jsonrpc_recv_incomplete  0.6/sec  0.050/sec  0.0008/sec  total: 52
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00084|coverage|INFO|136 events never hit
>>>>>>>>>>> 2026-02-24T14:06:48.056Z|00085|poll_loop|INFO|wakeup due to [POLLIN] on fd 29 (10.11.0.2:40496<->10.11.0.4:16641) at lib/stream-fd.c:157 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.097Z|00086|poll_loop|INFO|wakeup due to [POLLIN] on fd 29 (10.11.0.2:40496<->10.11.0.4:16641) at lib/stream-fd.c:157 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.104Z|00087|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ovn-controller.c:7558 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.283Z|00088|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ofctrl.c:692 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.870Z|00089|poll_loop|INFO|wakeup due to [POLLIN] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.877Z|00090|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.884Z|00091|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.892Z|00092|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.900Z|00093|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:48.907Z|00094|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
>>>>>>>>>>> 2026-02-24T14:06:49.875Z|00095|memory|INFO|143124 kB peak resident set size after 10.5 seconds
>>>>>>>>>>> 2026-02-24T14:06:49.875Z|00096|memory|INFO|idl-cells-OVN_Southbound:301305 idl-cells-Open_vSwitch:25815 lflow-cache-entries-cache-expr:10548 lflow-cache-entries-cache-matches:2413 lflow-cache-size-KB:32447 local_datapath_usage-KB:2 ofctrl_desired_flow_usage-KB:8528 ofctrl_installed_flow_usage-KB:6365 ofctrl_rconn_packet_counter-KB:5161 ofctrl_sb_flow_ref_usage-KB:3196 oflow_update_usage-KB:1
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Ilia Baikov
>>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>> On 12.02.2026 21:22, Ilia Baikov wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> Returning to this issue after a while, as I'm migrating from
>>>>>>>>>>>> ml2/ovs to ml2/ovn.
>>>>>>>>>>>> It seems the same issue from 2025 still persists.
>>>>>>>>>>>>
>>>>>>>>>>>> refs:
>>>>>>>>>>>> [0] https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053456.html
>>>>>>>>>>>> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2025-March/053484.html
>>>>>>>>>>>>
>>>>>>>>>>>> case:
>>>>>>>>>>>> A big L2 domain with a border device that learns IPs from
>>>>>>>>>>>> flooded ARP. For some reason, when an L2 device (a NIC with
>>>>>>>>>>>> VLANs attached) is attached to the br-ex bridge, OVN stops
>>>>>>>>>>>> sending DHCP packets (OFFER/ACK/etc.) after a while.
>>>>>>>>>>>>
>>>>>>>>>>>> Has anybody else observed the same issue? The only way to
>>>>>>>>>>>> stabilize the region is to switch to L3 networking using
>>>>>>>>>>>> ovn-bgp-agent (eth0 is detached from br-ex, so ARPs are no
>>>>>>>>>>>> longer delivered to ovn-controller), but kernel routing has a
>>>>>>>>>>>> monstrous overhead: IRQ load is 5-6x higher, around 10-12%,
>>>>>>>>>>>> while with L2 networking it is just 2%, which is fine.
>>>>>>>>>>>>
>>>>>>>>>>>> Meanwhile, there are no errors, warnings, or resubmit messages
>>>>>>>>>>>> in the logs.
>>>>>>>>>>>>
> 

_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
