Hi Dumitru,

Yep, though I had already set it to false about 10 minutes ago. Once connected to the testing instance I started tcpdump on an LSP from the L2 network and found something interesting, take a look:

root@infra-us:~# tcpdump -ni ens3 arp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:37:18.435647 ARP, Request who-has 85.209.134.1 tell 85.209.134.123, length 28
11:37:18.445138 ARP, Reply 85.209.134.1 is-at d4:04:ff:96:a4:00, length 46

There are no incoming ARPs, but when the instance asks for the gateway MAC address it receives a response.

On the other side, when I manually delete the ARP entry from the border gateway with `clear arp hostname 85.209.134.123`, this is what I see in tcpdump on eno49 (used as the uplink attached to the br-ex bridge). It seems that ARP is not being delivered to the instance's LSP at all (see the related screenshot, or imgur as a backup option: https://i.imgur.com/GcKNQPE.jpeg).



What I found weird is that the instance didn't reply with an ARP response (since the request is never delivered); apparently ARP resolves only one way, when the instance asks for the gateway's MAC, not via learning initiated by the gateway device.
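If it helps, this is roughly how I'd trace the gateway's ARP request through br-int to see which OpenFlow table drops it (a sketch: `in_port=1` as the patch port from br-ex is an assumption to adjust for your setup; the MAC and IPs are taken from the capture above):

```shell
# Replay the border gateway's ARP request for the instance's IP through
# br-int and show where translation ends. in_port=1 is assumed to be the
# localnet patch port; dl_src is the gateway MAC seen in the tcpdump above.
ovs-appctl ofproto/trace br-int \
    "in_port=1,dl_src=d4:04:ff:96:a4:00,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_op=1,arp_spa=85.209.134.1,arp_tpa=85.209.134.123" \
    2>&1 | tail -40
```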

Regards,

Ilia Baikov
[email protected]

On 04.03.2026 14:04, Dumitru Ceara wrote:
Hi Ilia,

On 3/3/26 5:08 PM, Ilia Baikov wrote:
Just to be sure I understand. Is traffic from your workloads still
affected?
Unfortunately it is still affected. Since ARP is handled differently (the ARP flow is dropped at the OVS level), some of the instances have no public connectivity.


Double checking, are you sure you upgraded ovn-northd to use the version
from my branch?
100% sure. I've built the packages from source, packed them into deb
packages, and then built Kolla images using a self-hosted repo with the
compiled deb packages.

Thanks for checking!

ansible all -i multinode -m shell -a "docker exec ovn_northd ovn-northd
--version" --limit control

us-east-standard-1 | CHANGED | rc=0 >>
ovn-northd 24.03.8
Open vSwitch Library 3.3.8

us-east-4 | CHANGED | rc=0 >>
ovn-northd 24.03.8
Open vSwitch Library 3.3.8

I have a hunch and I think I know what might be happening.  Is it
possible that your logical switches don't have
other_config:broadcast-arps-to-all-routers=false?
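(For what it's worth, a quick way to check might look like this; a sketch assuming `ovn-nbctl` access from a controller node, with the `--bare` output format of one name line followed by the `other_config` map, and a placeholder UUID for the set command.)

```shell
# Show which logical switches mention the option at all; switches missing
# from this output don't have it set.
ovn-nbctl --no-leader-only --bare --columns=name,other_config \
    list Logical_Switch |
    grep -B1 'broadcast-arps-to-all-routers'

# To set it on one switch (<switch-uuid> is a placeholder):
# ovn-nbctl set Logical_Switch <switch-uuid> \
#     other_config:broadcast-arps-to-all-routers=false
```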

Also, we're currently flooding all ARP requests coming from the fabric
(entering OVN through the localnet port) to all logical switch ports =>
4K resubmit limit gets hit.

I'll update my test patch to cover this last part too while waiting for
your reply on the LS config question above.

Regards,
Dumitru

Regards,

Ilia Baikov
[email protected]

On 03.03.2026 18:54, Dumitru Ceara wrote:
On 3/3/26 3:54 PM, Ilia Baikov wrote:
Hi,

Hi,


Done, the OVN components are now built from your branch and deployed into
the production region where the issue persists. To be more precise, let's
focus on a cluster that runs the L2 network setup, as this is the best
field for testing and reproducing this case without breaking other
stabilized regions, which run L3.

ovn-controller 24.03.8
Open vSwitch Library 3.3.8
OpenFlow versions 0x6:0x6
SB DB Schema 20.33.0

The OVN components were deployed at ~14:11, after which the resubmit
exceptions started appearing (24.03.2 just shows unrecognized op code (27) and so on).

Just to be sure I understand.  Is traffic from your workloads still
affected?

I've also enabled rconn/vconn debug logging for OVN and OVS (only after
14:11, but it seems vconn/rconn shows something useful for debugging).

ovn logs starting from 14:11 - https://gist.githubusercontent.com/frct1/5f99221e1519d1552c8ef16a7ec8ee52/raw/147e9a171e538f9cd837008181272437b1c7ed37/ovn.log
ovs logs starting from 14:11 - https://gist.githubusercontent.com/frct1/5f99221e1519d1552c8ef16a7ec8ee52/raw/147e9a171e538f9cd837008181272437b1c7ed37/ovs.log

Double checking, are you sure you upgraded ovn-northd to use the version
from my branch?

I'm asking because this packet should not hit the mc_flood_l2 group
anymore:

2026-03-03T14:22:51.358Z|00012|ofproto_dpif_xlate(handler250)|WARN|
Dropped
2244 log messages in last 60 seconds (most recently, 0 seconds ago) due
to excessive rate
2026-03-03T14:22:51.358Z|00013|ofproto_dpif_xlate(handler250)|WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=2409,vlan_tci=0x0000,dl_src=fa:16:3e:ba:70:84,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=166.1.160.225,arp_tpa=166.1.160.1,arp_op=1,arp_sha=fa:16:3e:ba:70:84,arp_tha=00:00:00:00:00:00

The mc_flood_l2 group has 2k ports (all the VM ports), but my change
should make this traffic hit the mc_unknown group instead, which only has
a handful of ports as you said in your previous email.
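A quick way to compare the fan-out of the two groups could be something like this (a sketch; `--bare` prints one row per blank-line-separated paragraph, with the group name first and the port UUIDs after it):

```shell
# Print "group-name port-count" for each Multicast_Group row in the SB DB,
# then keep only the flood_l2 and unknown groups.
ovn-sbctl --bare --columns=name,ports list Multicast_Group |
    awk 'BEGIN { RS="" } { print $1, NF - 1 }' |
    grep -E '_MC_flood_l2|_MC_unknown'
```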

I will keep it running on 24.03.8 for easier debugging.

Thanks,
Dumitru

Regards,

Ilia Baikov
[email protected]

On 03.03.2026 15:55, Dumitru Ceara wrote:
On 3/2/26 7:24 PM, Ilia Baikov wrote:
Hi Dumitru!

Hi Ilia,

ovn-nbctl --no-leader-only list Logical_switch_port | grep unknown |
wc -l
10

OK, that's just a few (and I see on your other deployment too), so
that's great.

Mind trying out this WIP patch for now and see if it works for you?

https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-26.03
https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-25.09
https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-25.03
https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-24.09
https://github.com/dceara/ovn/commits/mc_flood_l2_to_unknown-24.03

They're all the same, just based on different stable branches; I wasn't
sure which one you'd need.

Looking forward to hearing your results.

Thanks,
Dumitru


More output for the LSP list (no filtering by LS uuid):

ovn-nbctl --no-leader-only list Logical_switch_port | grep unknown
-A 5
addresses           : [unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="10.10.3.243/24",
"neutron:device_id"="ba1a43e2-5496-4ced-8b8c-9b42c5ddd6f1",
"neutron:device_owner"="network:floatingip_agent_gateway",
"neutron:host_id"=us-east-standard-2, "neutron:mtu"="",
"neutron:network_name"=neutron-bb8d0ef6-9b45-4398-86f3-51323a0db2cd,
"neutron:port_capabilities"="", "neutron:port_name"="",
"neutron:project_id"="", "neutron:revision_number"="5",
"neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : ["fa:16:3e:62:e5:5f 193.32.177.44", unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="193.32.177.44/24",
"neutron:device_id"="544cddd2-7a53-492a-8933-91fb97fd0546",
"neutron:device_owner"="network:floatingip_agent_gateway",
"neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
"neutron:network_name"=neutron-7dce255f-4824-4a21-a550-f8d03a25c285,
"neutron:port_capabilities"="", "neutron:port_name"="",
"neutron:project_id"="", "neutron:revision_number"="3",
"neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : [unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="12.26.0.2/16",
"neutron:device_id"=dhcp8b62a377-0e4b-5497-b096-c08bf79b6c42-
c5db4fec-9c10-4022-835d-7281506d8a7e,
"neutron:device_owner"="network:dhcp", "neutron:host_id"=us-east-
standard-1, "neutron:mtu"="", "neutron:network_name"=neutron-
c5db4fec-9c10-4022-835d-7281506d8a7e, "neutron:port_capabilities"="",
"neutron:port_name"="",
"neutron:project_id"=a3b7099e62ac4fb9b3d548dfaff7aeaf,
"neutron:revision_number"="5", "neutron:security_group_ids"="",
"neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : [unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="10.10.3.242/24",
"neutron:device_id"="544cddd2-7a53-492a-8933-91fb97fd0546",
"neutron:device_owner"="network:floatingip_agent_gateway",
"neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
"neutron:network_name"=neutron-bb8d0ef6-9b45-4398-86f3-51323a0db2cd,
"neutron:port_capabilities"="", "neutron:port_name"="",
"neutron:project_id"="", "neutron:revision_number"="5",
"neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : ["fa:16:3e:6e:27:09 12.26.0.109", unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="12.26.0.109/16",
"neutron:device_id"="6e2d75ce-1503-4e40-bc72-ef3adc59d45f",
"neutron:device_owner"="network:router_centralized_snat",
"neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
"neutron:network_name"=neutron-c5db4fec-9c10-4022-835d-7281506d8a7e,
"neutron:port_capabilities"="", "neutron:port_name"="",
"neutron:project_id"="", "neutron:revision_number"="6",
"neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : ["fa:16:3e:0c:ac:01 12.26.1.76", unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="12.26.1.76/16",
"neutron:device_id"="4d3f7d3d-a637-4e40-8bc3-fda4712a1ada",
"neutron:device_owner"="network:router_centralized_snat",
"neutron:host_id"=us-east-standard-1, "neutron:mtu"="",
"neutron:network_name"=neutron-c5db4fec-9c10-4022-835d-7281506d8a7e,
"neutron:port_capabilities"="", "neutron:port_name"="",
"neutron:project_id"="", "neutron:revision_number"="6",
"neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : [unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="10.10.3.240/24",
"neutron:device_id"=dhcp8b62a377-0e4b-5497-b096-c08bf79b6c42-
bb8d0ef6-9b45-4398-86f3-51323a0db2cd,
"neutron:device_owner"="network:dhcp", "neutron:host_id"=us-east-
standard-1, "neutron:mtu"="", "neutron:network_name"=neutron-
bb8d0ef6-9b45-4398-86f3-51323a0db2cd, "neutron:port_capabilities"="",
"neutron:port_name"="",
"neutron:project_id"="03d31c9de2ec41c787add9b44aacd3a8",
"neutron:revision_number"="6", "neutron:security_group_ids"="",
"neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : [unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : []
external_ids        : {}
--
addresses           : [unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : true
external_ids        : {"neutron:cidrs"="193.32.177.174/24",
"neutron:device_id"="ba1a43e2-5496-4ced-8b8c-9b42c5ddd6f1",
"neutron:device_owner"="network:floatingip_agent_gateway",
"neutron:host_id"=us-east-standard-2, "neutron:mtu"="",
"neutron:network_name"=neutron-7dce255f-4824-4a21-a550-f8d03a25c285,
"neutron:port_capabilities"="", "neutron:port_name"="",
"neutron:project_id"="", "neutron:revision_number"="5",
"neutron:security_group_ids"="", "neutron:subnet_pool_addr_scope4"="",
"neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}
--
addresses           : [unknown]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : []
enabled             : []
external_ids        : {}

Regards,

Ilia Baikov
[email protected]

On 02.03.2026 18:23, Dumitru Ceara wrote:
On 3/2/26 1:18 PM, Ilia Baikov wrote:
To keep the region stable we decided to roll back to 25.09, which has
no split buf change merged, and migrated to an L3 topology with /32s
advertised via BGP.
Just a guess: reaching 2k ports (VMs) in a single logical switch is the
reason why ARP flows are being dropped/discarded because of the resubmit
limit. What do you think?

Hi Ilia,

Right, the very high number of logical switch ports that are part of
the
MC_FLOOD_L2 OVN multicast group (in this case all your VM ports) is
what's causing issues with broadcast ARP requests:
a. generated by the logical router port
b. generated by VMs attached to the logical switch
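As a rough back-of-the-envelope check (the per-port cost of ~2 resubmits per flooded logical switch port is my assumption here, not a measured value), flooding to 2k ports lands right at OVS's limit:

```shell
# Estimate resubmits consumed when flooding one broadcast packet to every
# port of the logical switch; ~2 per port is an assumed cost.
ports=2000
per_port=2
echo $(( ports * per_port ))   # 4000, right at the 4096 resubmit limit
```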

I'll try to prepare a test/RFC patch in the next few days to see if
changing the action for some logical flows from flooding on the
"MC_FLOOD_L2" group to flooding on the "MC_UNKNOWN" group makes things
work in your setup.

Before that, can you please share how many of those 2k VM ports have
LSP.addresses configured to include "unknown"?

Thanks,
Dumitru

Regards,

Ilia Baikov
[email protected]

On 26.02.2026 16:51, Ilia Baikov wrote:
Hi,

These patches seem to fix the DHCP issues, but there are cases where an
instance boots and receives its configuration from the metadata service
but doesn't have public connectivity (provided through L2 networking).

Which, if the logical switch has a reasonably high number of ports
(maybe around 200) will probably cause the resubmit limit to be
hit
This is the case: a public L2 network with around ~2000 running
instances (or ports, in LSP terms).

Are these OVN router port IPs?  Or are they OVN workload IPs?  Or
are
they just IPs owned by some fabric hosts, outside of OVN?
The .1 IP of each subnet is run by the border gateway. So the instance
asks for .1 to learn the GW MAC address, but due to hitting the limit
the instance receives no response because the ARP flow is dropped.

Also, aside from the logs, do you actually see any traffic being
impacted?  I.e., are your workloads able to come up and properly
communicate?
Nope, there is connectivity loss, since some instances have no public
connectivity due to the ARP issues.


Regards,

Ilia Baikov
[email protected]
On 26.02.2026 14:31, Dumitru Ceara wrote:
Hi Ilia,

On 2/24/26 3:29 PM, Ilia Baikov wrote:
Just checked the openvswitch logs. The 4096 resubmit limit is actually
hit even on 25.09.2.
v25.09.2 includes:
https://github.com/ovn-org/ovn/commit/0bb60da

Which should fix the "self-DoS" issues introduced by:
https://github.com/ovn-org/ovn/commit/325c7b2

But that means that in some cases, e.g., for real BUM traffic
or for
GARPs originated by OVN router ports we will try to "flood" the
packet
in the L2 broadcast domain.

Which, if the logical switch has a reasonably high number of ports
(maybe around 200) will probably cause the resubmit limit to be
hit.

In the examples below, I see the packets that cause this are ARP
requests requesting the MAC address of:
- 138.124.72.1
- 83.219.248.109
- 138.124.72.1
- 91.92.46.1

Are these OVN router port IPs?  Or are they OVN workload IPs?  Or
are
they just IPs owned by some fabric hosts, outside of OVN?

Also, aside from the logs, do you actually see any traffic being
impacted?  I.e., are your workloads able to come up and properly
communicate?

Thanks,
Dumitru

Final flow: unchanged
Megaflow:
recirc_id=0,eth,arp,in_port=346,dl_src=fa:16:3e:63:aa:d0
Datapath actions: drop
2026-02-24T14:23:17.457Z|04071|connmgr|INFO|br-int<->unix#4346: 1
flow_mods in the last 0 s (1 adds)
2026-02-24T14:23:34.821Z|00076|ofproto_dpif_xlate(handler24)|
WARN|
Dropped 854 log messages in last 60 seconds (most recently, 0
seconds
ago) due to excessive rate
2026-02-24T14:23:34.821Z|00077|ofproto_dpif_xlate(handler24)|
WARN|
over
4096 resubmit actions on bridge br-int while processing
arp,in_port=4715,vlan_tci=0x0000,dl_src=fa:16:3e:97:65:15,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.142,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:97:65:15,arp_tha=00:00:00:00:00:00
2026-02-24T14:23:45.464Z|00091|dpif(handler28)|WARN|system@ovs-
system:
execute
ct(commit,zone=163,mark=0/0x41,label=0/0xffff00000000000000000000,nat(src)),154 
failed (Invalid argument) on packet 
tcp,vlan_tci=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=fa:16:3e:69:22:89,nw_src=31.44.82.94,nw_dst=31.169.126.149,nw_tos=32,nw_ecn=0,nw_ttl=57,nw_frag=no,tp_src=51064,tp_dst=443,tcp_flags=syn
 tcp_csum:d7b0
     with metadata
skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0xa3),ct_tuple4(src=31.44.82.94,dst=31.169.126.149,proto=6,tp_src=51064,tp_dst=443),in_port(2)
 mtu 0
2026-02-24T14:23:56.702Z|00072|ofproto_dpif_upcall(handler30)|
WARN|
Dropped 697 log messages in last 60 seconds (most recently, 0
seconds
ago) due to excessive rate
2026-02-24T14:23:56.702Z|00073|ofproto_dpif_upcall(handler30)|
WARN|Flow:
arp,in_port=409,vlan_tci=0x0000,dl_src=fa:16:3e:22:f2:f7,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.145.28.207,arp_tpa=192.145.28.1,arp_op=1,arp_sha=fa:16:3e:22:f2:f7,arp_tha=00:00:00:00:00:00

bridge("br-int")
----------------
     0. priority 0
        drop

Final flow: unchanged
Megaflow:
recirc_id=0,eth,arp,in_port=409,dl_src=fa:16:3e:22:f2:f7
Datapath actions: drop
2026-02-24T14:24:34.891Z|02715|ofproto_dpif_xlate(handler2)|WARN|
Dropped
1059 log messages in last 60 seconds (most recently, 1 seconds
ago) due
to excessive rate
2026-02-24T14:24:34.891Z|02716|ofproto_dpif_xlate(handler2)|
WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=1,vlan_tci=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=83.219.248.1,arp_tpa=83.219.248.109,arp_op=1,arp_sha=0c:86:10:b7:9e:e0,arp_tha=00:00:00:00:00:00
2026-02-24T14:24:46.042Z|04072|connmgr|INFO|br-int<->unix#4353: 1
flow_mods in the last 0 s (1 adds)
2026-02-24T14:24:59.041Z|00066|ofproto_dpif_upcall(handler78)|
WARN|
Dropped 662 log messages in last 63 seconds (most recently, 3
seconds
ago) due to excessive rate
2026-02-24T14:24:59.041Z|00067|ofproto_dpif_upcall(handler78)|
WARN|Flow:
arp,in_port=339,vlan_tci=0x0000,dl_src=fa:16:3e:39:60:bb,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=91.92.46.85,arp_tpa=91.92.46.1,arp_op=1,arp_sha=fa:16:3e:39:60:bb,arp_tha=00:00:00:00:00:00

bridge("br-int")
----------------
     0. priority 0
        drop

Final flow: unchanged
Megaflow:
recirc_id=0,eth,arp,in_port=339,dl_src=fa:16:3e:39:60:bb
Datapath actions: drop
2026-02-24T14:25:34.783Z|00067|ofproto_dpif_xlate(handler7)|WARN|
Dropped
952 log messages in last 60 seconds (most recently, 0 seconds
ago)
due
to excessive rate
2026-02-24T14:25:34.783Z|00068|ofproto_dpif_xlate(handler7)|
WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=4812,vlan_tci=0x0000,dl_src=fa:16:3e:68:f7:1b,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.245,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:68:f7:1b,arp_tha=00:00:00:00:00:00
2026-02-24T14:25:59.094Z|00067|ofproto_dpif_upcall(handler11)|
WARN|
Dropped 720 log messages in last 60 seconds (most recently, 0
seconds
ago) due to excessive rate
2026-02-24T14:25:59.095Z|00068|ofproto_dpif_upcall(handler11)|
WARN|Flow:
arp,in_port=305,vlan_tci=0x0000,dl_src=fa:16:3e:d9:8d:f3,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=91.92.46.188,arp_tpa=91.92.46.1,arp_op=1,arp_sha=fa:16:3e:d9:8d:f3,arp_tha=00:00:00:00:00:00

bridge("br-int")
----------------
     0. priority 0
        drop

Final flow: unchanged
Megaflow:
recirc_id=0,eth,arp,in_port=305,dl_src=fa:16:3e:d9:8d:f3
Datapath actions: drop
2026-02-24T14:26:35.024Z|02717|ofproto_dpif_xlate(handler2)|WARN|
Dropped
937 log messages in last 61 seconds (most recently, 1 seconds
ago)
due
to excessive rate
2026-02-24T14:26:35.024Z|02718|ofproto_dpif_xlate(handler2)|
WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=1,vlan_tci=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=104.165.244.1,arp_tpa=104.165.244.146,arp_op=1,arp_sha=0c:86:10:b7:9e:e0,arp_tha=00:00:00:00:00:00
2026-02-24T14:26:59.151Z|00067|ofproto_dpif_upcall(handler67)|
WARN|
Dropped 884 log messages in last 60 seconds (most recently, 0
seconds
ago) due to excessive rate
2026-02-24T14:26:59.151Z|00068|ofproto_dpif_upcall(handler67)|
WARN|Flow:
arp,in_port=380,vlan_tci=0x0000,dl_src=fa:16:3e:f1:5b:e7,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.215,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:f1:5b:e7,arp_tha=00:00:00:00:00:00

bridge("br-int")
----------------
     0. in_port=380, priority 100, cookie 0x2cfc9def
        set_field:0x90/0xffff->reg13
        set_field:0x3->reg11
        set_field:0x1->reg12
        set_field:0x1->metadata
        set_field:0x1d2->reg14
        set_field:0/0xffff0000->reg13
        resubmit(,8)
     8. metadata=0x1, priority 50, cookie 0x43f4e129
        set_field:0/0x1000->reg10
        resubmit(,73)
        73. arp,reg14=0x1d2,metadata=0x1, priority 95, cookie
0x2cfc9def
                resubmit(,74)
            74. arp,reg14=0x1d2,metadata=0x1, priority 80, cookie
0x2cfc9def
                set_field:0x1000/0x1000->reg10
        move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
         -> NXM_NX_XXREG0[111] is now 0x1
        resubmit(,9)
     9. reg0=0x8000/0x8000,metadata=0x1, priority 50, cookie
0xf4bfe3b3
        drop

Final flow:
arp,reg0=0x8000,reg10=0x1000,reg11=0x3,reg12=0x1,reg13=0x90,reg14=0x1d2,metadata=0x1,in_port=380,vlan_tci=0x0000,dl_src=fa:16:3e:f1:5b:e7,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=138.124.72.215,arp_tpa=138.124.72.1,arp_op=1,arp_sha=fa:16:3e:f1:5b:e7,arp_tha=00:00:00:00:00:00
Megaflow:
recirc_id=0,eth,arp,in_port=380,dl_src=fa:16:3e:f1:5b:e7
Datapath actions: drop



Broadcast arps to all routers is set to false.
_uuid               : 1841d88f-3fbf-427f-8d6c-c3edaba47a0a
acls                : []
copp                : []
dns_records         : []
external_ids        : {"neutron:availability_zone_hints"="",
"neutron:mtu"="1500", "neutron:network_name"=poland-public,
"neutron:provnet-network-type"=vlan,
"neutron:revision_number"="12"}
forwarding_groups   : []
load_balancer       : []
load_balancer_group : []
name                : neutron-da85395e-c326-489d-b4e6-
dfb62aad360d
other_config        : {broadcast-arps-to-all-routers="false",
fdb_age_threshold="0", mcast_flood_unregistered="false",
mcast_snoop="false", vlan-passthru="false"}
ports               : [00288a04-90a4-4e8e-bada-8213747c92e4,
0047d609-
ebff-4c43-8f1d-32d83d70c9e6, 00b6c585-ae29-4e88-a52a-3a16e1d91112


Regards,

Ilia Baikov
[email protected]

On 24.02.2026 17:16, Ilia Baikov wrote:
Hello,
After upgrading to OpenStack 2025.2 with OVN 25.09.2 (which contains
the split buf fix) there seem to be no issues with DHCP, but I see a
lot of missed ARPs, VMs unable to reach the GW, and no ARP broadcast
reaching some of the VMs. Debugging shows that OVN installs drop flows
for ARP for some reason.

ovs-appctl ofproto/trace br-int \
"in_port=2,dl_vlan=1000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_op=1,arp_spa=192.145.28.1,arp_tpa=192.145.28.113" \
2>&1 | tail -80
Flow:
arp,in_port=2,dl_vlan=1000,dl_vlan_pcp=0,vlan_tci1=0x0000,dl_src=0c:86:10:b7:9e:e0,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.145.28.1,arp_tpa=192.145.28.113,arp_op=1,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00

bridge("br-int")
----------------
     0. in_port=2, priority 100
        move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23]
         -> OXM_OF_METADATA[0..23] is now 0
        move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14]
         -> NXM_NX_REG14[0..14] is now 0
        move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15]
         -> NXM_NX_REG15[0..15] is now 0
        resubmit(,45)
45. priority 0
        drop

Final flow: unchanged
Megaflow: recirc_id=0,eth,arp,in_port=2,dl_src=0c:86:10:b7:9e:e0
Datapath actions: drop

docker exec ovn_controller ovn-controller --version
ovn-controller 25.09.2
Open vSwitch Library 3.6.2
OpenFlow versions 0x6:0x6
SB DB Schema 21.5.0

The ovn-controller logs show no clear errors:
2026-02-24T14:06:39.403Z|00001|vlog|INFO|opened log file /
var/log/
kolla/openvswitch/ovn-controller.log
2026-02-24T14:06:39.406Z|00002|reconnect|INFO|
tcp:127.0.0.1:6640:
connecting...
2026-02-24T14:06:39.406Z|00003|reconnect|INFO|
tcp:127.0.0.1:6640:
connected
2026-02-24T14:06:39.463Z|00004|main|INFO|OVN internal version
is :
[25.09.2-21.5.0-81.10]
2026-02-24T14:06:39.463Z|00005|main|INFO|OVS IDL reconnected,
force
recompute.
2026-02-24T14:06:39.464Z|00006|reconnect|INFO|
tcp:10.11.0.4:16641:
connecting...
2026-02-24T14:06:39.464Z|00007|main|INFO|OVNSB IDL reconnected,
force
recompute.
2026-02-24T14:06:39.464Z|00008|reconnect|INFO|
tcp:10.11.0.4:16641:
connected
2026-02-24T14:06:39.464Z|00001|rconn(ovn_statctrl3)|INFO|unix:/
var/
run/openvswitch/br-int.mgmt: connected
2026-02-24T14:06:39.464Z|00001|rconn(ovn_pinctrl0)|INFO|unix:/
var/run/
openvswitch/br-int.mgmt: connected
2026-02-24T14:06:39.529Z|00009|main|INFO|OVS feature set
changed,
force recompute.
2026-02-24T14:06:39.532Z|00010|rconn|INFO|unix:/var/run/
openvswitch/
br-int.mgmt: connected
2026-02-24T14:06:39.532Z|00011|main|INFO|OVS OpenFlow connection
reconnected,force recompute.
2026-02-24T14:06:39.536Z|00012|main|INFO|OVS feature set
changed,
force recompute.
2026-02-24T14:06:40.564Z|00013|main|INFO|OVS feature set
changed,
force recompute.
2026-02-24T14:06:45.920Z|00014|binding|INFO|Releasing lport
bcd3ecfa-
f43c-4e72-8978-73bbad07ed75 from this chassis (sb_readonly=1)
2026-02-24T14:06:45.924Z|00015|binding|INFO|Releasing lport
4f1f45b0-726c-4fea-b462-06dcbf559c25 from this chassis
(sb_readonly=1)
2026-02-24T14:06:46.927Z|00016|timeval|WARN|Unreasonably long
1413ms
poll interval (1294ms user, 117ms system)
2026-02-24T14:06:46.927Z|00017|timeval|WARN|faults: 38131 minor,
0 major
2026-02-24T14:06:46.927Z|00018|timeval|WARN|disk: 0 reads, 8
writes
2026-02-24T14:06:46.927Z|00019|timeval|WARN|context switches: 0
voluntary, 65 involuntary
2026-02-24T14:06:46.936Z|00020|coverage|INFO|Event coverage, avg
rate
over last: 5 seconds, last minute, last hour,  hash=1a815819:
2026-02-24T14:06:46.936Z|00021|coverage|INFO|physical_run
  0.2/sec
     0.017/sec        0.0003/sec   total: 1
2026-02-24T14:06:46.936Z|00022|coverage|INFO|lflow_conj_alloc
   0.0/
sec     0.000/sec        0.0000/sec   total: 407
2026-02-24T14:06:46.936Z|00023|coverage|INFO|lflow_cache_miss
   0.0/
sec     0.000/sec        0.0000/sec   total: 13470
2026-02-24T14:06:46.936Z|00024|coverage|INFO|lflow_cache_hit
0.0/sec
       0.000/sec        0.0000/sec   total: 394
2026-02-24T14:06:46.936Z|00025|coverage|INFO|lflow_cache_add
0.0/sec
       0.000/sec        0.0000/sec   total: 12956
2026-02-24T14:06:46.936Z|00026|coverage|INFO|
lflow_cache_add_matches
0.0/sec     0.000/sec        0.0000/sec   total: 2412
2026-02-24T14:06:46.936Z|00027|coverage|INFO|
lflow_cache_add_expr
     0.0/sec     0.000/sec        0.0000/sec   total: 10544
2026-02-24T14:06:46.936Z|00028|coverage|INFO|
consider_logical_flow
0.0/sec     0.000/sec        0.0000/sec   total: 20680
2026-02-24T14:06:46.936Z|00029|coverage|INFO|lflow_run 0.2/sec
     0.017/sec        0.0003/sec   total: 1
2026-02-24T14:06:46.936Z|00030|coverage|INFO|miniflow_malloc
   16.6/
sec     1.383/sec        0.0231/sec   total: 28561
2026-02-24T14:06:46.936Z|00031|coverage|INFO|hmap_pathological
    11.2/
sec     0.933/sec        0.0156/sec   total: 257
2026-02-24T14:06:46.936Z|00032|coverage|INFO|hmap_expand
837.2/sec
69.767/sec        1.1628/sec   total: 30358
2026-02-24T14:06:46.936Z|00033|coverage|INFO|hmap_reserve
  0.4/sec
     0.033/sec        0.0006/sec   total: 21733
2026-02-24T14:06:46.936Z|00034|coverage|INFO|txn_unchanged
2.4/sec
     0.200/sec        0.0033/sec   total: 65
2026-02-24T14:06:46.936Z|00035|coverage|INFO|txn_incomplete
   1.4/sec
       0.117/sec        0.0019/sec   total: 60
2026-02-24T14:06:46.936Z|00036|coverage|INFO|txn_success 0.6/sec
     0.050/sec        0.0008/sec   total: 3
2026-02-24T14:06:46.936Z|00037|coverage|INFO|poll_create_node
24.0/
sec     2.000/sec        0.0333/sec   total: 1304
2026-02-24T14:06:46.937Z|00038|coverage|INFO|poll_zero_timeout
0.0/
sec     0.000/sec        0.0000/sec   total: 1
2026-02-24T14:06:46.937Z|00039|coverage|INFO|rconn_queued
  0.8/sec
     0.067/sec        0.0011/sec   total: 4
2026-02-24T14:06:46.937Z|00040|coverage|INFO|rconn_sent  0.8/sec
     0.067/sec        0.0011/sec   total: 4
2026-02-24T14:06:46.937Z|00041|coverage|INFO|seq_change  9.2/sec
     0.767/sec        0.0128/sec   total: 532
2026-02-24T14:06:46.937Z|00042|coverage|INFO|pstream_open
  0.2/sec
     0.017/sec        0.0003/sec   total: 1
2026-02-24T14:06:46.937Z|00043|coverage|INFO|stream_open 1.2/sec
     0.100/sec        0.0017/sec   total: 6
2026-02-24T14:06:46.937Z|00044|coverage|INFO|util_xalloc
29035.4/sec
2419.617/sec       40.3269/sec   total: 2277081
2026-02-24T14:06:46.937Z|00045|coverage|INFO|vconn_received
   0.8/sec
       0.067/sec        0.0011/sec   total: 4
2026-02-24T14:06:46.937Z|00046|coverage|INFO|vconn_sent  1.2/sec
     0.100/sec        0.0017/sec   total: 6
2026-02-24T14:06:46.937Z|00047|coverage|INFO|
jsonrpc_recv_incomplete
0.6/sec     0.050/sec        0.0008/sec   total: 52
2026-02-24T14:06:46.937Z|00048|coverage|INFO|138 events never
hit
2026-02-24T14:06:46.976Z|00049|binding|INFO|Releasing lport
4f1f45b0-726c-4fea-b462-06dcbf559c25 from this chassis
(sb_readonly=0)
2026-02-24T14:06:46.977Z|00050|binding|INFO|Releasing lport
bcd3ecfa-
f43c-4e72-8978-73bbad07ed75 from this chassis (sb_readonly=0)
2026-02-24T14:06:48.054Z|00051|timeval|WARN|Unreasonably long
1117ms
poll interval (1108ms user, 8ms system)
2026-02-24T14:06:48.054Z|00052|timeval|WARN|faults: 2581
minor, 0
major
2026-02-24T14:06:48.054Z|00053|timeval|WARN|context switches: 0
voluntary, 8 involuntary
2026-02-24T14:06:48.055Z|00054|coverage|INFO|Event coverage, avg
rate
over last: 5 seconds, last minute, last hour,  hash=0878340f:
2026-02-24T14:06:48.055Z|00055|coverage|INFO|physical_run
  0.2/sec
     0.017/sec        0.0003/sec   total: 2
2026-02-24T14:06:48.055Z|00056|coverage|INFO|lflow_conj_alloc
   0.0/
sec     0.000/sec        0.0000/sec   total: 814
2026-02-24T14:06:48.055Z|00057|coverage|INFO|lflow_cache_miss
   0.0/
sec     0.000/sec        0.0000/sec   total: 13979
2026-02-24T14:06:48.055Z|00058|coverage|INFO|lflow_cache_hit
0.0/sec
       0.000/sec        0.0000/sec   total: 13671
2026-02-24T14:06:48.055Z|00059|coverage|INFO|lflow_cache_add
0.0/sec
       0.000/sec        0.0000/sec   total: 12956
2026-02-24T14:06:48.055Z|00060|coverage|INFO|lflow_cache_add_matches 0.0/sec 0.000/sec 0.0000/sec total: 2412
2026-02-24T14:06:48.055Z|00061|coverage|INFO|lflow_cache_add_expr 0.0/sec 0.000/sec 0.0000/sec total: 10544
2026-02-24T14:06:48.055Z|00062|coverage|INFO|consider_logical_flow 0.0/sec 0.000/sec 0.0000/sec total: 41360
2026-02-24T14:06:48.055Z|00063|coverage|INFO|lflow_run 0.2/sec 0.017/sec 0.0003/sec total: 2
2026-02-24T14:06:48.055Z|00064|coverage|INFO|cmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 7
2026-02-24T14:06:48.055Z|00065|coverage|INFO|miniflow_malloc 16.6/sec 1.383/sec 0.0231/sec total: 63156
2026-02-24T14:06:48.055Z|00066|coverage|INFO|hmap_pathological 11.2/sec 0.933/sec 0.0156/sec total: 311
2026-02-24T14:06:48.056Z|00067|coverage|INFO|hmap_expand 837.2/sec 69.767/sec 1.1628/sec total: 30539
2026-02-24T14:06:48.056Z|00068|coverage|INFO|hmap_reserve 0.4/sec 0.033/sec 0.0006/sec total: 22553
2026-02-24T14:06:48.056Z|00069|coverage|INFO|txn_unchanged 2.4/sec 0.200/sec 0.0033/sec total: 67
2026-02-24T14:06:48.056Z|00070|coverage|INFO|txn_incomplete 1.4/sec 0.117/sec 0.0019/sec total: 60
2026-02-24T14:06:48.056Z|00071|coverage|INFO|txn_success 0.6/sec 0.050/sec 0.0008/sec total: 4
2026-02-24T14:06:48.056Z|00072|coverage|INFO|poll_create_node 24.0/sec 2.000/sec 0.0333/sec total: 1335
2026-02-24T14:06:48.056Z|00073|coverage|INFO|poll_zero_timeout 0.0/sec 0.000/sec 0.0000/sec total: 1
2026-02-24T14:06:48.056Z|00074|coverage|INFO|rconn_queued 0.8/sec 0.067/sec 0.0011/sec total: 4
2026-02-24T14:06:48.056Z|00075|coverage|INFO|rconn_sent 0.8/sec 0.067/sec 0.0011/sec total: 4
2026-02-24T14:06:48.056Z|00076|coverage|INFO|seq_change 9.2/sec 0.767/sec 0.0128/sec total: 546
2026-02-24T14:06:48.056Z|00077|coverage|INFO|pstream_open 0.2/sec 0.017/sec 0.0003/sec total: 1
2026-02-24T14:06:48.056Z|00078|coverage|INFO|stream_open 1.2/sec 0.100/sec 0.0017/sec total: 6
2026-02-24T14:06:48.056Z|00079|coverage|INFO|long_poll_interval 0.0/sec 0.000/sec 0.0000/sec total: 1
2026-02-24T14:06:48.056Z|00080|coverage|INFO|util_xalloc 29035.4/sec 2419.617/sec 40.3269/sec total: 2477649
2026-02-24T14:06:48.056Z|00081|coverage|INFO|vconn_received 0.8/sec 0.067/sec 0.0011/sec total: 4
2026-02-24T14:06:48.056Z|00082|coverage|INFO|vconn_sent 1.2/sec 0.100/sec 0.0017/sec total: 6
2026-02-24T14:06:48.056Z|00083|coverage|INFO|jsonrpc_recv_incomplete 0.6/sec 0.050/sec 0.0008/sec total: 52
2026-02-24T14:06:48.056Z|00084|coverage|INFO|136 events never hit
2026-02-24T14:06:48.056Z|00085|poll_loop|INFO|wakeup due to [POLLIN] on fd 29 (10.11.0.2:40496<->10.11.0.4:16641) at lib/stream-fd.c:157 (82% CPU usage)
2026-02-24T14:06:48.097Z|00086|poll_loop|INFO|wakeup due to [POLLIN] on fd 29 (10.11.0.2:40496<->10.11.0.4:16641) at lib/stream-fd.c:157 (82% CPU usage)
2026-02-24T14:06:48.104Z|00087|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ovn-controller.c:7558 (82% CPU usage)
2026-02-24T14:06:48.283Z|00088|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ofctrl.c:692 (82% CPU usage)
2026-02-24T14:06:48.870Z|00089|poll_loop|INFO|wakeup due to [POLLIN] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
2026-02-24T14:06:48.877Z|00090|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
2026-02-24T14:06:48.884Z|00091|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
2026-02-24T14:06:48.892Z|00092|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
2026-02-24T14:06:48.900Z|00093|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
2026-02-24T14:06:48.907Z|00094|poll_loop|INFO|wakeup due to [POLLOUT] on fd 33 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (82% CPU usage)
2026-02-24T14:06:49.875Z|00095|memory|INFO|143124 kB peak resident set size after 10.5 seconds
2026-02-24T14:06:49.875Z|00096|memory|INFO|idl-cells-OVN_Southbound:301305 idl-cells-Open_vSwitch:25815 lflow-cache-entries-cache-expr:10548 lflow-cache-entries-cache-matches:2413 lflow-cache-size-KB:32447 local_datapath_usage-KB:2 ofctrl_desired_flow_usage-KB:8528 ofctrl_installed_flow_usage-KB:6365 ofctrl_rconn_packet_counter-KB:5161 ofctrl_sb_flow_ref_usage-KB:3196 oflow_update_usage-KB:1
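As an aside, when eyeballing coverage dumps like the one above, a small script that sorts the counters by their most recent per-second rate makes outliers like util_xalloc easy to spot. This is just a rough sketch assuming the standard three-column OVS coverage log format (5-second, 1-minute, and 1-hour averages):

```python
import re

# Matches OVS coverage log lines such as:
#   ...|coverage|INFO|util_xalloc 29035.4/sec 2419.617/sec 40.3269/sec total: 2477649
COVERAGE_RE = re.compile(
    r"coverage\|INFO\|(\w+)\s+([\d.]+)/sec\s+([\d.]+)/sec\s+([\d.]+)/sec\s+total:\s+(\d+)"
)

def top_counters(log_text, n=3):
    """Return the n coverage counters with the highest 5-second average rate."""
    counters = []
    for line in log_text.splitlines():
        m = COVERAGE_RE.search(line)
        if m:
            name, rate_5s, _rate_1m, _rate_1h, total = m.groups()
            counters.append((name, float(rate_5s), int(total)))
    return sorted(counters, key=lambda c: c[1], reverse=True)[:n]

# A few entries copied from the dump above, joined onto single lines.
sample = """\
2026-02-24T14:06:48.056Z|00067|coverage|INFO|hmap_expand 837.2/sec 69.767/sec 1.1628/sec total: 30539
2026-02-24T14:06:48.056Z|00080|coverage|INFO|util_xalloc 29035.4/sec 2419.617/sec 40.3269/sec total: 2477649
2026-02-24T14:06:48.056Z|00076|coverage|INFO|seq_change 9.2/sec 0.767/sec 0.0128/sec total: 546
"""
for name, rate, total in top_counters(sample, n=2):
    print(f"{name}: {rate}/sec (total {total})")
```

The same counters can also be queried on demand with `ovn-appctl -t ovn-controller coverage/show` instead of waiting for the periodic log dump.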

Regards,

Ilia Baikov
[email protected]

On 12.02.2026 21:22, Ilia Baikov wrote:
Hi,
Coming back to this issue after a while, as I'm migrating from ml2/ovs to ml2/ovn. It seems the same issue from 2025 still persists.

refs:
[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053456.html
[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2025-March/053484.html

case:
A large L2 domain with a border device that learns IPs by flooding ARP. For some reason, when an L2 device (a NIC with VLANs attached) is attached to the br-ex bridge, after a while OVN stops sending DHCP packets (OFFER/ACK/etc.).
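For narrowing down where such packets disappear, one approach is to trace a synthetic broadcast ARP request through br-int with `ovs-appctl ofproto/trace` on the affected chassis and see which OpenFlow table drops it. A sketch only: the port name below is hypothetical, and the MACs/IPs are taken from the capture earlier in this thread; substitute your own values.

```shell
# Hypothetical ingress port on br-int (e.g. the patch port from the
# provider bridge); check `ovs-ofctl show br-int` for the real name.
IN_PORT="patch-provnet-to-br-int"
SHA="d4:04:ff:96:a4:00"     # sender MAC (border gateway, from the capture)
SPA="85.209.134.1"          # sender IP (gateway)
TPA="85.209.134.123"        # target IP (the instance being resolved)

# Broadcast ARP request (arp_op=1), as the border gateway would flood it.
FLOW="in_port=$IN_PORT,dl_src=$SHA,dl_dst=ff:ff:ff:ff:ff:ff,arp,arp_op=1,arp_sha=$SHA,arp_spa=$SPA,arp_tpa=$TPA"

# Run this on the chassis hosting the instance; the output walks the
# packet through each flow table and shows the final action (or drop).
echo "ovs-appctl ofproto/trace br-int \"$FLOW\""
```

Comparing the trace for a working instance against a broken one should show the first table where their treatment diverges.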

Has anybody else observed the same issue? The only way to stabilize the region is to switch to L3 networking using ovn-bgp-agent (eth0 is detached from br-ex, so no more ARPs are delivered to ovn-controller), but kernel routing carries a huge overhead: IRQ load is 5-6x higher, around 10-12%, whereas with L2 networking it's just 2%, which is fine.

Meanwhile, there are no errors, warnings, or resubmit messages in the logs.
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
