** Description changed:

[Impact]
Netlink calls to the kernel can return more than 16k bytes (they can return 32k on newer kernels). The pyroute2 library has a default buffer size of 16k and fails to read the data when kernel response data overflows this. One example of where users encounter this is booting OpenStack instances with SRIOV when there are more than 32 VFs, as seen in the original problem description (included below).

[Test Case]
Use an SRIOV capable card and enable more than 32 VFs on a modern kernel. Attempt to launch an instance using OpenStack as follows:

1. Create example network:

$ juju switch openstack
$ source ~/deploy/novarc
$ openstack network create \
    --provider-physical-network sriovfabric \
    --provider-segment 300 \
    --provider-network-type vlan \
    test-sriov
$ openstack subnet create --network test-sriov \
    --no-dhcp \
    --gateway none \
    --subnet-range 192.168.1.0/24 test-sriov

2. Create ports over virtual function:

$ juju switch openstack
$ source ~/deploy/novarc
$ openstack port create \
    --network test-sriov \
    --vnic-type direct \
    sriov-vf1
$ openstack server create \
    --image bionic-kvm \
    --flavor m1.small \
    --network ext-net-300 \
    --port sriov-vf1 \
    --key-name ubuntu-keypair \
    --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
    sriov-vf1

3. The instance stalls in build state (virsh list shows paused VM) and drops to ERROR

[Where problems could occur]
Problems may occur in existing customers already using openstack to schedule SRIOV instances and may show up as failure to build instances. Additional problems could include the increased memory usage of the nova processes which occurs by increasing the default buffer size. For tightly spec'd systems with small memory allocated to the host, this could further eat into any margin available and push memory usage over the edge.

-
- [Previous Description]
-
- # Problem Description
- Attempt to boot instance with SR-IOV interface fails.
Instance stays in BUILD stage for ca 1 minute and then turns to ERROR state. Neutron agent log shows:
-
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most recent call last):
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 473, in daemon_loop
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     device_info = self.scan_devices(devices, updated_devices_copy)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     result = f(*args, **kwargs)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py", line 243, in scan_devices
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     curr_devices = self.eswitch_mgr.get_assigned_devices_info()
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 344, in get_assigned_devices_info
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     for device in embedded_switch.get_assigned_devices_info():
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 186, in get_assigned_devices_info
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     mac = self.get_pci_device(pci_slot)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py", line 297, in get_pci_device
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     macs = self.pci_dev_wrapper.get_assigned_macs([vf_index])
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py", line 46, in get_assigned_macs
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     vfs = ip.link.get_vfs()
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in get_vfs
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return privileged.get_link_vfs(self.name, self._parent.namespace)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in _wrap
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return self.channel.remote_call(name, args, kwargs)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File "/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in remote_call
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     raise exc_type(*result[2])
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot serialize error('unpack_from requires a buffer of at least 4 bytes',)
- 2020-11-18 10:54:58.927 53769 ERROR neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
- 2020-11-18 10:55:00.885 53769 INFO neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent [req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with plugin!
- 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc [req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated ['808d2b62-75ba-45d6-969c-87ce90d56c37']
-
- # Environment
- Openstack USSURI + OVN
- ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40
- neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0
- CIS hardened system.
- aa profile set to disable, AppArmor profiles teardown applied.
- neutron-sriov-agent reports UP in openstack network agent list.
-
- charm configuration:
- charm: ovn-chassis
- settings:
-   bridge-interface-mappings:
-     value: br-data:bond1
-   debug:
-     value: false
-   dpdk-bond-config:
-     value: :balance-tcp:active:fast
-   dpdk-bond-mappings:
-   dpdk-driver:
-   dpdk-socket-cores:
-     value: 1
-   dpdk-socket-memory:
-     value: 1024
-   enable-dpdk:
-     value: false
-   enable-hardware-offload:
-     value: false
-   enable-sriov:
-     value: true
-   new-units-paused:
-     value: false
-   openstack-metadata-workers:
-     value: 2
-   ovn-bridge-mappings:
-     value: dcfabric:br-data sriovfabric:br-data
-   sriov-device-mappings:
-     value: sriovfabric:ens3f0 sriovfabric:ens3f1
-   sriov-numvfs:
-     value: ens3f0:64 ens3f0:64
-
- Agent config:
- root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini
- ###############################################################################
- # [ WARNING ]
- # Configuration file maintained by Juju. Local changes may be overwritten.
- # Config managed by ovn-chassis charm
- ###############################################################################
- [securitygroup]
- firewall_driver = neutron.agent.firewall.NoopFirewallDriver
-
- [sriov_nic]
- physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1
- exclude_devices =
-
- root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf
- ###############################################################################
- # [ WARNING ]
- # Configuration file maintained by Juju. Local changes may be overwritten.
- # Config managed by ovn-chassis charm
- ###############################################################################
- [DEFAULT]
- debug = False
- host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com
- core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
-
- # This template must be included under the [DEFAULT] section
-
- transport_url =
- rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack
-
- [oslo_messaging_notifications]
- driver = messagingv2
- # This template must be included under the [DEFAULT] section
-
- transport_url =
- rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack
-
- topics = notifications
- [AGENT]
- root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.confroot
-
- # STEPS TO REPRODUCE
- - apply environment config as above
- - create networking and the instance
- Create example network:
- $ juju switch openstack
- $ source ~/deploy/novarc
- $ openstack network create \
-     --provider-physical-network sriovfabric \
-     --provider-segment 300 \
-     --provider-network-type vlan \
-     test-sriov
-
- $ openstack subnet create --network test-sriov \
-     --no-dhcp \
-     --gateway none \
-     --subnet-range 192.168.1.0/24 test-sriov
-
- Create ports over virtual function:
- $ juju switch openstack
- $ source ~/deploy/novarc
- $ openstack port create \
-     --network test-sriov \
-     --vnic-type direct \
-     sriov-vf1
-
- $ openstack server create \
-     --image bionic-kvm \
-     --flavor m1.small \
-     --network ext-net-300 \
-     --port sriov-vf1 \
-     --key-name ubuntu-keypair \
-     --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
-     sriov-vf1
-
- - the instance stalls in build state (virsh list shows paused VM) and
- drops to ERROR
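
Illustration (not part of the original report): the failure mode described under [Impact] can be sketched with a toy parser. This is a minimal sketch under stated assumptions, not pyroute2's code: the helper names (build_attributes, parse_attributes, try_parse) and the 400-byte per-attribute payload are hypothetical, chosen only so that 64 attributes exceed the 16k figure from the description. It shows how walking length-prefixed attributes over a reply that was cut off at the buffer size ends in a struct.error from unpack_from, the same class of error seen in the agent traceback.

```python
import struct

# Netlink attributes start with a 4-byte header: u16 length, u16 type.
NLA_HEADER = struct.Struct("=HH")
BUFFER_SIZE = 16 * 1024  # the 16k default mentioned in the description


def build_attributes(count, payload_size):
    """Build `count` attributes, each padded to a 4-byte boundary (hypothetical helper)."""
    out = bytearray()
    for index in range(count):
        length = NLA_HEADER.size + payload_size
        out += NLA_HEADER.pack(length, index)
        out += bytes(payload_size)
        out += bytes(-length % 4)  # alignment padding
    return bytes(out)


def parse_attributes(data, declared_length):
    """Walk the attributes, trusting the length the sender declared."""
    offset = 0
    parsed = []
    while offset < declared_length:
        length, attr_type = NLA_HEADER.unpack_from(data, offset)
        parsed.append((attr_type, length))
        offset += (length + 3) & ~3  # advance to the next aligned attribute
    return parsed


def try_parse(data, declared_length):
    """Return (attribute count, None) on success or (None, error) on failure."""
    try:
        return len(parse_attributes(data, declared_length)), None
    except struct.error as exc:
        return None, exc


# 64 attributes of 404 bytes each: larger than the 16k receive buffer.
full = build_attributes(count=64, payload_size=400)
truncated = full[:BUFFER_SIZE]  # what a single 16k read would hold

ok_count, ok_error = try_parse(full, len(full))
bad_count, bad_error = try_parse(truncated, len(full))
print("full reply:", ok_count, "attributes")
print("truncated reply:", type(bad_error).__name__)
```

The exact wording of the struct.error message differs between Python versions, so the sketch only reports the exception type.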
** Information type changed from Private to Public

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1904730

Title:
  neutron-agent-sriov fails to create port

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs