** Description changed:

  [Impact]
  
  Netlink responses from the kernel can exceed 16k bytes (newer kernels
  can return 32k). The pyroute2 library uses a default receive buffer of
  16k and fails to parse the response when the kernel's data exceeds that
  size.
  
  One example of where users encounter this is booting OpenStack instances
  with SRIOV when there are more than 32 VFs, as seen in the original
  problem description (included below).
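
  The failure mode can be illustrated with a small simulation. This is a
  sketch only, not pyroute2's actual code: the helper names, the payload
  layout and the message type value are assumptions chosen for
  illustration. It builds a reply larger than 16k, keeps only the first
  16k as a too-small read would, and shows that a parser trusting the
  header length then runs off the end of the data, which is the kind of
  unpack_from error seen in the agent log.

```python
import struct

# Netlink message header: length, type, flags, sequence, port id (16 bytes).
NLMSG_HDR = struct.Struct("=IHHII")
BUF_SIZE = 16 * 1024  # assumed reader buffer size, per the description above


def build_message(payload: bytes) -> bytes:
    """Return a header followed by payload; the header records the total length."""
    # 16 is used as an illustrative message type (RTM_NEWLINK).
    return NLMSG_HDR.pack(NLMSG_HDR.size + len(payload), 16, 0, 1, 0) + payload


def parse_u32_fields(data: bytes) -> list:
    """Decode the payload as consecutive 32-bit values, trusting the header length."""
    length, _type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, 0)
    fields = []
    offset = NLMSG_HDR.size
    while offset < length:
        # Raises struct.error once offset runs past the bytes actually received.
        (value,) = struct.unpack_from("=I", data, offset)
        fields.append(value)
        offset += 4
    return fields


# A 32 KiB payload stands in for a large kernel reply.
message = build_message(struct.pack("=I", 7) * (32 * 1024 // 4))
received = message[:BUF_SIZE]  # what a 16 KiB read would keep

try:
    parse_u32_fields(received)
    outcome = "parsed"
except struct.error as exc:
    outcome = str(exc)

print(outcome)
```

  Parsing the full message succeeds; only the truncated copy fails, which
  is why a larger read buffer avoids the error.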
  
  [Test Case]
  
  Use an SRIOV capable card and enable more than 32 VFs on a modern
  kernel. Attempt to launch an instance using OpenStack as follows:
  
  1. Create example network:
  $ juju switch openstack
  $ source ~/deploy/novarc
  $ openstack network create \
  --provider-physical-network sriovfabric \
  --provider-segment 300 \
  --provider-network-type vlan \
  test-sriov
  
  $ openstack subnet create --network test-sriov \
    --no-dhcp \
    --gateway none \
    --subnet-range 192.168.1.0/24 test-sriov
  
  2. Create a port on a virtual function:
  $ juju switch openstack
  $ source ~/deploy/novarc
  $ openstack port create \
  --network test-sriov \
  --vnic-type direct \
  sriov-vf1
  
  3. Launch an instance attached to that port:
  $ openstack server create \
  --image bionic-kvm \
  --flavor m1.small \
  --network ext-net-300 \
  --port sriov-vf1 \
  --key-name ubuntu-keypair \
  --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
  sriov-vf1
  
  4. The instance stalls in the BUILD state (virsh list shows a paused VM)
  and then drops to ERROR.
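
  Before running the steps above it may help to confirm that the host
  really has more than 32 VFs enabled. The sketch below reads the
  kernel's sriov_numvfs sysfs attribute for a physical function. The
  function names and the sysfs_root parameter are hypothetical helpers
  added for illustration; ens3f0 is the interface name from the original
  report and will differ on other hosts.

```python
from pathlib import Path


def configured_vfs(iface: str, sysfs_root: str = "/sys") -> int:
    """Return the number of VFs currently enabled on a physical function."""
    path = Path(sysfs_root) / "class" / "net" / iface / "device" / "sriov_numvfs"
    return int(path.read_text().strip())


def exceeds_threshold(iface: str, threshold: int = 32, sysfs_root: str = "/sys") -> bool:
    """True when more VFs are enabled than the threshold named in the test case."""
    return configured_vfs(iface, sysfs_root) > threshold
```

  On the affected host, exceeds_threshold("ens3f0") returning True would
  indicate that the precondition for the test case is met.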
  
  [Where problems could occur]
  
  Problems may affect existing users who already use OpenStack to
  schedule SRIOV instances, and may show up as failures to build
  instances. Increasing the default buffer size also increases the memory
  usage of the nova processes. On tightly spec'd systems with little
  memory allocated to the host, this could eat into the available margin
  and push memory usage over the limit.
- 
- [Previous Description]
- 
- # Problem Description
- Attempt to boot instance with SR-IOV interface fails. Instance stays in BUILD 
stage for ca 1 minute and then turns to ERROR state. Neutron agent log shows:
- 
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 
[req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Error in agent loop. 
Devices info: {}: TypeError: Cannot serialize error('unpack_from requires a 
buffer of at least 4 bytes',)
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent Traceback (most 
recent call last):
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py",
 line 473, in daemon_loop
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     device_info = 
self.scan_devices(devices, updated_devices_copy)
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/osprofiler/profiler.py", line 160, in wrapper
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     result = 
f(*args, **kwargs)
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py",
 line 243, in scan_devices
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     curr_devices = 
self.eswitch_mgr.get_assigned_devices_info()
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py",
 line 344, in get_assigned_devices_info
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     for device in 
embedded_switch.get_assigned_devices_info():
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py",
 line 186, in get_assigned_devices_info
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     mac = 
self.get_pci_device(pci_slot)
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/eswitch_manager.py",
 line 297, in get_pci_device
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     macs = 
self.pci_dev_wrapper.get_assigned_macs([vf_index])
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/mech_sriov/agent/pci_lib.py",
 line 46, in get_assigned_macs
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     vfs = 
ip.link.get_vfs()
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/neutron/agent/linux/ip_lib.py", line 516, in 
get_vfs
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return 
privileged.get_link_vfs(self.name, self._parent.namespace)
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/oslo_privsep/priv_context.py", line 247, in 
_wrap
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     return 
self.channel.remote_call(name, args, kwargs)
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent   File 
"/usr/lib/python3/dist-packages/oslo_privsep/daemon.py", line 204, in 
remote_call
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent     raise 
exc_type(*result[2])
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent TypeError: Cannot 
serialize error('unpack_from requires a buffer of at least 4 bytes',)
- 2020-11-18 10:54:58.927 53769 ERROR 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent
- 2020-11-18 10:55:00.885 53769 INFO 
neutron.plugins.ml2.drivers.mech_sriov.agent.sriov_nic_agent 
[req-f116d427-838a-4366-8173-801fbe84e406 - - - - -] Agent out of sync with 
plugin!
- 2020-11-18 10:55:02.244 53769 INFO neutron.agent.securitygroups_rpc 
[req-f4627e61-2abe-49bc-bc7b-9fa9f66e1f4b 980116faf095432e8ac7887db995aeb3 
78ab17a067cf49cabba6c3c5d0faabcc - - -] Security group member updated 
['808d2b62-75ba-45d6-969c-87ce90d56c37']
- 
- # Environment
- Openstack USSURI + OVN
- ovn-chassis version: cs:~openstack-charmers-next/ovn-chassis-40
- neutron-sriov-agent version 2:16.2.0-0ubuntu1~cloud0
- CIS hardened system.
- aa profile set to disable, AppArmor profiles teardown applied.
- neutron-sriov-agent reports UP in openstack network agent list.
- 
- charm configuration:
- charm: ovn-chassis
- settings:
-   bridge-interface-mappings:
-     value: br-data:bond1
-   debug:
-     value: false
-   dpdk-bond-config:
-     value: :balance-tcp:active:fast
-   dpdk-bond-mappings:
-   dpdk-driver:
-   dpdk-socket-cores:
-     value: 1
-   dpdk-socket-memory:
-     value: 1024
-   enable-dpdk:
-     value: false
-   enable-hardware-offload:
-     value: false
-   enable-sriov:
-     value: true
-   new-units-paused:
-     value: false
-   openstack-metadata-workers:
-     value: 2
-   ovn-bridge-mappings:
-     value: dcfabric:br-data sriovfabric:br-data
-   sriov-device-mappings:
-     value: sriovfabric:ens3f0 sriovfabric:ens3f1
-   sriov-numvfs:
-     value: ens3f0:64 ens3f0:64
- 
- Agent config:
- root@cmp4az1cz20300kvs:~# cat /etc/neutron/plugins/ml2/sriov_agent.ini
- 
###############################################################################
- # [ WARNING ]
- # Configuration file maintained by Juju. Local changes may be overwritten.
- # Config managed by ovn-chassis charm
- 
###############################################################################
- [securitygroup]
- firewall_driver = neutron.agent.firewall.NoopFirewallDriver
- 
- [sriov_nic]
- physical_device_mappings = sriovfabric:ens3f0,sriovfabric:ens3f1
- exclude_devices =
- 
- root@cmp4az1cz20300kvs:~# cat /etc/neutron/neutron.conf
- 
###############################################################################
- # [ WARNING ]
- # Configuration file maintained by Juju. Local changes may be overwritten.
- # Config managed by ovn-chassis charm
- 
###############################################################################
- [DEFAULT]
- debug = False
- host = cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com
- core_plugin = neutron.plugins.ml2.plugin.Ml2Plugin
- 
- # This template must be included under the [DEFAULT] section
- 
- transport_url =
- 
rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack
- 
- [oslo_messaging_notifications]
- driver = messagingv2
- # This template must be included under the [DEFAULT] section
- 
- transport_url =
- 
rabbit://neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.12:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.243:5672,neutron:xRCSgmHSJNSVBcSCHk7wydJ6hSgjJsmnJs2N6y9tCjWPbpdgqJCHpFpCtx8VBPgp@10.216.245.44:5672/openstack
- 
- topics = notifications
- [AGENT]
- root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.confroot
- 
- # STEPS TO REPRODUCE
- - apply environment config as above
- - create networking and the instance
- Create example network:
- $ juju switch openstack
- $ source ~/deploy/novarc
- $ openstack network create \
- --provider-physical-network sriovfabric \
- --provider-segment 300 \
- --provider-network-type vlan \
- test-sriov
- 
- $ openstack subnet create --network test-sriov \
-   --no-dhcp \
-   --gateway none \
-   --subnet-range 192.168.1.0/24 test-sriov
- 
- Create ports over virtual function:
- $ juju switch openstack
- $ source ~/deploy/novarc
- $ openstack port create \
- --network test-sriov \
- --vnic-type direct \
- sriov-vf1
- 
- $ openstack server create \
- --image bionic-kvm \
- --flavor m1.small \
- --network ext-net-300 \
- --port sriov-vf1 \
- --key-name ubuntu-keypair \
- --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
- sriov-vf1
- 
- - the instance stalls in build state (virsh list shows paused VM) and
- drops to ERROR

** Information type changed from Private to Public

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1904730

Title:
  neutron-agent-sriov fails to create port

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pyroute2/+bug/1904730/+subscriptions

