On Thu, 2016-08-04 at 16:34 +0300, Nikolay Borisov wrote:
> 
> On 08/01/2016 11:56 AM, Erez Shitrit wrote:
> > 
> > The GID (9000:0:2800:0:bc00:7500:6e:d8a4) is not regular, not from
> > the local subnet prefix.
> > why is that?
> > 
> 
> So I managed to debug this and it turns out the problem lies in the
> interaction between veth and ipoib:
> 
> I've discovered the following strange thing. If I have a veth pair
> where the two devices are in different net namespaces, as shown in
> the scripts I have attached, then the performance of sending a file
> originating from the veth interface inside the non-init netnamespace,
> going across the ipoib interface, is very slow (~100kb/s). For simple
> reproduction I'm attaching two scripts which have to be run on two
> machines, with the respective ip addresses set on them. The sending
> node would then initiate a simple file copy over nc. I've observed
> this behavior on upstream 4.4, 4.5.4 and 4.7.0 kernels, with both
> ipv4 and ipv6 addresses. Here is what the debug log of the ipoib
> module shows:
> 
> ib%d: max_srq_sge=128
> ib%d: max_cm_mtu = 0xfff0, num_frags=16
> ib0: enabling connected mode will cause multicast packet drops
> ib0: mtu > 4092 will cause multicast packet drops.
> ib0: bringing up interface
> ib0: starting multicast thread
> ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
> ib0: restarting multicast task
> ib0: adding multicast entry for mgid ff12:601b:ffff:0000:0000:0000:0000:0001
> ib0: restarting multicast task
> ib0: adding multicast entry for mgid ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
> ib0: Created ah ffff88081063ea80
> ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV ffff88081063ea80, LID 0xc000, SL 0
> ib0: joining MGID ff12:601b:ffff:0000:0000:0000:0000:0001
> ib0: joining MGID ff12:401b:ffff:0000:0000:0000:0000:0001
> ib0: successfully started all multicast joins
> ib0: join completion for ff12:601b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: Created ah ffff880839084680
> ib0: MGID ff12:601b:ffff:0000:0000:0000:0000:0001 AV ffff880839084680, LID 0xc002, SL 0
> ib0: join completion for ff12:401b:ffff:0000:0000:0000:0000:0001 (status 0)
> ib0: Created ah ffff88081063e280
> ib0: MGID ff12:401b:ffff:0000:0000:0000:0000:0001 AV ffff88081063e280, LID 0xc004, SL 0
> 
> When the transfer is initiated I can see the following errors on the
> sending node:
> 
> ib0: PathRec status -22 for GID 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: neigh free for 000003 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: Start path record lookup for 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: PathRec status -22 for GID 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: neigh free for 000003 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: Start path record lookup for 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: PathRec status -22 for GID 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: neigh free for 000003 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: Start path record lookup for 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: PathRec status -22 for GID 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: Start path record lookup for 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: PathRec status -22 for GID 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: neigh free for 000003 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
> ib0: neigh free for 000003 0401:0000:1400:0000:a0a8:ffff:1c01:4d36
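For anyone trying to reproduce this without the attached scripts, here
is a minimal sketch of the kind of sender-side setup being described.
The namespace name, interface names, addresses and port below are my
own placeholders, not taken from the attachments:

  # Create a veth pair with one end in a separate net namespace.
  ip netns add ns1
  ip link add veth0 type veth peer name veth1
  ip link set veth1 netns ns1
  ip addr add 192.168.100.1/24 dev veth0
  ip link set veth0 up
  ip netns exec ns1 ip addr add 192.168.100.2/24 dev veth1
  ip netns exec ns1 ip link set veth1 up
  # Route the namespace's traffic via the host, which forwards it out
  # over ib0 (the receiver needs a return route to this subnet).
  sysctl -w net.ipv4.ip_forward=1
  ip netns exec ns1 ip route add default via 192.168.100.1
  # Receiver side:  nc -l -p 5001 > received_file
  # Sender side, from inside the namespace:
  ip netns exec ns1 nc <receiver-addr> 5001 < large_file

(nc flags vary between netcat flavors.)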
> Here is the port guid of the sending node: 0x0011750000772664 and on
> the receiving one: 0x0011750000774d36
> 
> Here is how the paths look on the sending node, clearly showing the
> paths being requested from the veth interface:
> 
> cat /sys/kernel/debug/ipoib/ib0_path
> GID: 401:0:1400:0:a0a8:ffff:1c01:4d36
>   complete: no
> 
> GID: 401:0:1400:0:a410:ffff:1c01:4d36
>   complete: no
> 
> GID: fe80:0:0:0:11:7500:77:2a1a
>   complete: yes
>   DLID:     0x0004
>   SL:       0
>   rate:     40.0 Gb/sec
> 
> GID: fe80:0:0:0:11:7500:77:4d36
>   complete: yes
>   DLID:     0x000a
>   SL:       0
>   rate:     40.0 Gb/sec
> 
> Testing the same scenario, but creating the device in the non-init
> netnamespace via the following commands instead of using veth
> devices, I can achieve sensible speeds:
> 
> ip link add link ib0 name ip1 type ipoib
> ip link set dev ip1 netns test-netnamespace
> 
> [Snipped a lot of useless stuff]
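For completeness, that working configuration would presumably look
something like this end to end (the address is again a placeholder of
mine):

  # Move a native ipoib child device into the namespace instead of
  # using a veth pair.
  ip netns add test-netnamespace
  ip link add link ib0 name ip1 type ipoib
  ip link set dev ip1 netns test-netnamespace
  ip netns exec test-netnamespace ip addr add 192.168.100.2/24 dev ip1
  ip netns exec test-netnamespace ip link set ip1 up
  ip netns exec test-netnamespace nc <receiver-addr> 5001 < large_file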
The poor performance sounds like a duplicate of the issue reported by
Roland and tracked in upstream kernel bugzilla 111921, i.e. the IPoIB
routed packet performance issue.

-- 
Doug Ledford <dledf...@redhat.com>
    GPG KeyID: 0E572FDD