It could be interesting to find out whether the issue does not occur on an
m1.small (although that could still result from other differences in the
setup than the MTU). I am also not sure how AWS manages to bring the
instance up with a different MTU. In my experiments I had a normal bridge
on the host set to 9000 and the guest still came up with 1500. But I do
not know in detail how the network is set up in EC2 (it could be Open
vSwitch).
Generally the issue is that something seems to produce packets with a large
data buffer. One slot in the xen-netfront driver is a 4 KiB page, and the
limit is 18 slots. Anything above that causes the observed message and the
packet to be dropped. The host side has another limit of (usually) 20 slots,
above which it assumes a malicious guest and disrupts the connection. But
since the guest already drops at 19 slots or more, the host should never
see that number.
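
To make this concrete, the guest-side check looks roughly like this (a
simplified sketch from my reading of xennet_start_xmit() in
drivers/net/xen-netfront.c, not the verbatim source; slots is the total
page count worked out for the skb):

    /* With 4 KiB pages MAX_SKB_FRAGS is 17, so the limit is
     * 17 + 1 = 18 slots: one extra slot for the linear header area.
     */
    if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
            net_alert_ratelimited("xennet: skb rides the rocket: %d slots\n",
                                  slots);
            goto drop; /* packet is dropped, never handed to the backend */
    }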
Unfortunately I do not understand the network code that deeply, so I will
have to ask upstream. As far as I understand, a socket buffer (skb) can
consist of multiple fragments (a kind of scatter-gather list). There is a
definition in the code that limits the number of fragments based on a
maximum frame size of 64 KiB. This results in 17 frags (for 4 KiB pages
that is 16, plus 1 to handle data not starting at a page boundary). The
Xen driver counts the length of the memory area in all frags; if the data
in a frag starts at an offset, that offset is added, and the code does
this for every frag (the question would be whether in theory each frag is
allowed to have its own offset, because those could add up to more than
one page). To the number of pages needed for the frags, the driver then
adds the number of pages (can that be more than one?) needed for the
header. If the total is bigger than 18 (17 for the frags + 1 for the
header?), the "rides the rocket" error happens.
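
In code, my reading of that counting logic is roughly the following (a
sketch under the assumptions above, not the literal driver source;
count_slots is my own name for it, and the field names are from the
kernel version I looked at):

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/pfn.h>
    #include <linux/skbuff.h>

    /* Sketch: count how many 4 KiB slots an skb would occupy. Each
     * frag needs enough pages to cover its in-page offset plus its
     * length; the linear header adds its own page(s).
     */
    static int count_slots(struct sk_buff *skb)
    {
            int i, pages = 0;

            for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                    skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
                    unsigned long size = skb_frag_size(frag);
                    /* offset of the frag data within its page */
                    unsigned long offset = frag->page_offset & ~PAGE_MASK;

                    pages += PFN_UP(offset + size); /* round up to pages */
            }

            /* Linear header: in-page offset of skb->data plus the head
             * length, rounded up (usually 1 page, 2 if the header
             * straddles a page boundary).
             */
            pages += DIV_ROUND_UP(offset_in_page(skb->data) +
                                  skb_headlen(skb), PAGE_SIZE);
            return pages;
    }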
This leaves a few question marks for me: the memory associated with a frag
can be a compound page, so I would think its length can be greater than
4 KiB. I have no clue yet how exactly compound pages come into play. Is
the 64 KiB limit still enforced purely by the limit on the number of
frags? Can each frag's data begin at some offset (and so end up costing
more than one page of overall overhead)? Apparently the header can start
at some offset, too. So in the worst case (assuming the header length to
be less than 4 KiB), a large enough offset could make the header require
2 pages. Then, if the frag data happens to use up its 17-page limit, we
would end up exactly at the 19-slot failure size.
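
Spelled out as arithmetic (assumed worst-case numbers, not something I
have actually observed):

    /* Worst case (assumption): header data shorter than 4 KiB but
     * starting close to the end of its page, so it straddles a page
     * boundary; all 17 frags fully used.
     */
    int header_slots = 2;  /* DIV_ROUND_UP(offset + headlen, 4096) */
    int frag_slots   = 17; /* MAX_SKB_FRAGS fully used */
    int total_slots  = header_slots + frag_slots; /* 19 > 18 -> dropped */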
