Netchannel [1] is a pure bridge between the low-level hardware and the user,
without any special protocol processing in between.
Users are not limited to userspace only - I will use this netchannel
infrastructure for a fast NAT implementation, which is a purely kernelspace
user (it is possible to implement NAT in userspace, but the price of crossing
the kernelspace boundary is too high for a task that only needs to change a
few header fields and recalculate the checksum).
The userspace network stack [2] is another user of the new netchannel subsystem.

The current netchannel version supports data transfer using copy_{to,from}_user().
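
Just to illustrate the idea, the copy-based receive path could look roughly
like the sketch below. The function name, the netchannel structure and its
recv_queue field are only illustrative here, not the real code.

#include <linux/skbuff.h>
#include <asm/uaccess.h>

/* Illustrative only: the real netchannel structure is larger. */
struct netchannel {
	struct sk_buff_head	recv_queue;
};

/* Copy one queued packet into a userspace buffer.  One read consumes
 * exactly one packet. */
static ssize_t netchannel_copy_to_user(struct netchannel *nc,
				       char __user *buf, size_t len)
{
	struct sk_buff *skb;
	ssize_t copied;

	skb = skb_dequeue(&nc->recv_queue);
	if (!skb)
		return -EAGAIN;

	copied = min_t(size_t, len, skb->len);
	if (copy_to_user(buf, skb->data, copied))
		copied = -EFAULT;

	kfree_skb(skb);
	return copied;
}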

One could ask how it differs from netfilter's queue target.
There are three differences (read: advantages):

 * it does not depend on netfilter (and thus does not introduce its slow path)
 * it is very scalable, since it uses neither hash tables nor lists - the
        lookup is tree based (a comparison function for such a lookup is
        sketched after this list)
 * it does not depend on netfilter (and thus does not introduce its slow
        path).
        Yes, again: if we take the NAT implementation into account, then we
        would need to add a dependency on connection tracking, which is not
        needed for the existing netchannels implementation.
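
To illustrate the lookup point: an ordered tree lookup only needs a comparison
function over the flow key. The sketch below is illustrative only - the flow
structure and the function name are not the actual netchannel code (the real
tree comparison functions for IPv4 are mentioned in the limitations below).

#include <stdint.h>

/* Illustrative flow key; the real lookup key may differ. */
struct nc_flow_v4 {
	uint32_t saddr, daddr;	/* addresses, network byte order */
	uint16_t sport, dport;	/* ports, network byte order */
	uint8_t  proto;
};

/* Returns <0, 0 or >0, which is enough to drive an ordered tree lookup. */
static int nc_flow_v4_cmp(const struct nc_flow_v4 *a,
			  const struct nc_flow_v4 *b)
{
	if (a->saddr != b->saddr)
		return a->saddr < b->saddr ? -1 : 1;
	if (a->daddr != b->daddr)
		return a->daddr < b->daddr ? -1 : 1;
	if (a->sport != b->sport)
		return a->sport < b->sport ? -1 : 1;
	if (a->dport != b->dport)
		return a->dport < b->dport ? -1 : 1;
	if (a->proto != b->proto)
		return a->proto < b->proto ? -1 : 1;
	return 0;
}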

It is also much smaller and more scalable than tun/tap devices.

And some other small advantages: the possibility to perform zero-copy sending
and receiving using the network allocator's [3] facilities (not implemented in
the current version of netchannels), it is very small, there are no locks in
the very short fast path (except RCU and the skb queue linking lock, which is
held for 5 operations) and so on...
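
Roughly, the fast path described above boils down to the following shape: an
RCU-protected lookup followed by queueing the skb, with the queue's own
spinlock taken only inside skb_queue_tail(). The lookup helper and the
structure layout are again only illustrative, not the real code.

#include <linux/rcupdate.h>
#include <linux/skbuff.h>

struct netchannel {
	struct sk_buff_head	recv_queue;	/* as in the sketch above */
};

/* Hypothetical tree lookup, protected by RCU on the read side. */
extern struct netchannel *netchannel_lookup(const struct sk_buff *skb);

static int netchannel_enqueue(struct sk_buff *skb)
{
	struct netchannel *nc;
	int err = -ENODEV;

	rcu_read_lock();
	nc = netchannel_lookup(skb);
	if (nc) {
		/* The only spinlock on this path is the queue's own lock,
		 * held briefly inside skb_queue_tail(). */
		skb_queue_tail(&nc->recv_queue, skb);
		err = 0;
	}
	rcu_read_unlock();

	return err;
}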

There are also some limitations: it is only possible to get one packet per read
from the netchannel's file descriptor (it is possible to extend it to read
several packets, but for now I leave it as is), and it is IPv4 only (I'm lazy
and only implemented the tree comparison functions for IPv4 addresses).
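
From the userspace side, the one-packet-per-read behaviour looks like the
sketch below; it assumes the netchannel file descriptor has already been
created and bound through the netchannel control interface (not shown here).

#include <stdio.h>
#include <unistd.h>

/* Drain packets from an already configured netchannel descriptor.
 * Each successful read() returns exactly one packet. */
static void netchannel_drain(int fd)
{
	char buf[4096];
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		printf("received a %zd byte packet\n", n);
}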

The first user of the netchannel subsystem is the userspace network stack [2],
which supports:
 * TCP/UDP sending and receiving.
 * Timestamp, window scaling and MSS TCP options.
 * PAWS.
 * Slow start and congestion control.
 * Route table (including a static ARP cache).
 * Socket-like interface (a rough usage sketch follows this list).
 * IP and ethernet processing code.
 * Complete retransmit algorithm.
 * Fast retransmit support.
 * Support for the TCP listen state (point-to-point mode only, i.e. no new
        data channels are created when a new client connects; instead the
        existing state is changed according to the protocol, i.e. the TCP
        state moves to ESTABLISHED).
 * Support for the new netchannels interface.
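
To give a feeling for the socket-like interface, a rough usage sketch follows.
The uns_* names below are purely hypothetical placeholders, not the actual
unetstack API - see the sources [2] for the real interface.

#include <stddef.h>

/* Hypothetical prototypes standing in for the real unetstack calls. */
extern int uns_socket(void);
extern int uns_connect(int s, const char *addr, unsigned short port);
extern int uns_send(int s, const void *buf, size_t len);

static int example_client(void)
{
	const char msg[] = "hello";
	int s, err;

	s = uns_socket();
	if (s < 0)
		return s;

	err = uns_connect(s, "192.168.0.1", 5000);
	if (err)
		return err;

	return uns_send(s, msg, sizeof(msg));
}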

A speed/CPU usage graph for the socket code (which uses epoll and send/recv)
is attached.
At the same 100 Mbit speed, the CPU usage of netchannels with the userspace
network stack is about 2-3 times smaller than that of the socket code when
sending/receiving small (128 byte) packets.

There is a very strange behaviour of the userspace time() function: if it is
used actively, it results in extremely high kernel load, and the following
functions start to appear at the top of the profiles:
 * get_offset_pmtmr() - 25%, second place, even higher than
        sysenter_past_esp().
 * do_gettimeofday() - 0.6%, 4th place.
 * delay_pmtmr() - 0.29%, 11th place.

First place is poll_idle().

The test system, which runs either the netchannel or the socket tests, is an
HT-enabled Xeon:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 2.40GHz
stepping        : 7

with 1GB of RAM and an e100 network adapter, running Linux 2.6.17-rc3.

The main (vanilla) system is an amd64 box with 1GB of RAM and an 8169 gigabit
adapter running Linux 2.6.18-1.2200.fc5; the software on it is either netcat
dumping data into /dev/null or a sendfile-based server.

All sources are available on the projects' homepages.

Thank you.

1. Netchannels subsystem.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=netchannel

2. Userspace network stack.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=unetstack

3. Network allocator.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=nta

If you have read up to here, then I want you to know that the advertisement is
over. Thanks again.

-- 
        Evgeniy Polyakov

Attachment: atcp_speed.png
