Netchannel [1] is a pure bridge between low-level hardware and the user, with no special protocol processing in between. Users are not limited to userspace only - I will use this netchannel infrastructure for a fast NAT implementation, which is a purely kernelspace user (it is possible to create NAT in userspace, but the price of crossing the kernelspace boundary is too high for an operation that only needs to change a few header fields and recalculate the checksum). The userspace network stack [2] is another user of the new netchannel subsystem.
The current netchannel version supports data transfer using copy_{to,from}_user(). One could ask how it differs from netfilter's queue target. There are three differences (read: advantages):
 * it does not depend on netfilter (and thus does not introduce its slow path);
 * it is very scalable, since it uses neither hash tables nor lists;
 * it does not depend on netfilter (and thus does not introduce its slow path).
Yes, again: if we take the NAT implementation into account, netfilter would add a dependency on connection tracking, which the existing netchannel implementation does not need. It is also much smaller and more scalable than tun/tap devices. And some other small advantages: the possibility to perform zero-copy sending and receiving using the network allocator's [3] facilities (not implemented in the current version of netchannels), it is very small, and there are no locks in the very short fast path (except RCU and the skb queue linking lock, which is held for five operations), and so on...

There are also some limitations: only one packet can be fetched per read from a netchannel's file descriptor (it could be extended to read several packets, but for now I leave it as is), and it is IPv4 only (I'm lazy and only implemented tree comparison functions for IPv4 addresses).

The first user of the netchannel subsystem is the userspace network stack [2], which supports:
 * TCP/UDP sending and receiving.
 * Timestamp, window scaling and MSS TCP options.
 * PAWS.
 * Slow start and congestion control.
 * Routing table (including a static ARP cache).
 * Socket-like interface.
 * IP and ethernet processing code.
 * Complete retransmit algorithm.
 * Fast retransmit support.
 * TCP listen state (point-to-point mode only, i.e. no new data channels are created when a new client connects; instead the state is changed according to the protocol (the TCP state moves to ESTABLISHED)).
 * Support for the new netchannel interface.
A speed/CPU usage graph for the socket code (which uses epoll and send/recv) is attached. At the same 100 Mbit rate, CPU usage for netchannels plus the userspace network stack is about 2-3 times lower than for the socket code when sending/receiving small packets (128 bytes).

There is very strange behaviour of the userspace time() function: when it is used actively, kernel load becomes extremely high and the following functions start to appear at the top of the profiles:
 * get_offset_pmtmr() - 25%, second place, even higher than sysenter_past_esp().
 * do_gettimeofday() - 0.6%, 4th place.
 * delay_pmtmr() - 0.29%, 11th place.
First place is poll_idle().

The testing system, which runs either the netchannel or the socket tests, is an HT-enabled Xeon:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 2.40GHz
stepping	: 7

with 1GB of RAM and an e100 network adapter, on Linux 2.6.17-rc3. The main (vanilla) system is an amd64 with 1GB of RAM and an 8169 gigabit adapter, on Linux 2.6.18-1.2200.fc5; the software is either netcat dumping data into /dev/null or a sendfile-based server.

All sources are available on the projects' homepages.

Thank you.

1. Netchannels subsystem.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=netchannel

2. Userspace network stack.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=unetstack

3. Network allocator.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=nta

If you have read up to here, then I want you to know that the advertisement is over. Thanks again.

-- 
Evgeniy Polyakov
atcp_speed.png
Description: PNG image