from:"Alexey Kuznetsov"

Re: [PATCH] net: Fragment large datagrams even when IP_HDRINCL is set.

2016-07-08 Thread Alexey Kuznetsov

Hello! I can tell why it has not been done initially. Main problem was in IP options, which can be present in raw packet. They have to be properly fragmented, some options are to be deleted on fragments. Not that it is too complicated, it is just boring and ugly and inconsistent with IP_HDRINCL l

Re: [PATCH net-2.6 0/3]: Three TCP fixes

2007-12-05 Thread Alexey Kuznetsov

Hello! > My theory is that it could relate to tcp_cwnd_restart and > tcp_cwnd_application_limited using it and the others are just then > accidently changed as well. Perhaps I'll have to dig once again to > changelog history to see if there's some clue (unless Alexey shed > some light to this)

Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode

2007-12-03 Thread Alexey Kuznetsov

On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote: > So we need to fix this, and whatever the fix is will probably render > the BH/USER distinction obsolete. Hmm, I would think opposite. USER (or generic) is expensive variant, BH is lite. No? Alexey

Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)

2007-11-19 Thread Alexey Kuznetsov

Hello! > Is there a reason that the target hardware address isn't the target > hardware address? It is bound only to the fact that linux uses protocol address of the machine, which responds. It would be highly confusing (more than confusing :-)), if we used our protocol address and hardware addre

Re: [PATCH] net/ipv4/arp.c: Fix arp reply when sender ip 0 (was: Strange behavior in arp probe reply, bug or feature?)

2007-11-15 Thread Alexey Kuznetsov

Hello! > Send a correct arp reply instead of one with sender ip and sender > hardware adress in target fields. I do not see anything more legal in setting target address to 0. Actually, semantics of target address in ARP reply is ambiguous. If it is a reply to some real request, it is set to ad

Re: [2.6 patch] remove Documentation/networking/routing.txt

2007-11-05 Thread Alexey Kuznetsov

Hello! > This file is so outdated that I can't see any value in keeping it. Absolutely agree. Alexey

Re: [PATCH RESEND] ip_gre: sendto/recvfrom NBMA address

2007-10-24 Thread Alexey Kuznetsov

Hello! > I was able to set a nbma gre tunnel, add routes to it and it worked > perfectly ok. > > Link-level next hop worked: > ip route add via dev onlink This can work if you use gre0. By plain luck it has all-zero dev_addr. It will break on nbma devices set with: ip tunnel add XXX mode gr

Re: [PATCH RESEND] ip_gre: sendto/recvfrom NBMA address

2007-10-23 Thread Alexey Kuznetsov

Hello! Me wrote: > Ack. This is good idea. > > Frankly, I was sure ip_gre worked in this way all these years. > I do not remember any reasons why it was crippled. > > The only dubious case is when next hop is set using routing tables. > But code in ipgre_tunnel_xmit() is ready to accept this si

Re: [PATCH RESEND] ip_gre: sendto/recvfrom NBMA address

2007-10-23 Thread Alexey Kuznetsov

Hello! > When GRE tunnel is in NBMA mode, this patch allows an application to use > a PF_PACKET socket to: > - send a packet to specific NBMA address with sendto() > - use recvfrom() to receive packet and check which NBMA address it came from > > This is required to implement properly NHRP over G

Re: [PATCH 5/10] [NET]: Avoid unnecessary cloning for ingress filtering

2007-10-15 Thread Alexey Kuznetsov

Hello! > If it is causing trouble, then one idea would be to move the resetting > to a wrapper function which calls clone first and then resets the other > fields. All actions currently cloning would need to be mod-ed to use > that call. I see not so many places inside net/sched/act* where skb_cl

Re: SFQ qdisc crashes with limit of 2 packets

2007-09-21 Thread Alexey Kuznetsov

p the whole range of hash values. Switched to Jenkins' hash. Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 3a23e30..b542c87 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -19,6 +19,7 @@ #include #in

Re: SFQ qdisc crashes with limit of 2 packets

2007-09-19 Thread Alexey Kuznetsov

Hello! > OK the off-by-one prevents an out-of-bounds array access, Yes, this is not off-by-one (off-by-two, to be more exact :-)). Maximal queue length is really limited by SFQ_DEPTH-2, because: 1. SFQ keeps list of queue lengths in array of length SFQ_DEPTH. This means length of queue must

Re: Problem with implementation of TCP_DEFER_ACCEPT?

2007-08-24 Thread Alexey Kuznetsov

nt doesn't send a packet containing data before the SYN_ACK > > time-outs finally expire the connection will be dropped. > > A brought this up a long, long time ago, and I seem to remember > Alexey Kuznetsov explained me at the time that this was intentional. Obviously, I s

Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Alexey Kuznetsov

Hello! > Good point, I didn't think of that. Is there a version of this patch > that already uses different namespaces so I can look at it? Pavel does not like the idea. It looks "not exactly pretty", like you said. :-) The alternative is to create pair in main namespace and then move one end to

Re: [RFC RTNETLINK 00/09]: Netlink link creation API

2007-06-06 Thread Alexey Kuznetsov

Hello! >I just suggested to > Pavel to create only a single device per newlink operation and binding > them later, I see some logical inconsistency here. Look, the second end is supposed to be in another namespace. It will have identity, which cann

netdev@vger.kernel.org

2007-04-26 Thread Alexey Kuznetsov

Hello! > When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup() > needs to initialize the res.r field before fib_res_put(&res) - unlike > fib_lookup(), a direct call to ->tb_lookup does not set this field. Indeed, I am sorry. Alexey

[PATCH] infinite recursion in netlink

2007-04-25 Thread Alexey Kuznetsov

table is missing 2. Do not crash when queue is empty (does not happen, but yet) 3. Put result of lookup Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index fc920f6..cac06c4 100644 --- a/net/ipv4/fib_frontend.c +++ b/ne

Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov

Hello! > This might work. Could you post a patch to better show what you mean to do? Here it is. ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms e

Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov

Hello! > infiniband sets parm->neigh_destructor, and I search for a way to prevent > this destructor from being called after the module has been unloaded. > Ideas? It must be called in any case to update/release internal ipoib structures. The idea is to move call of parm->neigh_destructor from n

Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov

Hello! > If a device driver sets neigh_destructor in neigh_params, this could > get called after the device has been unregistered and the driver module > removed. It is the same problem: if dst->neighbour holds neighbour, it should not hold device. parms->dev is not supposed to be used after neig

Re: [ofa-general] Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov

Hello! > I think the thing to do is to just leave the loopback references > in place, try to unregister the per-namespace loopback device, > and that will safely wait for all the references to go away. Yes, it is exactly how it works in openvz. All the sockets are killed, queues are cleared, nobo

Re: [ofa-general] Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov

Hello! > Does this look sane (untested)? It does not, unfortunately. Instead of regular crash in infiniband you will get numerous random NULL pointer dereferences both due to dst->neighbour and due to dst->dev. Alexey

Re: [ofa-general] Re: dst_ifdown breaks infiniband?

2007-03-19 Thread Alexey Kuznetsov

Hello! > Well I don't think the loopback device is currently but as soon > as we get network namespace support we will have multiple loopback > devices and they will get unregistered when we remove the network > namespace. There is no logical difference. At the moment when namespace is gone there

Re: dst_ifdown breaks infiniband?

2007-03-18 Thread Alexey Kuznetsov

Hello! > > It should be cleared and we should be sure it will not be destroyed > > before quiescent state. > > I'm confused. didn't you say dst_ifdown is called after quiescent state? Quiescent state should happen after dst->neighbour is invalidated. And this implies that all the users of dst->n

Re: dst_ifdown breaks infiniband?

2007-03-18 Thread Alexey Kuznetsov

Hello! > Hmm. Something I don't understand: does the code > in question not run on *each* device unregister? It does. > Why do I only see this under stress? You should have some referenced destination entries to trigger bad path. This should happen not only under stress. F.e. just try to ssh

Re: dst_ifdown breaks infiniband?

2007-03-18 Thread Alexey Kuznetsov

Hello! > This is not new code, and should have triggered long time ago, > so I am not sure how come we are triggering this only now, > but somehow this did not lead to crashes in 2.6.20 I see. I guess this was plain luck. > Why is neighbour->dev changed here? It holds reference to device and p

Re: [PATCH] Copy mac_len in skb_clone() as well

2007-03-15 Thread Alexey Kuznetsov

Hello! > What bug triggered that helped you discover this? Or is it > merely from a code audit? I asked the same question. :-) openvz added some another fields to skbuff and when it was found that they are lost while clone, he tried to figure out how all this works and looked for another exampl

Re: [PATCH] TCP: Replace __kfree_skb() with kfree_skb()

2007-01-26 Thread Alexey Kuznetsov

Hello! > do you know of any place where __kfree_skb is used to free an skb > whose ref count is greater than 1? No. Actually, since kfree_skb is not inline, __kfree_skb could be made static and remaining places still using it switched to kfree_skb.

Re: [BUG] problem with BPF in PF_PACKET sockets, introduced in linux-2.6.19

2007-01-25 Thread Alexey Kuznetsov

Hello! > So this whole idea to make run_filter() return signed integers > and fail on negative is entirely flawed, it simply cannot work > and retain the expected semantics which have been there forever. Actually, it can. Return value was used only as sign of error, so that the mistake was to ret
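The snippet above is cut off before Alexey's full explanation, so the following is only a hedged illustration of the general pitfall, not the actual fix. A socket filter's result is an unsigned byte count: 0 means "drop", and the packet socket clamps any other value to the packet length, so a very large result simply means "keep the whole packet". Folding that count into a signed int where negative means "error" would misread such large results. The helper name `apply_filter_result` is hypothetical; it keeps the drop/accept decision and the snap length apart.

```c
#include <stdbool.h>

/* Hypothetical helper, not the kernel's af_packet code.
 * filter_res: unsigned value returned by a socket filter.
 * Returns false if the packet should be dropped; otherwise stores the
 * number of bytes to keep in *snaplen, clamped to the packet length. */
static bool apply_filter_result(unsigned int filter_res, unsigned int pkt_len,
                                unsigned int *snaplen)
{
    if (filter_res == 0)
        return false;                 /* filter asked to drop the packet */

    /* Any non-zero result, including 0xFFFFFFFF, is a length limit. */
    *snaplen = filter_res < pkt_len ? filter_res : pkt_len;
    return true;
}
```

With this split, a result of 0xFFFFFFFF for a 100-byte packet yields a snap length of 100 instead of being mistaken for an error code.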

Re: RFC: consistent disable_xfrm behaviour

2006-12-04 Thread Alexey Kuznetsov

Hello! > Here's the patch again properly signed off. I think it is correct. Alexey

Re: RFC: consistent disable_xfrm behaviour

2006-12-04 Thread Alexey Kuznetsov

Hello! > Alexey, do you remember what the original intent of this was? disable_policy was supposed to skip policy checks on input. It makes sense only on input device. disable_xfrm was supposed to skip transformations on output. It makes sense only on output device. If it does not work, it was

Re: 2.6.19-rc1: Volanomark slowdown

2006-11-08 Thread Alexey Kuznetsov

Hello! > > reduced Volanomark benchmark throughput by 10%. The irony of it is that java vm used to be one of victims of over-delayed acks. I will look, there is a little chance that it is possible to detect the situation and to stretch ACKs. There is one little question though. If you see a v

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-22 Thread Alexey Kuznetsov

Hello! > I can't even find a reference to SIOCGSTAMP in the > dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu. > > But I will note that tpacket_rcv() expects to always get > valid timestamps in the SKB, it does a: It is equally unlikely it uses mmapped packet socket (tpacket_rcv). I even i

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-22 Thread Alexey Kuznetsov

Hello! > transactions to data segments is fubar. That issue is also why I wonder > about the setting of tcp_abc. Yes, switching ABC on/off has visible impact on amount of segments. When ABC is off, amount of segments is almost the same as number of transactions. When it is on, ~1.5% are merged.

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Alexey Kuznetsov

Hello! > There isn't any sort of clever short-circuiting in loopback is there? No, from all that I know. > I > do like the convenience of testing things over loopback, but always fret > about not including drivers and actua

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello! > Please think about it this way: > suppose you haave a heavily loaded router and some network problem is to > be diagnosed. You run tcpdump and suddenly router becomes overloaded (by > switching to timestamp-it-all mode I am sorry. I cannot think that way. :-) Instead of attempts to scar

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello! > Ok, ok, but don't we have queueing disciplines that need the timestamp > even on ingress? I cannot find. ip_queue does. But it is just another user, not different of sockets. BTW in any case, any user of timestamp who sees 0, because skb was received before timestamping was enabled, ha

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello! > But that never happens right? Right. Well, not right. It happens. Simply because you get packet with newer timestamp after previous handler saw this packet and did some actions. I just do not see any bad consequences. > And do you have some other prefered way to solve this? Even if t

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Alexey Kuznetsov

Hello! Of course, number of ACK increases. It is the goal. :-) > unpleasant increase in service demands on something like a "burst > enabled" (./configure --enable-burst) netperf TCP_RR test: > > netperf -t TCP_RR -H foo -- -b N # N > 1 foo=localhost b patched orig 2 10

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello! > Hmm, not sure how that could happen. Also is it a real problem > even if it could? As I said, the problem is _occasionally_ theoretical. This would happen f.e. if packet socket handler was installed after IP handler. Then tcpdump would get packet after it is processed (acked/replied/for

Re: Network performance degradation from 2.6.11.12 to 2.6.16.20

2006-09-18 Thread Alexey Kuznetsov

Hello! > For netdev: I'm more and more thinking we should just avoid the problem > completely and switch to "true end2end" timestamps. This means don't > time stamp when a packet is received, but only when it is delivered > to a socket. This will work. From viewpoint of existing uses of timesta

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-18 Thread Alexey Kuznetsov

Hello! > It looks perfectly fine to me, would you like me to apply it > Alexey? Yes, I think it is safe. Theoretically, there is one place where it can be not so good. Good nagling tcp connection, which makes lots of small write()s, will send MSS sized frames due to delayed ACKs. But if we ACK

Re: 2.6.18-rc6 memory mapped pcap truncates outgoing TCP packets, but not icmp

2006-09-14 Thread Alexey Kuznetsov

Hello! > [PACKET]: Don't truncate non-linear skbs with mmaped IO > > Non-linear skbs are truncated to their linear part with mmaped IO. > Fix by using skb_copy_bits instead of memcpy. Ack. I remember this trick. The "idea" was that I needed only TCP header in any case and it was perfect cutoff.

Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group

2006-09-14 Thread Alexey Kuznetsov

Hello! > No, it returns 1 (allow) if there are no filters to explicitly > filter it. I wrote that code. :-) I see. It did not behave this way old times. From your mails I understood that current behaviour matches another implementations (BSD whatever), is it true? Alexey

Re: [PATCH] make ipv4 multicast packets only get delivered to sockets that are joined to group

2006-09-13 Thread Alexey Kuznetsov

Hello! > IPv6 behaves the same way. Actually, Linux IPv6 filters received multicasts, inet6_mc_check() does this. IPv4 does not. I remember that attempts to do this were made in the past and failed, because some applications, related to multicast routing, did expect to receive all the multicasts

Re: [PATCH][RFC] Re: high latency with TCP connections

2006-09-05 Thread Alexey Kuznetsov

Hello! > Is this really necessary? No, of course. We lived for ages without this, would live for another age. > I thought that the problems with ABC were in > trying to apply byte-based heuristics from the RFC(s) to a > packet-oritented cwnd in the stack? It was just t

Re: ProxyARP and IPSec

2006-09-05 Thread Alexey Kuznetsov

Hello! > >1. Probably, will not accept fragmented frames, because IPsec cannot > > handle them ... > I'm clearly failing to understand where, exactly, the problems lie. I > would appreciate any pointers and/or clue transfusion... I said "probably". Look into old rfc2401, search for word "fra

Re: ProxyARP and IPSec

2006-09-04 Thread Alexey Kuznetsov

Hello! > > > What I great idea. Now I just have to get every host I want to > interoperate with to support a nonstandard configuration. The scary > part is that if I motivate it with "Linux is too stupid to handle > standard tunnel-mode IPsec" I might actually get away with it. sarcasm mod

[PATCH][RFC] Re: high latency with TCP connections

2006-09-04 Thread Alexey Kuznetsov

s case ACK is forced after tcp_recvmsg() drains receive buffer. In other words, it is a "soft" each-2d-segment ACK, which is enough to preserve ACK clock even when ABC is enabled. Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]> diff --git a/include/net/inet_connection

Re: high latency with TCP connections

2006-09-04 Thread Alexey Kuznetsov

Hello! > At least for slow start it is safe, but experiments with atcp for > netchannels showed that it is better not to send excessive number of > acks when slow start is over, If this thing is done from tcp_cleanup_rbuf(), it should not affect performance too much. Note, that with ABC and anot

Re: 2.6.18-rc5 with GRE, iptables and Speedtouch ADSL, PPP over ATM

2006-09-04 Thread Alexey Kuznetsov

Hello! > This path obviously breaks assumption 1) and therefore can lead to ABBA > dead-locks. Yes... > I've looked at the history and there seems to be no reason for the lock > to be held at all in dev_watchdog_up. The lock appeared in day one and > even there it was unnecessary. Seems, it s

Re: high latency with TCP connections

2006-09-01 Thread Alexey Kuznetsov

Hello! > problem. The problem is really at the receiver because we only > ACK every other full sized frame. I had the idea to ACK every 2 > frames, regardless of size, This would solve lots of problems. >but that might have other problems. BSD used to do this, everyon

Re: NAPI: netif_rx_reschedule() ??

2006-08-31 Thread Alexey Kuznetsov

Hello! > However I'm confused about a couple of things, and there are only two > uses of netif_rx_reschedule() in the kernel, so I'm a little stuck. First, do not believe to even single bit of code or docs about netif_rx_reschedule(). It was used once in the first version of NAPI for 3com driver

Re: high latency with TCP connections

2006-08-31 Thread Alexey Kuznetsov

Hello! > 2) a way to take delayed ACKs into account for cwnd growth This part is OK now, right? > 1) protection against ACK division But Linux never had this problem... Congestion window was increased only when a whole skb is ACKed, flag FLAG_DATA_ACKED. (TSO could break this, but should not).
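The two goals quoted above (crediting delayed ACKs for cwnd growth, and protection against ACK division) are what Appropriate Byte Counting (RFC 3465) targets. The following is a simplified slow-start model, not Linux's TCP code, contrasting a naive per-ACK rule with ABC; cwnd is kept in bytes only so the two rules can be compared. Note that, per the reply above, Linux avoided ACK division differently, by growing cwnd only when a whole skb is acknowledged (FLAG_DATA_ACKED), which this naive per-ACK rule does not model.

```c
#include <stdint.h>

/* Naive packet counting: every ACK that covers new data grows cwnd by one
 * MSS, regardless of how many bytes it acknowledges. */
static uint32_t grow_per_ack(uint32_t cwnd, uint32_t mss, uint32_t bytes_acked)
{
    return bytes_acked > 0 ? cwnd + mss : cwnd;
}

/* ABC in slow start (RFC 3465): grow by the bytes acknowledged, capped at
 * L * MSS per ACK (the RFC recommends L = 2). */
static uint32_t grow_abc(uint32_t cwnd, uint32_t mss, uint32_t bytes_acked,
                         uint32_t L)
{
    uint32_t cap = L * mss;
    return cwnd + (bytes_acked < cap ? bytes_acked : cap);
}
```

With MSS 1460 and cwnd 2920, one delayed ACK covering two segments grows cwnd to 4380 under the per-ACK rule but to 5840 under ABC; ten ACKs that each split off 146 bytes of a single segment inflate the per-ACK cwnd to 17520, while ABC grows it only to 4380.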

Re: high latency with TCP connections

2006-08-31 Thread Alexey Kuznetsov

Hello! > Expecting any performance with one byte write's is silly. I am not sure why you are so confident about status of ABC. I missed the discussions, when it was implemented. Apparently, it was noticed that ABC in its pure form does not make sense with snd_cwnd counted in packets and there wer

Re: [PATCH] fix sk->sk_filter field access

2006-08-30 Thread Alexey Kuznetsov

Hello! > Really? > > It is used with needlock=0 by DCCP ipv6, for example. This case seems > correct too. What about sk_receive_skb()? dn_queue_skb()? In fact, > there seems to be numerous uses still with needlock=0, all legitimate. Well, not quite legitime. sk_receive_skb() has the same bug

Re: [PATCH] fix sk->sk_filter field access

2006-08-30 Thread Alexey Kuznetsov

Hello! > > Function sk_filter() is called from tcp_v{4,6}_rcv() functions with argue > > needlock = 0, while socket is not locked at that moment. In order to avoid > > this and similar issues in the future, use rcu for sk->sk_filter field read > > protection. > > > > Patch is for net-2.6.19 >

Re: [PATCH 4/6] net neighbour: convert to RCU

2006-08-29 Thread Alexey Kuznetsov

Hello! > Race 1: w/o RCU > Cpu 0: is in neigh_lookup > gets read_lock() > finds entry > ++refcount to 2 >

Re: [PATCH 4/6] net neighbour: convert to RCU

2006-08-29 Thread Alexey Kuznetsov

Hello! Yes, I forgot to say I take back my suggestion about atomic_inc_test_zero(). It would not work. Seems, it is possible to add some barriers around setting n->dead and testing it in neigh_lookup_rcu(), but it would be scary and ugly. To be honest, I just do not know how to do this. :-)

Re: [PATCH 4/6] net neighbour: convert to RCU

2006-08-29 Thread Alexey Kuznetsov

Hello! > This should not be any more racy than the existing code. Existing code is not racy. Critical place is interpretation of refcnt==1. Current code assumes, that when refcnt=1 and entry is in hash table, nobody can take this entry (table is locked). So, it can be unlinked from the table. S

Re: [PATCH 4/6] net neighbour: convert to RCU

2006-08-29 Thread Alexey Kuznetsov

Hello! > > Also, probably, it makes sense to add neigh_lookup_light(), which does > > not take refcnt, but required to call > > neigh_release_light() (which is just rcu_read_unlock_bh()). > > Which code paths would that make sense on? > fib_detect_death (ok) > infiniband (ok) >

Re: [PATCH 4/6] net neighbour: convert to RCU

2006-08-29 Thread Alexey Kuznetsov

Hello! > atomic_inc_and_test is true iff result is zero, so that won't work. I meant atomic_inc_not_zero(), as Martin noticed. > But the following should work: > > hlist_for_each_entry_rcu(n, tmp, &tbl->hash_buckets[hash_val], hlist) { > if (dev == n->dev && !memcmp(n->prim
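For readers unfamiliar with the primitive named above, this is a userspace analogue, written with C11 atomics, of the semantics of `atomic_inc_not_zero()`: take a reference only if the count has not already dropped to zero. It is not the kernel's implementation, and the helper name `inc_not_zero` is hypothetical. Alexey later withdraws the suggestion in this thread, since the existing neighbour code also relies on how refcnt==1 is interpreted while the entry is still hashed, which this primitive alone does not address.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns true and increments *refcnt if it was non-zero; returns false
 * (leaving it at zero) if the object is already on its way to being freed. */
static bool inc_not_zero(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);
    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```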

Re: [RFC IPv6] Disabling IPv6 autoconf

2006-08-29 Thread Alexey Kuznetsov

Hello! > Yes, it is logical because without multicast IPV6 cannot > work correctly. This is not quite true. IFF_BROADCAST is enough, it will work just like IPv4. Real troubles start only when interface is not IFF_BROADCAST and not IFF_POINTOPOINT. > IFF_MULTICAST flag seems potentially problem

Re: [PATCH 4/6] net neighbour: convert to RCU

2006-08-29 Thread Alexey Kuznetsov

Hello! > @@ -346,8 +354,8 @@ struct neighbour *neigh_lookup(struct ne > > NEIGH_CACHE_STAT_INC(tbl, lookups); > > - read_lock_bh(&tbl->lock); > - hlist_for_each_entry(n, tmp, &tbl->hash_buckets[hash_val], hlist) { > + rcu_read_lock(); > + hlist_for_each_entry_rcu(n,

Re: ProxyARP and IPSec

2006-08-24 Thread Alexey Kuznetsov

Hello! > I'm thinking that David definitely has a point about having a usability > problem, though. All other kind of tunnels have endpoint devices > associated with them, and that would make all these kinds of problems go > away, Yes, when you deal with sane practical setups, this approach

Re: ProxyARP and IPSec

2006-08-23 Thread Alexey Kuznetsov

Hello! > What he's trying to accomplish doesn't sound all that weird, Absolutely sane. > does anyone have any other ideas? The question is where is this host really? If it is far far away and connected only via IPsec tunnel with destionation of tunnel different of host address ip ro add THEH

Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Alexey Kuznetsov

Hello! > Isn't a socket freed until all skb are handled? In which case the limit on > the number of open > files limits the total memory usage? (Same as with streaming sockets?) Alas. Number of closed sockets is not limited. Actually, it is limited by sk_max_ack_backlog*max_files, which is a lot

Re: [take12 0/3] kevent: Generic event handling mechanism.

2006-08-22 Thread Alexey Kuznetsov

Hello! > >No way - timespec uses long. > > I must have missed that discussion. Please enlighten me in what regard > using an opaque type with lower resolution is preferable to a type > defined in POSIX for this sort of purpose. Let me explain, as a person who did this mistake and deeply regrets

Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Alexey Kuznetsov

Hello! > > It is the only protection of commiting infinite amount of memory to a > > socket. > > Doesn't the "if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)" check in > sock_alloc_send_pskb() > limit things already? Unfortunately, it does not. You can open a socket, send something to a s

Re: Get rid of /proc/sys/net/unix/max_dgram_qlen

2006-08-22 Thread Alexey Kuznetsov

Hello! > Either this, or it should be implemented correctly, which means poll needs > to be fixed to also check for max_dgram_qlen, Feel free to do this correctly. :-) Deleting "wrong" code rarely helps. It is the only protection of commiting infinite amount of memory to a socket. Alexey

[PATCH] locking bug in fib_semantics.c

2006-08-17 Thread Alexey Kuznetsov

ock(&fib_info_lock), and spin forever. Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]> --- diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 4ea6c68..5dfdad5 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -159,7 +159,7 @@ void fre

Re: [PATCH?] tcp and delayed acks

2006-08-16 Thread Alexey Kuznetsov

Hello! > send out any delayed ACKs when it is clear that the receiving process is > waiting for more data? It has just be done in tcp_cleanup_rbuf() a few lines before your chunk. There is some somplex condition to be satisfied there and it is impossible to relax it any further. I do not know w

Re: [RFC] network namespaces

2006-08-16 Thread Alexey Kuznetsov

Hello! > (application) containers. Performance aside, are there any reasons why > this approach would be problematic for c/r? This approach is just perfect for c/r. Probably, this is the only approach when migration can be done in a clean and self-consistent way. Alexey

Re: [PATCH 09/16] [IPv6] address: Convert address notification to use rtnl_notify()

2006-08-16 Thread Alexey Kuznetsov

Hello! > In one conversation with Alexey he told me there was some inspiration > from pfkey in the semantics of it i.e processid. Inspiration, but not a copy. :-) Unlike pfkeyv2 it uses addressing usual for networking i.e. struct sockaddr_nl. Alexey

Re: [PATCH 09/16] [IPv6] address: Convert address notification to use rtnl_notify()

2006-08-16 Thread Alexey Kuznetsov

Hello! > The netlink header pid is really akin to sadb_msg_pid from RFC 2367. > IMHO it should always be zero if the kernel is the originator of the > message. No. Analogue of sadb_msg_pid is nladdr.nl_pid. Netlink header pid is not originator of the message, but author of the change. The notio

Re: skb_shared_info()

2006-08-15 Thread Alexey Kuznetsov

Hello! > I still like existing way - it is much simpler (I hope :) to convince > e1000 developers to fix driver's memory usage e1000 is not a problem at all. It just has to use pages. If it is going to use high order allocations, it will suck, be it order 3 or 2. > area (does MAX_TCP_HEADER eno

Re: [RFC 5/7] neighbour: convert lookup to sequence lock

2006-08-14 Thread Alexey Kuznetsov

Hello! > That wouldn't work if hard_header() ever expands the head. Fortunately > hard_header() returns the length added even in case of an error so we > can undo the absolute value returned. Yes. Or probably it is safer to undo to skb->nh. Even if hard_header expands skb, skb->nh still remains

Re: [PATCH 09/16] [IPv6] address: Convert address notification to use rtnl_notify()

2006-08-14 Thread Alexey Kuznetsov

Hello! > Some of these removals of current->pid will affect users such as quagga, > zebra, vrrpd etc. If they survived cleanup in IPv4, they definitely will not feel cleanup in IPv6. Thomas does great work, Jamal, do not worry. :-) > IMO, I believe there is a strong case that can be made for e

Re: skb_shared_info()

2006-08-14 Thread Alexey Kuznetsov

Hello! > e1000 will setup head/data/tail pointers to point to the area in the > first sg page. Maybe. But I still hope this is not necessary, the driver should be able to do at least primitive header splitting, in that case the header could be inlined to skb. Alternatively, header can be copied

Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-13 Thread Alexey Kuznetsov

Hello! > So we do something like this: Yes, exactly. Actually, there was a function with similar functionality: rtnetlink_send(). net/sched/* used it, older net/ipv4/ still did this directly. Alexey

Re: skb_shared_info()

2006-08-13 Thread Alexey Kuznetsov

Hello! > E1000 wants 16K buffers for jumbo MTU settings. > > The reason is that the chip can only handle power-of-2 buffer > sizes, and next hop from 9K is 16K. Let it use pages. Someone should start. :-) High order allocations are disaster in any case. > If we store raw kmalloc buffers, we c

Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-12 Thread Alexey Kuznetsov

Hello! > Actually I think the only safe solution is to allocate a separate > socket for multicast messages. In other words, if you want reliable > unicast reception on a socket, don't bind it to a multicast group. Yes, it was the point of my advocacy of NLM_F_ECHO. :-) Alexey

Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-12 Thread Alexey Kuznetsov

Hello! > Makes sense, especially for auto generated handles. I've been listening > to the notifications on a separate socket for this purpose. That's... complicated. But cool. :-) > It does make sense, the way it has been implemented if at all is > creepy. Even worse, IPv6 is using current->pid

Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-11 Thread Alexey Kuznetsov

Hello! > I get your point and I see the value. Unfortunately, probably due to > lack of documentation, this feature isn't used by any applications I > know of. Well, tc was supposed to use it, but this did not happen and it remained deficient. > We even put in the hacks to make identification o

Re: sender throttling for unreliable protocols not garuanteed? (different units in sock->wmem_alloc and net_devive->tx_queue_len)

2006-08-11 Thread Alexey Kuznetsov

Hello! > I'd be interested in any opinions on the above mentioned effect. Everything is right, it is exactly how it works. Well, use another qdisc, which counts in bytes rather than in frames (f.e. bfifo) Set sndbuf small enough. And if sndbuf*#senders is still too large, you have to use fair
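The advice above has two halves: a byte-counting qdisc such as bfifo (configured separately with tc) and a small send buffer per sender. This is a minimal sketch of the second half, shrinking `SO_SNDBUF` on a datagram socket so the sender blocks earlier on its own buffer accounting; the helper name `set_small_sndbuf` is hypothetical. On Linux the value read back is typically double the requested size, because the kernel reserves room for bookkeeping overhead.

```c
#include <sys/socket.h>

/* Sets SO_SNDBUF to 'requested' bytes and returns the effective size the
 * kernel reports back, or -1 on error. */
static int set_small_sndbuf(int fd, int requested)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) < 0)
        return -1;

    int effective = 0;
    socklen_t len = sizeof(effective);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0)
        return -1;
    return effective;
}
```

As the reply notes, if sndbuf multiplied by the number of senders is still too large for the queue, a fair queueing discipline is the remaining option.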

Re: skb_shared_info()

2006-08-11 Thread Alexey Kuznetsov

Hello! >> management schemes and to just wrap SKB's around >> arbitrary pieces of data. + > and something clever like a special page_offset encoding > means "use data, not page". But for what purpose do you plan to use it? > The e1000 issue is just one example of this, another What is this iss

Re: the mystery that is sock_fasync

2006-08-11 Thread Alexey Kuznetsov

Hello! > Did I miss some way that multiple file objects can point to the > same socket inode? Absolutely prohibited. Always was. Apparently, sock_fasync() was cloned from tty_fasync(), that's the only reason why it is so creepy. Alexey

Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-10 Thread Alexey Kuznetsov

Hello! > What's wrong with listening to the notification for that purpose? Nothing! NLM_F_ECHO _is_ listening for notifications without subscription to multicast groups and need to figure out what messages are yours. But beyond this NLM_F_ECHO is totally subset of this. Which still makes much mor
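NLM_F_ECHO is one bit in the nlmsg_flags field of the netlink message header. `<linux/netlink.h>` comments it as "Echo this request". As Alexey describes it, the flag lets the requester receive the notification its own request triggers, without subscribing to a multicast group. The sketch below only builds and decodes a header with Python's `struct` module. The flag values follow `<linux/netlink.h>`. The message type (RTM_NEWLINK = 16) and sequence number are example values. No payload is attached and no socket I/O is attempted.

```python
import struct

# Flag values from <linux/netlink.h>
NLM_F_REQUEST = 0x01
NLM_F_ACK = 0x04
NLM_F_ECHO = 0x08

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR_FMT = "=IHHII"
NLMSGHDR_LEN = struct.calcsize(NLMSGHDR_FMT)  # 16 bytes

def nlmsghdr(msg_type: int, flags: int, seq: int,
             payload: bytes = b"", pid: int = 0) -> bytes:
    """Build a netlink header followed by `payload` (no alignment padding added)."""
    total = NLMSGHDR_LEN + len(payload)
    return struct.pack(NLMSGHDR_FMT, total, msg_type, flags, seq, pid) + payload

RTM_NEWLINK = 16  # example message type from <linux/rtnetlink.h>
msg = nlmsghdr(RTM_NEWLINK, NLM_F_REQUEST | NLM_F_ACK | NLM_F_ECHO, seq=1)
length, mtype, flags, seq, pid = struct.unpack(NLMSGHDR_FMT, msg[:NLMSGHDR_LEN])
print(length, mtype, hex(flags))
```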

Re: [PATCH 1/4] [NETLINK]: Handle NLM_F_ECHO in netlink_rcv_skb()

2006-08-10 Thread Alexey Kuznetsov

Hello! > This patch handles NLM_F_ECHO in netlink_rcv_skb() to > handle it in a central point. Most subsystems currently > interpret NLM_F_ECHO as to just unicast events to the > originator of the change while the real meaning of the > flag is to echo the request. Do not you think it is useless t

Re: [PATCH] llc: SOCK_DGRAM interface fixes

2006-08-08 Thread Alexey Kuznetsov

Hello! > This fix goes against the old historical comments about UNIX98 semantics > but without this fix SOCK_DGRAM is broken and useless. So either ANK's > interpretation was incorrect or UNIX98 standard was wrong. Just found this reference to me. :-) The comment migrated from tcp.c. It is only

Re: [PATCH] limit rt cache size

2006-08-07 Thread Alexey Kuznetsov

Hello! > During OpenVZ stress testing we found that UDP traffic with > random src can generate too much excessive rt hash growing > leading finally to OOM and kernel panics. > > It was found that for 4GB i686 system (having 1048576 total pages and > 225280 normal zone pages) kernel allocates the

Re: [PATCH] NET: fix kernel panic from no dev->hard_header_len space

2006-08-01 Thread Alexey Kuznetsov

Hello! > Do the semantics (I'm not talking about bugs) allow skb passed > to dev->hard_header() (if defined) No. dev->hard_header() should get enough of space, which is dev->hard_header_len. Actually, it is historical hole in design, inherited from ancient times. Calling conventions of dev->hard

Re: [PATCH] NET: fix kernel panic from no dev->hard_header_len space

2006-08-01 Thread Alexey Kuznetsov

Hello! > > Alexey, any suggestions on how to handle this kind of thing? Device, which adds something at head must check for space. Anyone, who adds something at head, must check. Otherwise, it will remain buggy forever. > What's wrong with my patch? As I already said there is nothing wrong wit
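The rule "anyone who adds something at head must check for space" can be modeled with a toy buffer. This is not the sk_buff API. `Buffer`, `push()` and `ensure_headroom()` are hypothetical names. In the kernel, the closest analogues are skb_push() and helpers such as skb_realloc_headroom() or skb_cow_head(). Check their exact semantics against the kernel version in use.

```python
class Buffer:
    """Toy packet buffer: live bytes are data[head:], with `head` bytes of headroom."""

    def __init__(self, payload: bytes, headroom: int):
        self.data = bytearray(headroom) + bytearray(payload)
        self.head = headroom
        self.reallocs = 0  # how many times a copy was needed

    def ensure_headroom(self, needed: int) -> None:
        """Copy into a bigger buffer only if the caller's immediate need is not met."""
        if self.head >= needed:
            return
        extra = needed - self.head
        self.data = bytearray(extra) + self.data
        self.head += extra
        self.reallocs += 1

    def push(self, header: bytes) -> None:
        """Prepend a header; refuses to write past the start of the buffer."""
        if len(header) > self.head:
            raise ValueError("insufficient headroom")
        self.head -= len(header)
        self.data[self.head:self.head + len(header)] = header

    def contents(self) -> bytes:
        return bytes(self.data[self.head:])

# The initial allocation guesses headroom for a 14-byte Ethernet header only.
pkt = Buffer(b"IP-PAYLOAD", headroom=14)
pkt.ensure_headroom(14)
pkt.push(b"E" * 14)        # fits: no copy
# A layer adding 4 more bytes at the head must check for itself.
pkt.ensure_headroom(4)
pkt.push(b"TAG!")          # needed a copy
print(pkt.contents(), pkt.reallocs)
```

Only the first allocation tries to guess the total headroom. Each later layer checks its own immediate need and pays for a copy only when that guess fell short.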

Re: [PATCH] NET: fix kernel panic from no dev->hard_header_len space

2006-07-31 Thread Alexey Kuznetsov

Hello! > It does seem weird that IP output won't pay attention to Not so weird, actually. The logic was: Only initial skb allocation tries to reserve all the space to avoid copies in the future. All the rest of places just check, that there is enough space for their immediate needs. If dev->ha

Re: Netchannles: first stage has been completed. Further ideas.

2006-07-27 Thread Alexey Kuznetsov

Hello! > kernel thread takes 100% cpu (with preemption Preemption, you tell... :-) I begged you to spend 1 minute of your time to press ^Z. Did you? Alexey

Re: [PATCH] NET: fix kernel panic from no dev->hard_header_len space

2006-07-27 Thread Alexey Kuznetsov

Hello! > ip_output() ignores dev->hard_header_len ip_output() worries about the space, which it needs. If some place needs more, it is its problem to check. To the moment where it is used, hard_header_len can even change. It can be applied, but it does not change the fact, that those placed whi

Re: Netchannles: first stage has been completed. Further ideas.

2006-07-27 Thread Alexey Kuznetsov

Hello! On Thu, Jul 27, 2006 at 03:46:12PM +1000, Rusty Russell wrote: > Of course, it means rewriting all the userspace tools, documentation, > and creating a complete new infrastructure for connection tracking and > NAT, but if that's what's required, then so be it. That's what I love to hear. N

Re: [PATCH] ip multicast route bug fix

2006-07-26 Thread Alexey Kuznetsov

Hello! > I like this. However, since the cloned skb is either discarded in case > of error, or queued in which case the caller discards its reference right > away, wouldn't it be simpler to just do this? Well, if we wanted just to cheat those checking tools, it is nice. But if we want clarity, i

Re: [PATCH] ip multicast route bug fix

2006-07-25 Thread Alexey Kuznetsov

Hello! > Wouldn't it be better to have a consistent interface (skb always freed), > and clone the skb if needed for deferred processing? I think you mean this. Note, it is real skb_clone(), not alloc_skb(). Enqueued skb contains the whole half-prepared netlink message plus room for the rest. It c

Re: [PATCH] ip multicast route bug fix

2006-07-25 Thread Alexey Kuznetsov

Hello! > Wouldn't it be better to have a consistent interface (skb always freed), > and clone the skb if needed for deferred processing? I am sorry, I misunderstood you. I absolutely agree. It is much better, the variant which I suggested is a good sample of bad programming. :-) Alexey
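The interface proposed in this thread can be sketched with a toy reference-counted buffer. Under that interface, the callee always frees the skb it is given and takes a clone when it must keep the data for deferred processing. All names below are hypothetical stand-ins for skb_clone() and kfree_skb(). As with a real skb clone, a clone here is a new header sharing the same data, and the shared data carries the reference count.

```python
class SharedData:
    """Toy data area shared by an skb and its clones."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self.refs = 1
        self.freed = False

class Skb:
    def __init__(self, data: SharedData):
        self.data = data

def alloc_skb(payload: bytes) -> Skb:
    return Skb(SharedData(payload))

def clone(skb: Skb) -> Skb:
    """New header, same data: bumps the shared reference count."""
    skb.data.refs += 1
    return Skb(skb.data)

def free(skb: Skb) -> None:
    skb.data.refs -= 1
    if skb.data.refs == 0:
        skb.data.freed = True

pending = []  # hypothetical queue for deferred processing

def report(skb: Skb, defer: bool) -> None:
    """Always consumes the caller's skb; keeps its own clone only if it defers."""
    if defer:
        pending.append(clone(skb))
    free(skb)  # same ownership rule on every path, success or deferral

a = alloc_skb(b"handled now")
report(a, defer=False)
b = alloc_skb(b"handled later")
report(b, defer=True)
print(a.data.freed, b.data.freed, b.data.refs)
```

The caller never has to ask whether its skb survived the call. Its reference is always gone, and whoever queued the clone is responsible for freeing it later.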
