Hello!
I can tell why it has not been done initially.
Main problem was in IP options, which can be present in raw packet.
They have to be properly fragmented, some options are to be deleted
on fragments. Not that it is too complicated, it is just boring and ugly
and inconsistent with IP_HDRINCL l
Hello!
> My theory is that it could relate to tcp_cwnd_restart and
> tcp_cwnd_application_limited using it and the others are just then
> accidentally changed as well. Perhaps I'll have to dig once again to
> changelog history to see if there's some clue (unless Alexey shed
> some light to this)
On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote:
> So we need to fix this, and whatever the fix is will probably render
> the BH/USER distinction obsolete.
Hmm, I would think opposite. USER (or generic) is expensive variant,
BH is lite. No?
Alexey
--
To unsubscribe from this list: send
Hello!
> Is there a reason that the target hardware address isn't the target
> hardware address?
It is bound only to the fact that linux uses protocol address
of the machine, which responds. It would be highly confusing
(more than confusing :-)), if we used our protocol address and hardware
addre
Hello!
> Send a correct arp reply instead of one with sender ip and sender
> hardware address in target fields.
I do not see anything more legal in setting target address to 0.
Actually, semantics of target address in ARP reply is ambiguous.
If it is a reply to some real request, it is set to ad
Hello!
> This file is so outdated that I can't see any value in keeping it.
Absolutely agree.
Alexey
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello!
> I was able to set a nbma gre tunnel, add routes to it and it worked
> perfectly ok.
>
> Link-level next hop worked:
> ip route add via dev onlink
This can work if you use gre0. By plain luck it has all-zero dev_addr.
It will break on nbma devices set with:
ip tunnel add XXX mode gr
Hello!
Me wrote:
> Ack. This is good idea.
>
> Frankly, I was sure ip_gre worked in this way all these years.
> I do not remember any reasons why it was crippled.
>
> The only dubious case is when next hop is set using routing tables.
> But code in ipgre_tunnel_xmit() is ready to accept this si
Hello!
> When GRE tunnel is in NBMA mode, this patch allows an application to use
> a PF_PACKET socket to:
> - send a packet to specific NBMA address with sendto()
> - use recvfrom() to receive packet and check which NBMA address it came from
>
> This is required to implement properly NHRP over G
Hello!
> If it is causing trouble, then one idea would be to move the resetting
> to a wrapper function which calls clone first and then resets the other
> fields. All actions currently cloning would need to be mod-ed to use
> that call.
I see not so many places inside net/sched/act* where skb_cl
p the whole range of hash values.
Switched to Jenkins' hash.
Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]>
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3a23e30..b542c87 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -19,6 +19,7 @@
#include
#in
Hello!
> OK the off-by-one prevents an out-of-bounds array access,
Yes, this is not off-by-one (off-by-two, to be more exact :-)).
Maximal queue length is really limited by SFQ_DEPTH-2, because:
1. SFQ keeps list of queue lengths in array of length SFQ_DEPTH.
This means length of queue must
nt doesn't send a packet containing data before the SYN_ACK
> > time-outs finally expire the connection will be dropped.
>
> I brought this up a long, long time ago, and I seem to remember
> Alexey Kuznetsov explained me at the time that this was intentional.
Obviously, I s
Hello!
> Good point, I didn't think of that. Is there a version of this patch
> that already uses different namespaces so I can look at it?
Pavel does not like the idea. It looks "not exactly pretty", like you said. :-)
The alternative is to create pair in main namespace and then move
one end to
Hello!
>I just suggested to
> Pavel to create only a single device per newlink operation and binding
> them later,
I see some logical inconsistency here.
Look, the second end is supposed to be in another namespace.
It will have identity, which cann
Hello!
> When CONFIG_IP_MULTIPLE_TABLES is enabled, the code in nl_fib_lookup()
> needs to initialize the res.r field before fib_res_put(&res) - unlike
> fib_lookup(), a direct call to ->tb_lookup does not set this field.
Indeed, I am sorry.
Alexey
table is missing
2. Do not crash when queue is empty (does not happen, but yet)
3. Put result of lookup
Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]>
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index fc920f6..cac06c4 100644
--- a/net/ipv4/fib_frontend.c
+++ b/ne
Hello!
> This might work. Could you post a patch to better show what you mean to do?
Here it is.
->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(),
which is called when neighbor entry goes to dead state. At this point
everything is still valid: neigh->dev, neigh->parms e
Hello!
> infiniband sets parm->neigh_destructor, and I search for a way to prevent
> this destructor from being called after the module has been unloaded.
> Ideas?
It must be called in any case to update/release internal ipoib structures.
The idea is to move call of parm->neigh_destructor from n
Hello!
> If a device driver sets neigh_destructor in neigh_params, this could
> get called after the device has been unregistered and the driver module
> removed.
It is the same problem: if dst->neighbour holds neighbour, it should
not hold device. parms->dev is not supposed to be used after
neig
Hello!
> I think the thing to do is to just leave the loopback references
> in place, try to unregister the per-namespace loopback device,
> and that will safely wait for all the references to go away.
Yes, it is exactly how it works in openvz. All the sockets are killed,
queues are cleared, nobo
Hello!
> Does this look sane (untested)?
It does not, unfortunately.
Instead of regular crash in infiniband you will get numerous
random NULL pointer dereferences both due to dst->neighbour
and due to dst->dev.
Alexey
Hello!
> Well I don't think the loopback device is currently but as soon
> as we get network namespace support we will have multiple loopback
> devices and they will get unregistered when we remove the network
> namespace.
There is no logical difference. At the moment when namespace is gone
there
Hello!
> > It should be cleared and we should be sure it will not be destroyed
> > before quiescent state.
>
> I'm confused. didn't you say dst_ifdown is called after quiescent state?
Quiescent state should happen after dst->neighbour is invalidated.
And this implies that all the users of dst->n
Hello!
> Hmm. Something I don't understand: does the code
> in question not run on *each* device unregister?
It does.
> Why do I only see this under stress?
You should have some referenced destination entries to trigger bad path.
This should happen not only under stress.
F.e. just try to ssh
Hello!
> This is not new code, and should have triggered long time ago,
> so I am not sure how come we are triggering this only now,
> but somehow this did not lead to crashes in 2.6.20
I see. I guess this was plain luck.
> Why is neighbour->dev changed here?
It holds reference to device and p
Hello!
> What bug triggered that helped you discover this? Or is it
> merely from a code audit?
I asked the same question. :-)
openvz added some other fields to skbuff and when it was found
that they are lost during clone, he tried to figure out how all this works
and looked for another exampl
Hello!
> do you know of any place where __kfree_skb is used to free an skb
> whose ref count is greater than 1?
No.
Actually, since kfree_skb is not inline, __kfree_skb could be made static
and remaining places still using it switched to kfree_skb.
Hello!
> So this whole idea to make run_filter() return signed integers
> and fail on negative is entirely flawed, it simply cannot work
> and retain the expected semantics which have been there forever.
Actually, it can. Return value was used only as sign of error,
so that the mistake was to ret
Hello!
> Here's the patch again properly signed off.
I think it is correct.
Alexey
Hello!
> Alexey, do you remember what the original intent of this was?
disable_policy was supposed to skip policy checks on input.
It makes sense only on input device.
disable_xfrm was supposed to skip transformations on output.
It makes sense only on output device.
If it does not work, it was
Hello!
> > reduced Volanomark benchmark throughput by 10%.
The irony of it is that java vm used to be one of victims
of over-delayed acks.
I will look, there is a little chance that it is possible
to detect the situation and to stretch ACKs.
There is one little question though. If you see a v
Hello!
> I can't even find a reference to SIOCGSTAMP in the
> dhcp-2.0pl5 or dhcp3-3.0.3 sources shipped in Ubuntu.
>
> But I will note that tpacket_rcv() expects to always get
> valid timestamps in the SKB, it does a:
It is equally unlikely it uses mmapped packet socket (tpacket_rcv).
I even i
Hello!
> transactions to data segments is fubar. That issue is also why I wonder
> about the setting of tcp_abc.
Yes, switching ABC on/off has visible impact on amount of segments.
When ABC is off, amount of segments is almost the same as number of
transactions. When it is on, ~1.5% are merged.
Hello!
> There isn't any sort of clever short-circuiting in loopback is there?
No, from all that I know.
> I
> do like the convenience of testing things over loopback, but always fret
> about not including drivers and actua
Hello!
> Please think about it this way:
> suppose you have a heavily loaded router and some network problem is to
> be diagnosed. You run tcpdump and suddenly router becomes overloaded (by
> switching to timestamp-it-all mode
I am sorry. I cannot think that way. :-)
Instead of attempts to scar
Hello!
> Ok, ok, but don't we have queueing disciplines that need the timestamp
> even on ingress?
I cannot find.
ip_queue does. But it is just another user, not different of sockets.
BTW in any case, any user of timestamp who sees 0, because skb was received
before timestamping was enabled, ha
Hello!
> But that never happens right?
Right.
Well, not right. It happens. Simply because you get packet
with newer timestamp after previous handler saw this packet
and did some actions. I just do not see any bad consequences.
> And do you have some other prefered way to solve this? Even if t
Hello!
Of course, number of ACK increases. It is the goal. :-)
> unpleasant increase in service demands on something like a "burst
> enabled" (./configure --enable-burst) netperf TCP_RR test:
>
> netperf -t TCP_RR -H foo -- -b N # N > 1
foo=localhost
b patched orig
2 10
Hello!
> Hmm, not sure how that could happen. Also is it a real problem
> even if it could?
As I said, the problem is _occasionally_ theoretical.
This would happen f.e. if packet socket handler was installed
after IP handler. Then tcpdump would get packet after it is processed
(acked/replied/for
Hello!
> For netdev: I'm more and more thinking we should just avoid the problem
> completely and switch to "true end2end" timestamps. This means don't
> time stamp when a packet is received, but only when it is delivered
> to a socket.
This will work.
From viewpoint of existing uses of timesta
Hello!
> It looks perfectly fine to me, would you like me to apply it
> Alexey?
Yes, I think it is safe.
Theoretically, there is one place where it can be not so good.
Good nagling tcp connection, which makes lots of small write()s,
will send MSS sized frames due to delayed ACKs. But if we ACK
Hello!
> [PACKET]: Don't truncate non-linear skbs with mmaped IO
>
> Non-linear skbs are truncated to their linear part with mmaped IO.
> Fix by using skb_copy_bits instead of memcpy.
Ack.
I remember this trick. The "idea" was that I needed only TCP header in any
case and it was perfect cutoff.
Hello!
> No, it returns 1 (allow) if there are no filters to explicitly
> filter it. I wrote that code. :-)
I see. It did not behave this way in old times.
From your mails I understood that current behaviour matches other
implementations (BSD whatever), is it true?
Alexey
Hello!
> IPv6 behaves the same way.
Actually, Linux IPv6 filters received multicasts, inet6_mc_check() does
this.
IPv4 does not. I remember that attempts to do this were made in the past
and failed, because some applications, related to multicast routing,
did expect to receive all the multicasts
Hello!
> Is this really necessary?
No, of course. We lived for ages without this, would live for another age.
> I thought that the problems with ABC were in
> trying to apply byte-based heuristics from the RFC(s) to a
> packet-oriented cwnd in the stack?
It was just t
Hello!
> >1. Probably, will not accept fragmented frames, because IPsec cannot
> > handle them
...
> I'm clearly failing to understand where, exactly, the problems lie. I
> would appreciate any pointers and/or clue transfusion...
I said "probably".
Look into old rfc2401, search for word "fra
Hello!
>
>
> What a great idea. Now I just have to get every host I want to
> interoperate with to support a nonstandard configuration. The scary
> part is that if I motivate it with "Linux is too stupid to handle
> standard tunnel-mode IPsec" I might actually get away with it.
sarcasm mod
s case ACK is forced after tcp_recvmsg()
drains receive buffer.
In other words, it is a "soft" each-2d-segment ACK, which is enough
to preserve ACK clock even when ABC is enabled.
Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]>
diff --git a/include/net/inet_connection
Hello!
> At least for slow start it is safe, but experiments with atcp for
> netchannels showed that it is better not to send excessive number of
> acks when slow start is over,
If this thing is done from tcp_cleanup_rbuf(), it should not affect
performance too much.
Note, that with ABC and anot
Hello!
> This path obviously breaks assumption 1) and therefore can lead to ABBA
> dead-locks.
Yes...
> I've looked at the history and there seems to be no reason for the lock
> to be held at all in dev_watchdog_up. The lock appeared in day one and
> even there it was unnecessary.
Seems, it s
Hello!
> problem. The problem is really at the receiver because we only
> ACK every other full sized frame. I had the idea to ACK every 2
> frames, regardless of size,
This would solve lots of problems.
>but that might have other problems.
BSD used to do this, everyon
Hello!
> However I'm confused about a couple of things, and there are only two
> uses of netif_rx_reschedule() in the kernel, so I'm a little stuck.
First, do not believe even a single bit of code or docs about
netif_rx_reschedule(). It was used once in the first version of NAPI
for 3com driver
Hello!
> 2) a way to take delayed ACKs into account for cwnd growth
This part is OK now, right?
> 1) protection against ACK division
But Linux never had this problem... Congestion window was increased
only when a whole skb is ACKed, flag FLAG_DATA_ACKED. (TSO could
break this, but should not).
Hello!
> Expecting any performance with one byte write's is silly.
I am not sure why you are so confident about status of ABC.
I missed the discussions, when it was implemented. Apparently,
it was noticed that ABC in its pure form does not make sense
with snd_cwnd counted in packets and there wer
Hello!
> Really?
>
> It is used with needlock=0 by DCCP ipv6, for example. This case seems
> correct too. What about sk_receive_skb()? dn_queue_skb()? In fact,
> there seems to be numerous uses still with needlock=0, all legitimate.
Well, not quite legitimate.
sk_receive_skb() has the same bug
Hello!
> > Function sk_filter() is called from tcp_v{4,6}_rcv() functions with argument
> > needlock = 0, while socket is not locked at that moment. In order to avoid
> > this and similar issues in the future, use rcu for sk->sk_filter field read
> > protection.
> >
> > Patch is for net-2.6.19
>
Hello!
> Race 1: w/o RCU
> Cpu 0: is in neigh_lookup
> gets read_lock()
> finds entry
> ++refcount to 2
>
Hello!
Yes, I forgot to say I take back my suggestion about atomic_inc_test_zero().
It would not work.
Seems, it is possible to add some barriers around setting n->dead
and testing it in neigh_lookup_rcu(), but it would be scary and ugly.
To be honest, I just do not know how to do this. :-)
Hello!
> This should not be any more racy than the existing code.
Existing code is not racy.
Critical place is interpretation of refcnt==1. Current code assumes,
that when refcnt=1 and entry is in hash table, nobody can take this
entry (table is locked). So, it can be unlinked from the table.
S
Hello!
> > Also, probably, it makes sense to add neigh_lookup_light(), which does
> > not take refcnt, but required to call
> > neigh_release_light() (which is just rcu_read_unlock_bh()).
>
> Which code paths would that make sense on?
> fib_detect_death (ok)
> infiniband (ok)
>
Hello!
> atomic_inc_and_test is true iff result is zero, so that won't work.
I meant atomic_inc_not_zero(), as Martin noticed.
> But the following should work:
>
> hlist_for_each_entry_rcu(n, tmp, &tbl->hash_buckets[hash_val], hlist) {
> if (dev == n->dev && !memcmp(n->prim
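The semantics of atomic_inc_not_zero() mentioned above can be illustrated with a minimal userspace sketch using C11 atomics. This is not the kernel implementation, and the name ref_get_not_zero() is hypothetical; it only shows the primitive itself (take a reference unless the counter has already dropped to zero) and makes no claim about whether that is enough to fix the lookup race under discussion.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of "increment unless zero".
 * Returns true and takes a reference if the counter was non-zero,
 * returns false and leaves the counter untouched if it was zero.
 */
static bool ref_get_not_zero(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A reader that finds an entry whose counter already hit zero simply treats the lookup as a miss.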
Hello!
> Yes, it is logical because without multicast IPV6 cannot
> work correctly.
This is not quite true. IFF_BROADCAST is enough, it will work just
like IPv4.
Real troubles start only when interface is not IFF_BROADCAST and not
IFF_POINTOPOINT.
> IFF_MULTICAST flag seems potentially problem
Hello!
> @@ -346,8 +354,8 @@ struct neighbour *neigh_lookup(struct ne
>
> NEIGH_CACHE_STAT_INC(tbl, lookups);
>
> - read_lock_bh(&tbl->lock);
> - hlist_for_each_entry(n, tmp, &tbl->hash_buckets[hash_val], hlist) {
> + rcu_read_lock();
> + hlist_for_each_entry_rcu(n,
Hello!
> I'm thinking that David definitely has a point about having a usability
> problem, though. All other kind of tunnels have endpoint devices
> associated with them, and that would make all these kinds of problems go
> away,
Yes, when you deal with sane practical setups, this approach
Hello!
> What he's trying to accomplish doesn't sound all that weird,
Absolutely sane.
> does anyone have any other ideas?
The question is where is this host really?
If it is far far away and connected only via IPsec tunnel with destination
of tunnel different from host address
ip ro add THEH
Hello!
> Isn't a socket freed until all skb are handled? In which case the limit on
> the number of open
> files limits the total memory usage? (Same as with streaming sockets?)
Alas. Number of closed sockets is not limited. Actually, it is limited
by sk_max_ack_backlog*max_files, which is a lot
Hello!
> >No way - timespec uses long.
>
> I must have missed that discussion. Please enlighten me in what regard
> using an opaque type with lower resolution is preferable to a type
> defined in POSIX for this sort of purpose.
Let me explain, as a person who did this mistake and deeply
regrets
Hello!
> > It is the only protection of committing infinite amount of memory to a
> > socket.
>
> Doesn't the "if (atomic_read(&sk->sk_wmem_alloc) < sk->sk_sndbuf)" check in
> sock_alloc_send_pskb()
> limit things already?
Unfortunately, it does not. You can open a socket, send
something to a s
Hello!
> Either this, or it should be implemented correctly, which means poll needs
> to be fixed to also check for max_dgram_qlen,
Feel free to do this correctly. :-)
Deleting "wrong" code rarely helps.
It is the only protection of committing infinite amount of memory to a socket.
Alexey
ock(&fib_info_lock), and spin forever.
Signed-off-by: Alexey Kuznetsov <[EMAIL PROTECTED]>
---
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 4ea6c68..5dfdad5 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -159,7 +159,7 @@ void fre
Hello!
> send out any delayed ACKs when it is clear that the receiving process is
> waiting for more data?
It has just been done in tcp_cleanup_rbuf() a few lines before your chunk.
There is some complex condition to be satisfied there and it is
impossible to relax it any further.
I do not know w
Hello!
> (application) containers. Performance aside, are there any reasons why
> this approach would be problematic for c/r?
This approach is just perfect for c/r.
Probably, this is the only approach when migration can be done
in a clean and self-consistent way.
Alexey
Hello!
> In one conversation with Alexey he told me there was some inspiration
> from pfkey in the semantics of it i.e processid.
Inspiration, but not a copy. :-)
Unlike pfkeyv2 it uses addressing usual for networking i.e.
struct sockaddr_nl.
Alexey
Hello!
> The netlink header pid is really akin to sadb_msg_pid from RFC 2367.
> IMHO it should always be zero if the kernel is the originator of the
> message.
No. Analogue of sadb_msg_pid is nladdr.nl_pid.
Netlink header pid is not originator of the message, but author of
the change. The notio
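One way to read the distinction above, as a small sketch: the sender of a message is identified by the socket address (struct sockaddr_nl, field nl_pid), while the header field nlmsg_pid names whoever caused the change. The helper names sent_by_kernel() and change_author() are hypothetical and exist only for illustration; the struct fields are the ones from <linux/netlink.h>.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Who put the message on the wire: taken from the socket address.
 * The kernel side of a netlink socket has nl_pid == 0. */
static bool sent_by_kernel(const struct sockaddr_nl *from)
{
    return from->nl_pid == 0;
}

/* Whose request caused the change: taken from the message header. */
static uint32_t change_author(const struct nlmsghdr *nlh)
{
    return nlh->nlmsg_pid;
}
```

Under this reading a notification can be sent by the kernel and still carry the pid of the process that requested the change.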
Hello!
> I still like existing way - it is much simpler (I hope :) to convince
> e1000 developers to fix driver's memory usage
e1000 is not a problem at all. It just has to use pages.
If it is going to use high order allocations, it will suck,
be it order 3 or 2.
> area (does MAX_TCP_HEADER eno
Hello!
> That wouldn't work if hard_header() ever expands the head. Fortunately
> hard_header() returns the length added even in case of an error so we
> can undo the absolute value returned.
Yes.
Or probably it is safer to undo to skb->nh. Even if hard_header
expands skb, skb->nh still remains
Hello!
> Some of these removals of current->pid will affect users such as quagga,
> zebra, vrrpd etc.
If they survived cleanup in IPv4, they definitely will not feel cleanup
in IPv6.
Thomas does great work, Jamal, do not worry. :-)
> IMO, I believe there is a strong case that can be made for e
Hello!
> e1000 will setup head/data/tail pointers to point to the area in the
> first sg page.
Maybe.
But I still hope this is not necessary, the driver should be able to do
at least primitive header splitting, in that case the header could
be inlined to skb.
Alternatively, header can be copied
Hello!
> So we do something like this:
Yes, exactly.
Actually, there was a function with similar functionality: rtnetlink_send().
net/sched/* used it, older net/ipv4/ still did this directly.
Alexey
Hello!
> E1000 wants 16K buffers for jumbo MTU settings.
>
> The reason is that the chip can only handle power-of-2 buffer
> sizes, and next hop from 9K is 16K.
Let it use pages. Someone should start. :-)
High order allocations are disaster in any case.
> If we store raw kmalloc buffers, we c
Hello!
> Actually I think the only safe solution is to allocate a separate
> socket for multicast messages. In other words, if you want reliable
> unicast reception on a socket, don't bind it to a multicast group.
Yes, it was the point of my advocacy of NLM_F_ECHO. :-)
Alexey
Hello!
> Makes sense, especially for auto generated handles. I've been listening
> to the notifications on a separate socket for this purpose.
That's... complicated. But cool. :-)
> It does make sense, the way it has been implemented if at all is
> creepy. Even worse, IPv6 is using current->pid
Hello!
> I get your point and I see the value. Unfortunately, probably due to
> lack of documentation, this feature isn't used by any applications I
> know of.
Well, tc was supposed to use it, but this did not happen and
it remained deficient.
> We even put in the hacks to make identification o
Hello!
> I'd be interested in any opinions on the above mentioned effect.
Everything is right, it is exactly how it works.
Well, use another qdisc, which counts in bytes rather than in frames
(f.e. bfifo)
Set sndbuf small enough.
And if sndbuf*#senders is still too large, you have to use fair
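The "set sndbuf small enough" step can be sketched from the application side with the standard SO_SNDBUF socket option. The helper name set_small_sndbuf() and the byte count a caller passes are illustrative assumptions, not values taken from this thread.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: ask the kernel for a smaller send buffer on 'fd'.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 * Linux may round the value up or double it internally, so the
 * value read back with getsockopt() can differ from 'bytes'.
 */
static int set_small_sndbuf(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
}
```

With a small send buffer each sender can keep only a bounded amount of data queued below the socket, which is the property the advice above relies on.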
Hello!
>> management schemes and to just wrap SKB's around
>> arbitrary pieces of data.
+
> and something clever like a special page_offset encoding
> means "use data, not page".
But for what purpose do you plan to use it?
> The e1000 issue is just one example of this, another
What is this iss
Hello!
> Did I miss some way that multiple file objects can point to the
> same socket inode?
Absolutely prohibited. Always was.
Apparently, sock_fasync() was cloned from tty_fasync(), that's the only
reason why it is so creepy.
Alexey
Hello!
> What's wrong with listening to the notification for that purpose?
Nothing! NLM_F_ECHO _is_ listening for notifications without subscription
to multicast groups and without the need to figure out which messages are yours.
But beyond this NLM_F_ECHO is totally subset of this.
Which still makes much mor
Hello!
> This patch handles NLM_F_ECHO in netlink_rcv_skb() to
> handle it in a central point. Most subsystems currently
> interpret NLM_F_ECHO as to just unicast events to the
> originator of the change while the real meaning of the
> flag is to echo the request.
Do not you think it is useless t
Hello!
> This fix goes against the old historical comments about UNIX98 semantics
> but without this fix SOCK_DGRAM is broken and useless. So either ANK's
> interpretation was incorrect or UNIX98 standard was wrong.
Just found this reference to me. :-)
The comment migrated from tcp.c. It is only
Hello!
> During OpenVZ stress testing we found that UDP traffic with
> random src can generate too much excessive rt hash growing
> leading finally to OOM and kernel panics.
>
> It was found that for 4GB i686 system (having 1048576 total pages and
> 225280 normal zone pages) kernel allocates the
Hello!
> Do the semantics (I'm not talking about bugs) allow skb passed
> to dev->hard_header() (if defined)
No. dev->hard_header() should get enough of space, which is
dev->hard_header_len.
Actually, it is historical hole in design, inherited from ancient
times. Calling conventions of dev->hard
Hello!
> > Alexey, any suggestions on how to handle this kind of thing?
Device, which adds something at head must check for space.
Anyone, who adds something at head, must check.
Otherwise, it will remain buggy forever.
> What's wrong with my patch?
As I already said there is nothing wrong wit
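The rule "anyone who adds something at head must check" can be shown on a toy buffer. This is a userspace model, not kernel code: struct toy_buf, toy_headroom() and toy_push_header() are hypothetical names, loosely modelled on the idea behind skb_headroom() and skb_push().

```c
#include <stddef.h>
#include <string.h>

/* Toy packet buffer: payload starts at data_off, so data_off is the headroom. */
struct toy_buf {
    unsigned char storage[64];
    size_t data_off;   /* offset of the first used byte */
    size_t len;        /* number of used bytes */
};

static size_t toy_headroom(const struct toy_buf *b)
{
    return b->data_off;
}

/*
 * Prepend a header of hlen bytes.
 * Returns 0 on success, -1 if there is not enough headroom; on failure
 * the buffer is left untouched and the caller must reallocate or drop.
 */
static int toy_push_header(struct toy_buf *b, const void *hdr, size_t hlen)
{
    if (toy_headroom(b) < hlen)
        return -1;
    b->data_off -= hlen;
    b->len += hlen;
    memcpy(b->storage + b->data_off, hdr, hlen);
    return 0;
}
```

The point of the sketch is only that the check lives in the code that prepends, instead of relying on an earlier layer to have reserved enough space.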
Hello!
> It does seem weird that IP output won't pay attention to
Not so weird, actually.
The logic was:
Only initial skb allocation tries to reserve all the space
to avoid copies in the future.
All the rest of places just check, that there is enough space
for their immediate needs. If dev->ha
Hello!
> kernel thread takes 100% cpu (with preemption
Preemption, you tell... :-)
I begged you to spend 1 minute of your time to press ^Z. Did you?
Alexey
Hello!
> ip_output() ignores dev->hard_header_len
ip_output() worries about the space, which it needs.
If some place needs more, it is its problem to check.
To the moment where it is used, hard_header_len can even change.
It can be applied, but it does not change the fact, that those
placed whi
Hello!
On Thu, Jul 27, 2006 at 03:46:12PM +1000, Rusty Russell wrote:
> Of course, it means rewriting all the userspace tools, documentation,
> and creating a complete new infrastructure for connection tracking and
> NAT, but if that's what's required, then so be it.
That's what I love to hear. N
Hello!
> I like this. However, since the cloned skb is either discarded in case
> of error, or queued in which case the caller discards its reference right
> away, wouldn't it be simpler to just do this?
Well, if we wanted just to cheat those checking tools, it is nice.
But if we want clarity, i
Hello!
> Wouldn't it be better to have a consistent interface (skb always freed),
> and clone the skb if needed for deferred processing?
I think you mean this.
Note, it is real skb_clone(), not alloc_skb(). Enqueued skb contains
the whole half-prepared netlink message plus room for the rest.
It c
Hello!
> Wouldn't it be better to have a consistent interface (skb always freed),
> and clone the skb if needed for deferred processing?
I am sorry, I misunderstood you. I absolutely agree. It is much better,
the variant which I suggested is a good sample of bad programming. :-)
Alexey