Hello!

>> management schemes and to just wrap SKB's around
>> arbitrary pieces of data.
>
> and something clever like a special page_offset encoding
> means "use data, not page".

But for what purpose do you plan to use it?


> The e1000 issue is just one example of this, another

What is this issue?


As for the aggregated tcp queue, I can guess that you did not find a place
to add protocol headers, but I cannot figure out how adding non-pagecache
references would help.

You would rather want more than one skb_shared_info: at least two,
one immutable and another for headers.

I think Evgeniy's idea of inlining skb_shared_info into the skb head
is promising and simple enough. The whole point of a shared skb_shared_info
was to make cloning fast. But it makes a lot of sense to inline some short
vector into the skb head (and probably even MAX_HEADER of space _instead_
of the space for fclone).

With an aggregated tcp send queue, when transmitting a segment, you could
allocate a new skb head with space for the header and either take the existing
skb_shared_info from the queue, attach it to the head and set offset/length,
or, alternatively, set one or two page pointers in the array inlined in the head.
(E.g. in the case of an AF_UNIX socket, mentioned by Evgeniy, we would keep data
in pages and attach them directly to the skb head.)
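To make the two options concrete, here is a rough userspace model. It is only a sketch under assumptions: struct page_ref, struct shinfo, struct seg_head, SEG_HDR_SPACE and SEG_INLINE_FRAGS are hypothetical names, not kernel structures. The segment head carries its own header room plus a short inlined fragment vector, and either points at the queue's shared info with an offset/length or fills one or two inline page slots.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; not the kernel's sk_buff/skb_shared_info. */
#define SEG_HDR_SPACE    128  /* assumed header room, standing in for MAX_HEADER */
#define SEG_INLINE_FRAGS 2    /* short fragment vector inlined in the head */
#define SHINFO_MAX_FRAGS 16

struct page_ref {
    void  *page;   /* opaque page pointer */
    size_t off;    /* offset of the data inside the page */
    size_t len;    /* bytes used in the page */
};

/* Queue-owned descriptor of the aggregated send queue data. */
struct shinfo {
    int             nr_frags;
    struct page_ref frags[SHINFO_MAX_FRAGS];
};

/* Per-segment head: header room plus either a reference into the queue's
 * shared info (option 1) or a short inline vector of pages (option 2). */
struct seg_head {
    unsigned char   hdr[SEG_HDR_SPACE];
    size_t          hdr_len;
    struct shinfo  *shared;     /* option 1 */
    size_t          data_off;   /* byte offset into the shared data */
    size_t          data_len;   /* segment payload length */
    int             nr_inline;  /* option 2: inline slots in use */
    struct page_ref inline_frags[SEG_INLINE_FRAGS];
};

/* Option 1: attach the existing shared info and set offset/length. */
static void seg_attach_shared(struct seg_head *h, struct shinfo *si,
                              size_t off, size_t len)
{
    assert(si != NULL);
    memset(h, 0, sizeof(*h));
    h->shared = si;
    h->data_off = off;
    h->data_len = len;
}

/* Option 2: put one or two page pointers into the inline array.
 * Returns 0 on success, -1 if more slots are needed than fit. */
static int seg_set_inline(struct seg_head *h, const struct page_ref *refs, int n)
{
    memset(h, 0, sizeof(*h));
    if (n < 0 || n > SEG_INLINE_FRAGS)
        return -1;
    for (int i = 0; i < n; i++) {
        h->inline_frags[i] = refs[i];
        h->data_len += refs[i].len;
    }
    h->nr_inline = n;
    return 0;
}

/* Append protocol header bytes into the head's own header room. */
static int seg_add_hdr(struct seg_head *h, const void *hdr, size_t len)
{
    if (len > SEG_HDR_SPACE - h->hdr_len)
        return -1;
    memcpy(h->hdr + h->hdr_len, hdr, len);
    h->hdr_len += len;
    return 0;
}
```

In this model the shared info is never touched when a segment is built; only the small head differs per segment, which is what makes the header problem go away.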

Cloning becomes more expensive, but who needs it to be cheap if tcp does not?



Returning to "arbitrary pieces of data".

Page cache references in skb_shared_info are a unique case:
get_page()/page_cache_release() are enough to clone the data.
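A small userspace model of why plain page references are so cheap to clone. The names are assumptions for illustration only: struct mpage is a hypothetical stand-in for struct page, and mpage_get()/mpage_put() stand in for get_page()/page_cache_release(); they are not kernel functions. Cloning a fragment array is one extra reference per page, with no callbacks and no extra per-fragment state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: only the refcount matters here. */
struct mpage {
    int refcount;
};

static struct mpage *mpage_alloc(void)
{
    struct mpage *p = malloc(sizeof(*p));
    if (p)
        p->refcount = 1;
    return p;
}

/* Models get_page(): pin the page once more. */
static void mpage_get(struct mpage *p)
{
    p->refcount++;
}

/* Models page_cache_release(): drop a reference; the page is freed when
 * the count reaches zero. Returns 1 if the page was freed. */
static int mpage_put(struct mpage *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);
        return 1;
    }
    return 0;
}

/* Clone a fragment array: copy the pointers and take one reference each. */
static void frags_clone(struct mpage **dst, struct mpage *const *src, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i] = src[i];
        mpage_get(dst[i]);
    }
}

/* Release a fragment array; returns how many pages were actually freed. */
static int frags_release(struct mpage **frags, int n)
{
    int freed = 0;
    for (int i = 0; i < n; i++)
        freed += mpage_put(frags[i]);
    return freed;
}
```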

But that is not enough even for something as simple as splice().
It wants us to remember some strange "pipe_buffer", where each page
is wrapped together with some ops, flags and even a pipe_inode_info :-),
and to call some destructor when it is released. The first thought is that
this is insane: it does not respect page cache logic, it requires us to implement
an additional level of refcounting, and it inflates the amount of information
that has to be stored in the skb beyond all limits of sanity.
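A sketch of what carrying splice()-style state per fragment would cost, loosely modelled on the kernel's struct pipe_buffer and struct pipe_buf_operations. The names here (struct xfrag, struct xfrag_ops, struct bare_frag, xfrags_release) are hypothetical. Each fragment drags along an ops pointer, flags and an owner pointer, and releasing an skb means one indirect destructor call per fragment instead of just dropping a page reference.

```c
#include <assert.h>
#include <stddef.h>

struct xfrag;

/* Hypothetical per-fragment operations, in the spirit of pipe_buf_operations. */
struct xfrag_ops {
    void (*release)(struct xfrag *f);
};

/* What a page cache reference needs: page, offset, length. */
struct bare_frag {
    void        *page;
    unsigned int offset;
    unsigned int len;
};

/* Hypothetical "wrapped" fragment carrying splice()-like extra state. */
struct xfrag {
    void                   *page;
    unsigned int            offset;
    unsigned int            len;
    unsigned int            flags;
    void                   *owner;   /* e.g. the pipe the page came from */
    const struct xfrag_ops *ops;
};

/* Example owner that counts how many of its pages were handed back. */
struct owner_stats {
    int released;
};

static void count_release(struct xfrag *f)
{
    struct owner_stats *st = f->owner;
    st->released++;
}

static const struct xfrag_ops counting_ops = { .release = count_release };

/* Releasing an skb's fragments: one indirect call per fragment. */
static void xfrags_release(struct xfrag *frags, int n)
{
    for (int i = 0; i < n; i++) {
        assert(frags[i].ops && frags[i].ops->release);
        frags[i].ops->release(&frags[i]);
    }
}
```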

But the second thought is that something like this is required in any case.
At the very least we must notify someone when a page is no longer in use and
can be recycled. I think Evgeniy knows more about this; AIO has
the same issue. But this case is simpler, because the release callback can be
done not per fragment, or even per skb, but per transaction.
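The per-transaction variant could look roughly like this userspace sketch. Assumptions, all hypothetical: a struct tx_txn is shared by every skb produced by one send request (say one splice() call or one AIO request), each skb holds a single reference to it regardless of how many fragments it carries, and the completion callback fires exactly once, when the last skb is freed. None of these names are kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct tx_txn;
typedef void (*txn_done_fn)(struct tx_txn *t);

/* Hypothetical transaction: one per user send request. */
struct tx_txn {
    int         refs;    /* submitter's reference + one per in-flight skb */
    txn_done_fn done;    /* "pages may be recycled now" notification */
    void       *cookie;  /* caller context, e.g. the request */
};

/* Hypothetical skb model: only the transaction pointer matters here. */
struct model_skb {
    struct tx_txn *txn;
};

static void txn_init(struct tx_txn *t, txn_done_fn done, void *cookie)
{
    t->refs = 1;         /* the submitter's own reference */
    t->done = done;
    t->cookie = cookie;
}

static void txn_put(struct tx_txn *t)
{
    assert(t->refs > 0);
    if (--t->refs == 0 && t->done)
        t->done(t);
}

/* Attaching an skb costs one increment, however many fragments it has. */
static void model_skb_attach_txn(struct model_skb *skb, struct tx_txn *t)
{
    t->refs++;
    skb->txn = t;
}

static void model_skb_free(struct model_skb *skb)
{
    if (skb->txn) {
        txn_put(skb->txn);
        skb->txn = NULL;
    }
}

/* Example callback: count completions through the cookie. */
static void count_done(struct tx_txn *t)
{
    int *completions = t->cookie;
    (*completions)++;
}
```

The submitter drops its own reference once all skbs have been queued, so the callback cannot fire while the request is still being split up.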

One idea is to declare (some) skb_shared_info completely immutable and
force each layer that needs to add a header or to fragment to refer
to the original skb_shared_info as a whole, using another skb_shared_info
or an area inlined in the skb head for its modifications.
If a layer is not able to do that, it must reallocate all the pages.
In this case the destructor/notification can be done not per fragment,
but for the whole aggregated skb_shared_info. It seems this would work both
with the aggregated tcp queue and with udp.
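A userspace sketch of the immutable-shared-info idea, under stated assumptions; all names are illustrative only. A hypothetical struct ishinfo is never modified once built. A layer that adds a header or fragments creates a small view (struct iview) that refers to the whole ishinfo plus an offset/length and keeps its own header bytes. The destructor runs once for the whole aggregate, when the last view is dropped. A layer that cannot work on views falls back to copying the payload, modelled by iview_copy_out().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IVIEW_HDR_MAX 64

struct ishinfo;
typedef void (*ishinfo_dtor)(struct ishinfo *si);

/* Hypothetical immutable aggregate: data is never modified once built. */
struct ishinfo {
    int                  refs;        /* views referring to it as a whole */
    const unsigned char *data;        /* stands in for the page fragments */
    size_t               len;
    ishinfo_dtor         dtor;        /* one notification per aggregate */
    int                  dtor_calls;  /* bookkeeping used by count_dtor */
};

/* Per-layer view: private header area plus a window into the aggregate. */
struct iview {
    unsigned char   hdr[IVIEW_HDR_MAX];
    size_t          hdr_len;
    struct ishinfo *si;
    size_t          off, len;
};

static int iview_init(struct iview *v, struct ishinfo *si, size_t off,
                      size_t len, const void *hdr, size_t hdr_len)
{
    if (off > si->len || len > si->len - off || hdr_len > IVIEW_HDR_MAX)
        return -1;
    memcpy(v->hdr, hdr, hdr_len);
    v->hdr_len = hdr_len;
    v->si = si;
    v->off = off;
    v->len = len;
    si->refs++;
    return 0;
}

static void iview_put(struct iview *v)
{
    struct ishinfo *si = v->si;
    v->si = NULL;
    assert(si && si->refs > 0);
    if (--si->refs == 0 && si->dtor)
        si->dtor(si);
}

/* Fragmenting: two new views over the same immutable aggregate, each with
 * its own header; the original view is dropped. */
static int iview_split(struct iview *v, size_t at, const void *hdr,
                       size_t hdr_len, struct iview *a, struct iview *b)
{
    if (at > v->len)
        return -1;
    if (iview_init(a, v->si, v->off, at, hdr, hdr_len) < 0)
        return -1;
    if (iview_init(b, v->si, v->off + at, v->len - at, hdr, hdr_len) < 0) {
        iview_put(a);
        return -1;
    }
    iview_put(v);
    return 0;
}

/* Fallback for a layer that cannot work on views: copy the payload
 * ("reallocate all the pages") and drop the reference. */
static size_t iview_copy_out(struct iview *v, unsigned char *buf, size_t cap)
{
    size_t n = v->len < cap ? v->len : cap;
    memcpy(buf, v->si->data + v->off, n);
    iview_put(v);
    return n;
}

static void count_dtor(struct ishinfo *si)
{
    si->dtor_calls++;
}
```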

Alexey
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
