On Thu, Mar 16, 2006 at 12:36:22AM -0800, David S. Miller ([EMAIL PROTECTED]) 
wrote:
> From: Evgeniy Polyakov <[EMAIL PROTECTED]>
> Date: Sat, 18 Feb 2006 19:11:40 +0300
> 
> > Hello developers.
> > 
> > I'm pleased to announce a combined patch of the kevent and network AIO subsystems,
> > which are implemented using three system calls:
> >  * kevent_ctl(), which fully controls kevent subsystem.
> >    It allows to add/modify/remove kevents and waiting for either given
> >    number or at least one kevent is ready.
> >  * aio_send().
> >    This system call allows to asynchronously transfer userspace buffer
> >    to the remote host using zero-copy mechanism.
> >  * aio_recv().
> >    Asynchronous receiving support.
> > 
> > Next step is to implement aio_sendfile() support.
> > 
> > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
> 
> I did minor review today, the scheme is interesting, but here
> are some initial code review comments:

Thank you for your comments.

> 1) Please format to 80 columns.  I use a very generous 86 character
>    wide emacs window and much of your code wrapped lines :-)

Hmm, I will try to fit into 80 columns - I get this comment from every
maintainer I submit code to, but still cannot break the habit.

> 2) I do not see real need to enumeration for all of the KEVENT_*
>    constants, just use CPP defines.  You are setting them explicitly
>    anyways.
> 
> 3) Adding sk_kevent_flags to struct sock looks like real bloat, you
>    just have 1 or 2 flags, just allocate them to sk_flags.

Sure, it only requires two bits.

>    I also do not see why you need to obtain the socket lock around
>    set/clear bit.  You are using atomic bit operations and you are
>    not testing anything other than local variable state ('valbool').

If we clear the flag without the lock, the socket code can be in the
middle of processing, and a subsequent sock_async() check can fail where
it is supposed to be true. Actually these flag checks are always done
under the socket lock, so they could be done with plain (non-atomic)
bit operations.

> 4) I have 2 major problems with tcp_async_{send,recv}().
> 
>    a) Much duplication of existing tcp sendmsg/recvmsg code.
>       Consolidation and code sharing is needed.

Yes, especially the receiving code - it is 99% of what sendpage() does.

>    b) A large chunk of it is protocol agnostic, we can share this
>       logic such that support for protocols such as DCCP can be
>       made very simply.
> 
> 5) Why is the ordering between sk_stream_set_owner_r() and
>    __skb_queue_tail() important in this hunk?
> 
> @@ -3079,8 +3080,8 @@ queue_and_out:
>                                   !sk_stream_rmem_schedule(sk, skb))
>                                       goto drop;
>                       }
> -                     sk_stream_set_owner_r(skb, sk);
>                       __skb_queue_tail(&sk->sk_receive_queue, skb);
> +                     sk_stream_set_owner_r(skb, sk);
>               }
>               tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
>               if(skb->len)

I put the kevent notification into sk_stream_set_owner_r(), so it was
logically correct to notify only when the data is actually in the queue;
the notification was later removed, but the hunk was never changed back.

> 6) I disagree with turning off the prequeue just because the socket is
>    in async mode:
> 
> @@ -3819,7 +3820,7 @@ int tcp_rcv_established(struct sock *sk,
>                       if (tp->ucopy.task == current &&
>                           tp->copied_seq == tp->rcv_nxt &&
>                           len - tcp_header_len <= tp->ucopy.len &&
> -                         sock_owned_by_user(sk)) {
> +                         sock_owned_by_user(sk) && !sock_async(sk)) {
> 
>    If the user blocked on a socket operation and we can do prequeue
>    processing, directly copying to userspace, we should.
> 
>    Or are we zero-copy'ing here?
>
> 7) More prequeue bypassing when socket is in async mode:
> 
> +     if (sock_async(sk)) {
> +             spin_lock_bh(&sk->sk_lock.slock);
> +             ret = tcp_v4_do_rcv(sk, skb);
> +             spin_unlock_bh(&sk->sk_lock.slock);
> +     } else {
> +             bh_lock_sock(sk);
> +             if (!sock_owned_by_user(sk)) {
> +                     if (!tcp_prequeue(sk, skb))
> +                             ret = tcp_v4_do_rcv(sk, skb);
> +             } else
> +                     sk_add_backlog(sk, skb);
> +             bh_unlock_sock(sk);
> +     }
> 
> I really think #6 and #7 need justification, because if we do
> end up with net channels, they will go via the TCP prequeue
> mechanism.

The prequeue is turned off for async sockets because the process may not
be scheduled at all when new data arrives or is being sent, and the data
is not copied through the usual datagram methods, but directly to/from
pinned userspace pages.

Net channels move processing into the process' context, while async
operations in general do not guarantee that the original process will
even be running when the async completion happens.

-- 
        Evgeniy Polyakov