ockaddr is merely the generic pointer cast
that is expected to be used for the common POSIX functions like
bind/connect etc.
> Also equally arguably, the rdma code could just use a "struct
> sockaddr_in6 for this use and avoid the gcc issue, couldn't it? It has
Yes, that would be the right solution.
--Sowmini
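To illustrate the point above, here is a minimal userspace sketch, not taken from the rdma code under discussion: the address is stored in a struct sockaddr_in6, and the generic struct sockaddr pointer appears only as a cast at the bind() call. The helper names (make_in6_loopback, bind_in6, family_of) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: fill a concrete IPv6 address structure (loopback). */
void make_in6_loopback(struct sockaddr_in6 *sin6, uint16_t port)
{
	assert(sin6 != NULL);
	memset(sin6, 0, sizeof(*sin6));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(port);
	sin6->sin6_addr = in6addr_loopback;
}

/* The generic struct sockaddr shows up only as a pointer cast at the call. */
int bind_in6(int fd, const struct sockaddr_in6 *sin6)
{
	assert(sin6 != NULL);
	return bind(fd, (const struct sockaddr *)sin6, sizeof(*sin6));
}

/* A generic consumer reads the family first, then casts back as needed. */
sa_family_t family_of(const struct sockaddr *sa)
{
	assert(sa != NULL);
	return sa->sa_family;
}
```

Storage is always the concrete type, so nothing is ever copied into the smaller generic struct sockaddr.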
On (03/22/19 10:59), Soheil Hassas Yeganeh wrote:
>
> Add documentation to the tcp_ca_state enum, since this enum is
> exposed in uapi.
Acked-by: Sowmini Varadhan
rigger a perf-event notification based on the verdict from the filter.
The uspace component can use these perf-event notifications to either
read any state managed by the eBPF kernel module, or issue a TCP_INFO
netlink call if desired.
Patch 2 provides a simple example that shows how to use t
: Sowmini Varadhan
---
V2: inline call to sys_perf_event_open() following the style of existing
code in kselftests/bpf
tools/testing/selftests/bpf/Makefile |4 +-
tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++
tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95
This patch allows eBPF programs that use sock_ops to send
perf-based event notifications using bpf_perf_event_output()
Signed-off-by: Sowmini Varadhan
---
net/core/filter.c | 19 +++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/net/core/filter.c b/net/core
perf-event notifications to either
read any state managed by the eBPF kernel module, or issue a TCP_INFO
netlink call if desired.
Patch 2 provides a simple example that shows how to use this infra
(and also provides a test case for it)
Sowmini Varadhan (2):
bpf: add perf-event notificaton su
: Sowmini Varadhan
---
tools/testing/selftests/bpf/Makefile |4 +-
tools/testing/selftests/bpf/perf-sys.h| 74
tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++
tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95 +++
tools/testing
This patch allows eBPF programs that use sock_ops to send
perf-based event notifications using bpf_perf_event_output()
Signed-off-by: Sowmini Varadhan
---
net/core/filter.c | 19 +++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/net/core/filter.c b/net/core
info notification for an iperf connection if the number of
retransmits exceeds 16.
Sowmini Varadhan (3):
sock_diag: Refactor inet_sock_diag_destroy code
tcp: BPF_TCP_INFO_NOTIFY support
bpf: Added a sample for tcp_info_notify callback
include/linux/sock_diag.h | 18 +++---
i
rn status is used
by the caller to queue up a tcp_info notification for the application.
Signed-off-by: Sowmini Varadhan
---
include/net/tcp.h| 15 +--
include/uapi/linux/bpf.h |4
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/net/tcp.h b/inc
Simple Proof-Of-Concept test program for BPF_TCP_INFO_NOTIFY
(will move this to testing/selftests/net later)
Signed-off-by: Sowmini Varadhan
---
samples/bpf/Makefile |1 +
samples/bpf/tcp_notify_kern.c | 73 +
2 files changed, 74 insertions
We want to use the inet_sock_diag_destroy code to send notifications
for more types of TCP events than just socket_close(), so refactor
the code to allow this.
Signed-off-by: Sowmini Varadhan
---
include/linux/sock_diag.h | 18 +-
include/uapi/linux/sock_diag.h |2
but afaict most things in BPF today only operate on sk_buffs. How should
we use *BPF on something other than an sk_buff?
--Sowmini
e, BPF hook can be an alternate parallel mechanism.
Sure, and that makes sense, though I hope we will explore those
alternate mechanisms too.
--Sowmini
that table, and walk it, instead of holding up other VRFS
sorry, could not resist my i-told-you-so moment :-P
--Sowmini
On (10/11/18 08:26), Stephen Hemminger wrote:
> You can do the something like this already with BPF socket filters.
> But writing BPF for multi-part messages is hard.
Indeed. And I was just experimenting with this for ARP just last week.
So to handle the case of "ip neigh show a.b.c.d" without wal
o you, suggest you look at that first.
Meanwhile, how about waiting for Tushar's next patchset, where
you will have your selftests that are based on veth/netns
just like existing tests for XDP, vxlan etc. I strongly suggest
waiting for that.
And btw, it would have been very useful/courteous to help with
the RFC reviews to start with.
--Sowmini
re worried about)
Does that address your concern?
--Sowmini
xercise.. I suppose you can add example code in
selftests for this, but asking for a "proper test" may be
a little unrealistic here: a proper test needs proper hardware
in this case.
--Sowmini
On (09/10/18 17:16), Cong Wang wrote:
> >
> > On (09/10/18 16:51), Cong Wang wrote:
> > >
> > > __rds_create_bind_key(key, addr, port, scope_id);
> > > - rs = rhashtable_lookup_fast(&bind_hash_table, key, ht_parms);
> > > + rcu_read_lock();
> > > + rs = rhashtable_lookup(&
he same effect.
>
> I don't see any reason we should prefer synchronize_rcu() here.
Usually correctness (making sure all readers are done, before nuking a
data structure) is a little bit more important than performance, aka
"safety before speed" is what I've always been taught.
Clearly, your mileage varies. As you please.
--Sowmini
lease() would
ensure this. How do we ensure this with SOCK_RCU_FREE (or is the
intention to just reduce *some* of the syzbot failures)?
--Sowmini
than going back
to rwlock, instead of rcu)
--Sowmini
re it is immune to these issues..
--Sowmini
esp=aes_gcm_c-256-null.
Each patch has a technical description of the contents of the fix.
V2: added Fixes tag so that it can be backported to the stable trees.
Sowmini Varadhan (2):
xfrm: reset transport header back to network header after all input
transforms ahave been applied
xfrm
back to network header
only after the last transformation so that subsequent xfrms
can find the correct transport header.
Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Suggested-by: Steffen Klassert
Signed-off-by: Sowmini Varadhan
---
v2: added "Fixes" tag
ne
e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Sowmini Varadhan
---
v2: added "Fixes" tag
net/xfrm/xfrm_input.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index b89c9c7..be3
off-by: Sowmini Varadhan
---
net/xfrm/xfrm_input.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index b89c9c7..be3520e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -458,6 +458,7 @@ int xfrm_input(struct s
back to network header
only after the last transformation so that subsequent xfrms
can find the correct transport header.
Suggested-by: Steffen Klassert
Signed-off-by: Sowmini Varadhan
---
net/ipv4/xfrm4_input.c |1 +
net/ipv4/xfrm4_mode_transport.c |4 +---
net/ipv6/xfrm6_input.c
esp=aes_gcm_c-256-null.
Each patch has a technical description of the contents of the fix.
Sowmini Varadhan (2):
xfrm: reset transport header back to network header after all input
transforms ahave been applied
xfrm: reset crypto_done when iterating over multiple input xfrms
net/ipv4
On (08/21/18 14:05), Yue Haibing wrote:
> Remove duplicated include.
>
> Signed-off-by: Yue Haibing
Acked-by: Sowmini Varadhan
d not happen with the code as it exists today.
but there is a valid lock hierarchy violation here, and
imho it's a good idea to get that cleaned up.
It also avoids needlessly holding down the rs_recv_lock
when doing an rds_inc_put.
--Sowmini
the refcnt on the messages
in the tmp_list (potentially resulting in rds_message_purge())
after dropping the rs_recv_lock.
The same lock hierarchy violation also exists in rds_still_queued()
and should be avoided in a similar manner
Signed-off-by: Sowmini Varadhan
Reported-by: syzbot+52140d69ac6dc6b92
"Structurally dead code")
> Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support")
> Signed-off-by: Gustavo A. R. Silva
Acked-by: Sowmini Varadhan
es not change existing behavior. And doing what
> you mentioned will change existing behavior and break apps.
thank you.
--Sowmini
al that info via
the optlen. (And the reason for this inconsistency is that you don't
want to deal with the user->kernel copy in the same way?)
--Sowmini
le less odd (I've already explained
to you why RDS-over-UDP does not make much practical sense for the RDS
use-cases we anticipate). YMMV.
Thanks,
--Sowmini
On (07/06/18 23:08), Ka-Cheong Poon wrote:
>
> As mentioned in a previous mail, it is unclear why the
> port number is transport specific. Most Internet services
> use the same port number running over TCP/UDP as shown
> in the IANA database. And the IANA RDS registration is
> the same. What is
the comment is interesting.
> > Also, while you are there, s/exisiting/existing, please.
>
>
> OK, with change that.
Wonderful.
For the rest, I repeat: Oracle Clusters are using UDP/IPV6 today
(with no RDS). You need feature compat with UDP for that reason.
--Sowmini
it in IB specific header files.
Santosh, David, I have to NACK this if it is not changed.
--Sowmini
ode. Please make sure to cc me in follow-ups to this thread.
Thank you.
--Sowmini
ts absence is expected)
Please have a look, thanks.
--Sowmini
t depends on how you set up your DNS.
It seems like this is all about "I don't want to deal with this
now", so I don't want to continue this discussion which is really
going nowhere.
Thanks
--Sowmini
On (06/26/18 10:53), Sowmini Varadhan wrote:
> Date: Tue, 26 Jun 2018 10:53:23 -0400
> From: Sowmini Varadhan
> To: David Miller
> Cc: netdev@vger.kernel.org, rds-de...@oss.oracle.com
> Subject: Re: [rds-devel] [PATCH net-next] rds: clean up loopback
>
> and just to a
by email?
the last time I asked this question, the answer was a pointer to
https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ
Thanks
--Sowmini
therefore did not target net) is
official confirmation that the syzbot failures are root-caused to the
absence of this patch (since there is no reproducer for many of these,
and no crash dumps available from syzbot).
--Sowmini
y)
https://www.spinics.net/lists/linux-rdma/msg66020.html
as I understand it, if there is no reproducer, you cannot really
have a pass/fail test to confirm the fix.
--Sowmini
nd then backport to earlier kernels (if needed)..
--Sowmini
, and maybe create another socket, and bind it to link-local"
You're not doing this for IPv4 and RDS today (you don't have to do this
for UDP, afaik)
This is especially true if "X" is a hostname that got resolved using DNS
> BTW, if it is really needed, it can be added in future.
shrug. You are introducing a new error return.
--Sowmini
On (06/26/18 13:30), Ka-Cheong Poon wrote:
>
> My answer to this is that if a socket is not bound to a link
> local address (meaning it is bound to a non-link local address)
> and it is used to send to a link local peer, I think it should
> fail.
Hmm, I'm not sure I agree. I dont think this is fo
v6 support in rds_connect?
>
>
> Oops, I missed this when I ported the internal version to the
> net-next version. Will add it back.
Ok
--Sowmini
k-local, we need a conn with the daddr's
scopeid")
Also, why is there no IPv6 support in rds_connect?
(still looking through the rds-tcp changes, but wanted to get these
questions clarified first).
--Sowmini
On (06/25/18 06:41), Sowmini Varadhan wrote:
:
> Add the changes aligned with the changes from
> commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
> netns/module teardown and rds connection/workq management") for
> rds_loop_transport
FWIW, I am opt
the changes from
commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management") for
rds_loop_transport
Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
---
net/rds/connection.c | 11 +-
net/
nics.net/lists/netdev/msg475074.html for earlier
discussion thread)
--Sowmini
you need some type of synchronization (either
through mutex, or some atomic flag in the rs or similar) to make
sure rds_bind() and rds_ib_get_mr() are mutually exclusive.
--Sowmini
The content on the wire should be the same.
I'm sorry that's not how I interpret Willem's email below
(and maybe I misunderstood)
the following taken from https://www.spinics.net/lists/netdev/msg496150.html
Sowmini> If yes, how will the recvmsg differentiate between the case
ing/drops- you may well end up just reinventing IP
frag/re-assembly when you are done (with just the slight improvement
that each "fragment" has a full UDP header, so it has a better shot
at ECMP and RSS).
--Sowmini
ifferentiate between the case
(2000 byte message followed by 512 byte message) and
(1472 byte message, 526 byte message, then 512 byte message),
in other words, how are UDP message boundary semantics preserved?
--Sowmini
unts for the various L2/L3 etc headers)
--Sowmini
like
the WARN_ONs in that commit are not even being triggered).
We've not been able to reproduce this issue, and without
a crash dump (or some hint of other threads that were running
at the time of the problem) we are working on figuring out
the root-cause by code-inspection.
--Sowmini
eters,this/parameters, this/
>
> Well, not part of your commit.
As above.
>
>
> > * function resets the RDS connections in that netns so that we can
>
> Two double spaces incidents above
>
> Not part of your commit
As above.
Thanks much.
--Sowmini
netdevice notifiers and
refactors all the code needed to dismantle rds_tcp state
into a ->exit callback for the pernet_operations used with
register_pernet_device().
Signed-off-by: Sowmini Varadhan
---
net/rds/tcp.c | 93 ++---
1 files changed,
missed.. no easy answer here, I am afraid.
--Sowmini
er doing additional self-review/testing.
Please also take a look, if you can, to see if I missed something.
Thanks for the input,
--Sowmini
---patch follows
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 08ea9cd..87c2643 100644
--- a/net
On (03/17/18 10:15), Sowmini Varadhan wrote:
> To solve the scaling problem why not just have a well-defined
> callback to modules when devices are quiesced, instead of
> overloading the pernet_device registration in this obscure way?
I thought about this a bit, and maybe I missed your
I spent a long time staring at both v1 and v2 of your patch.
I understand the overall goal, but I am afraid to say that these
patches are complete hacks.
I was trying to understand why patchv1 blows with a null rtn in
rds_tcp_init_net, but v2 does not, and the analysis is ugly.
I'm going to
2 times, that needs some comments to
provide guidance for other subsystems. e.g., I found the
large block comment in net-namespace.h very helpful, so let's
please clearly document what and why and when this should
be used.
--Sowmini
aying there are scaling constraints on subsystems
that register for netdevice handlers. The disturbing part
of that is that it does not scale.
Thanks.
--Sowmini
the network interfaces have been taken down (loopback is
the last one) we know there are no more packets coming in
and out, so it is safe to dismantle all kernel sockets
created by rds-tcp.
Hope that helps.
--Sowmini
not a problem to do that.
Please share your patch, I can review it and maybe help to test
it..
As I was trying to say in my RFC, I am quite open to ways to make
this cleanup more obvious
--Sowmini
On (03/16/18 15:38), Kirill Tkhai wrote:
>
> 467fa15356acf by Sowmini Varadhan added NETDEV_UNREGISTER_FINAL dependence
> with the commentary:
>
> /* rds-tcp registers as a pernet subys, so the ->exit will only
>* get invoked after network acitivity has
use rds_destroy_pending() correctly.
Reported-by: syzbot+c68e51bb5e699d3f8...@syzkaller.appspotmail.com
Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management")
Signed-off-by: Sowmini Varadhan
---
net
o look into this and fix it later.
> Hard to understand why RDS is messing with hard irqs really.
some of it comes from the rds_rdma history: some parts of
the common rds and rds_rdma module get called in various
driver contexts for infiniband.
--Sowmini
On (03/11/18 18:03), Colin King wrote:
> From: Colin Ian King
>
> Functions rds_info_from_znotifier and rds_message_zcopy_from_user are
> local to the source and do not need to be in global scope, so make them
> static.
the rds_message_zcopy_from_user warning was already flagged by kbuild-robot
On (03/11/18 17:27), Colin King wrote:
> Variable sg_off is assigned a value but it is never read, hence it is
> redundant and can be removed.
>
Acked-by: Sowmini Varadhan
On (03/08/18 18:56), kbuild test robot wrote:
>
> Fixes: d40a126b16ea ("rds: refactor zcopy code into
> rds_message_zcopy_from_user")
> Signed-off-by: Fengguang Wu
Acked-by: Sowmini Varadhan
(do I need to separately submit a non-RFC patch for this?)
On (03/07/18 09:40), Jesus Sanchez-Palencia wrote:
> Fix the SO_ZEROCOPY switch case on sock_setsockopt() avoiding the
> ret values to be overwritten by the one set on the default case.
Acked-by: Sowmini Varadhan
Move the large block of code predicated on zcopy from
rds_message_copy_from_user into a new function,
rds_message_zcopy_from_user()
Signed-off-by: Sowmini Varadhan
---
net/rds/message.c | 108 +---
1 files changed, 60 insertions(+), 48 deletions
)
Sowmini Varadhan (2):
rds: refactor zcopy code into rds_message_zcopy_from_user
rds: use list structure to track information for zerocopy completion
notification
okie_queue by
a simpler list that results in a smaller memory footprint as well
as more efficient memory_allocation time.
Signed-off-by: Sowmini Varadhan
---
net/rds/af_rds.c |6 ++--
net/rds/message.c | 77 +---
net/rds/rds.h
onfigured as a kernel module.
Acked-by: Sowmini Varadhan
By moving the ops assignment after
> the ops->accept() call, we save increasing the refcnt in
> case the ops->accept() fails. Otherwise, the __module_get()
> needs to be moved before ops->accept() to handle this failure
> case.
I see, thanks for clarification.
It may be helpful to have some comment in there, in case some other
module trips on something similar in the future.
--Sowmini
new_sock->ops = sock->ops;
How is this delta relevant to the commit comment? Seems unrelated?
--Sowmini
PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.
Signed-off-by: Sowmini Varadhan
Acked-by: Willem de Bruijn
Acked-by: Santosh Shilimkar
---
v2: receive zerocopy completion notification as POLLIN
v3: drop ncookies arg in
In preparation for optimized reception of zerocopy completion,
revert the Rx side changes introduced by Commit dfb8434b0a94
("selftests/net: add zerocopy support for PF_RDS test case")
Signed-off-by: Sowmini Varadhan
Acked-by: Willem de Bruijn
Acked-by: Santosh Shilimkar
---
v2:
remove the sk_error_queue related paths in
RDS.
Both of these goals are implemented in this series.
v2: removed sk_error_queue support
v3: incorporated additional code review comments (details in each patch)
Sowmini Varadhan (3):
selftests/net: revert the zerocopy Rx path for PF_RDS
rds: deliver
es support for zerocopy completion notification on
MSG_ERRQUEUE for PF_RDS sockets.
Signed-off-by: Sowmini Varadhan
Acked-by: Willem de Bruijn
Acked-by: Santosh Shilimkar
---
v2: remove sk_error_queue path; lot of cautionary checks rds_recvmsg_zcookie()
and callers to make sure we dont remo
ts/netdev/msg485424.html
I resent my patch a few minutes ago, but I suspect I may
now be hitting this well-known patchwork bug:
https://www.spinics.net/lists/sparclinux/msg13787.html
Do I need to do something?
--Sowmini
On (02/27/18 11:49), David Miller wrote:
> > Do I need to resend?
>
> Yes, see my other email.
do we need to resend patches not showing up in patchwork?
I recall seeing email about picking things manually from inbox
but missed this.
--Sowmini
code that delivers notifications on sk_error_queue.
This patch series removes the sk_error_queue support so the
smatch warning is not applicable after this patch.
On a different note, for some odd reason I'm not seeing this patch series
on the patch queue, though it's showing up in the archives.
--Sowmini
In preparation for optimized reception of zerocopy completion,
revert the Rx side changes introduced by Commit dfb8434b0a94
("selftests/net: add zerocopy support for PF_RDS test case")
Signed-off-by: Sowmini Varadhan
---
v2: prepare to remove sk_error_queue based path; remove recvmsg
PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.
Signed-off-by: Sowmini Varadhan
---
v2: receive zerocopy completion notification as POLLIN
v3: drop ncookies arg in do_process_zerocopy_cookies; Reverse christmas
tree
es support for zerocopy completion notification on
MSG_ERRQUEUE for PF_RDS sockets.
Signed-off-by: Sowmini Varadhan
---
v2: remove sk_error_queue path; lot of cautionary checks rds_recvmsg_zcookie()
and callers to make sure we dont remove cookies from the queue and then
fail to pass it up to
sk_error_queue support
v3: incorporated additional code review comments (details in each patch)
Sowmini Varadhan (3):
selftests/net: revert the zerocopy Rx path for PF_RDS
rds: deliver zerocopy completion notification with data
selftests/net: reap zerocopy completions passed up as ancillary data
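For readers unfamiliar with ancillary data, here is a minimal sketch of how a receiver could pull one control message out of a msghdr after recvmsg(). It uses only the standard CMSG_* macros. The function name find_cmsg_payload is hypothetical, and the cmsg level and type are left as parameters rather than hard-coding the RDS constants, so this is not the selftest code from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Copy the payload of the first control message matching (level, type)
 * into out. Returns the payload length, or -1 if no such message exists
 * or the payload does not fit in outlen bytes.
 */
ssize_t find_cmsg_payload(struct msghdr *msg, int level, int type,
			  void *out, size_t outlen)
{
	struct cmsghdr *cm;

	assert(msg != NULL && out != NULL);
	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		size_t len;

		if (cm->cmsg_level != level || cm->cmsg_type != type)
			continue;
		if (cm->cmsg_len < CMSG_LEN(0))
			return -1;	/* malformed header */
		len = cm->cmsg_len - CMSG_LEN(0);
		if (len > outlen)
			return -1;
		memcpy(out, CMSG_DATA(cm), len);
		return (ssize_t)len;
	}
	return -1;
}
```

A caller would point msg_control at a zeroed buffer before recvmsg(), then pass the level and type its protocol uses for completion cookies.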
On (02/25/18 10:56), Willem de Bruijn wrote:
> > @@ -91,22 +85,19 @@ static void rds_rm_zerocopy_callback(struct rds_sock
> > *rs,
> > spin_unlock_irqrestore(&q->lock, flags);
> > mm_unaccount_pinned_pages(&znotif->z_mmp);
> > consume_skb(rds_skb_fro
pointed out that socket functions block
if sk_err is non-zero, thus if the RDS code does not plan/need
to use sk_error_queue path for completion notification, it
is preferable to remove the sk_error_queue related paths in
RDS.
Both of these goals are implemented in this series.
Sowmini Varadhan (3
PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.
Signed-off-by: Sowmini Varadhan
---
v2: receive zerocopy completion notification as POLLIN
tools/testing/selftests/net/msg_zerocopy.c | 60
1
In preparation for optimized reception of zerocopy completion,
revert the Rx side changes introduced by Commit dfb8434b0a94
("selftests/net: add zerocopy support for PF_RDS test case")
Signed-off-by: Sowmini Varadhan
---
v2: prepare to remove sk_error_queue based path; remove recvmsg
es support for zerocopy completion notification on
MSG_ERRQUEUE for PF_RDS sockets.
Signed-off-by: Sowmini Varadhan
---
v2: remove sk_error_queue path; lot of cautionary checks rds_recvmsg_zcookie()
and callers to make sure we dont remove cookies from the queue and then
fail to pass it up