On Sat, 2006-28-01 at 20:49 +1100, Herbert Xu wrote: > On Fri, Jan 27, 2006 at 08:05:23AM -0500, jamal wrote:
> > +extern u32 sysctl_xfrm_aevent_on; > > I'd prefer for this to be automatically determined. Indeed, this is > a generic netlink problem. We want to be easily determine at run time > whether there are netlink sockets subscribed to a given multicast group. > You can reduce the problem to that, yes. And looking at Patrick's email on ctnetlink, it is a similar problem. The sysctl came after i saw the amount of events generated while testing. Here are some numbers for you on one of my tests: Initially i started with a default of replay threshold being 1/2 of each SAs replay window. I was able to connect to a simulated 100 IKE peers on a big xeon and send 1Kpps each direction. This resulted in an event every second packet, every SA, every direction ;-> Thats (1K/2)*100*2 which is 100K events/second. I believe ctnetlink would be even worse than this. > Here is an idea, in addition to mc_list, we keep a bit mask of the groups > that exist on mc_list. This bit mask can then be used to quickly determine > if a netlink skb needs to be generated or not. > > This mask would be updated at bind(2) time. Updating this is O(n) where > n is the number of members on mc_list. This should be fine since > netlink_broadcast itself which occurs much more often is also O(n). > Two things: - The sysctl does a little more than this in that even if someone joined, and the sysctl was not set then they get no events. I have not used it in that perspective myself but for some reason in my notes i have jotted down it as being useful. If i cant find a good reason then it is not needed ;-> - if we auto-discover based on groups being subscribed-to, then the lower we stop generating events the better; In this example at the point where the replay is being updated in the fast path as opposed to all the way when we want to do a sendmsg. This could be achieved if somehow the group being subscribed to automagically sets sysctl_xfrm_aevent_on and clears it when the last unsubscription happens - that would be fantastic. I will wait to see what Patrick posts > > +extern u32 sysctl_xfrm_aevent_etime; > > +extern u32 sysctl_xfrm_aevent_rseqth; > > Why do we need these defaults? I'd rather see these be removed and just have > the user-space KM always set the values (if it needs aevent). > This is mostly out of laziness when i started but has turned out to be useful. For ease of testing, i didnt want to set anything or make any changes to a KM. My initial attempt was to default to a time of 1 second and a replay threshold of 1/2 the replay window. I was frustrated using racoon to find it was by default setting the window to 4 ;-> (otoh, pluto sets it to a very high number). I also couldnt justify why 1 sec or 10 sec or 30 seconds as i kept changing them. For this reason i figured the admin would be the better person to make that decision and i picked the defaults based on what i was experimenting with. I could remove them if this doesnt sound like a good reason. > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > > index e12d0be..2ccf87a 100644 > > --- a/net/xfrm/xfrm_state.c > > +++ b/net/xfrm/xfrm_state.c > > > > @@ -62,6 +65,8 @@ static void xfrm_state_gc_destroy(struct > > { > > if (del_timer(&x->timer)) > > BUG(); > > + if (del_timer(&x->rtimer)) > > + BUG(); > > How about just using x->timer? > This part of the patch i left as is from Krisztian's original patch. i have another timer i was going to add later (an idle timeout to emulate a certain vendor when an SA is idle) and felt i could use rtimer for that as well. For this reason i left it alone since it just worked ;-> How expensive are timers these days? Lets handwave a moderately large number of SAs (100K). With this we would have 200K timers as opposed to 100K if we aggregated them. > > @@ -104,6 +109,20 @@ static inline unsigned long make_jiffies > > return secs*HZ; > > } > > > > +void xfrm_replay_notify(struct xfrm_state *, int); > > Duplicate prototype. > I am experimenting with stgit which gives you git and ability to stack patches - I noticed actually it fscked me in 2 other patches or i could be misusing it. I will fix this in the final version. What do you guys use to stack patches? [..] > > This seems to be counter-intuitive. Wouldn't it make more sense to > schedule a timer in the XFRM_REPLAY_UPDATE case, and not schedule one > in the XFRM_REPLAY_TIMEOUT case? Scheduling it in the XFRM_REPLAY_TIMEOUT > case means that you may be waking up every maxage jiffies even when > nothing at all is happening. While doing it the other way means that > you only schedule it when something has happened and we've suppressed > the event due to maxdiff. > This is part of Krisztian's original patch - he would be a better person to respond to if we can wake him up. Waking up when nothing is happening is OK, but waking up and generating events when nothing is happening is not. so the rescheduling is fine as far as i can see but what you describe is an optimization i.e knowing nothing has happened and never waking up is even better. Krisztian? > > @@ -805,8 +879,10 @@ void xfrm_replay_advance(struct xfrm_sta > > diff = x->replay.seq - seq; > > x->replay.bitmap |= (1U << diff); > > } > > + > > + if (sysctl_xfrm_aevent_on) > > + xfrm_replay_notify(x, XFRM_REPLAY_UPDATE); > > This construct occurs many times. How about putting it into an inline > function? > good idea > > } > > -EXPORT_SYMBOL(xfrm_replay_advance); > > Huh? > stgit. > > @@ -835,7 +911,7 @@ void km_state_notify(struct xfrm_state * > > EXPORT_SYMBOL(km_policy_notify); > > EXPORT_SYMBOL(km_state_notify); > > > > -static void km_state_expired(struct xfrm_state *x, int hard) > > +void km_state_expired(struct xfrm_state *x, int hard) > > { > > struct km_event c; > > This hunk should go into the SA expire patch. > again, stgit. I will fix all these in the eventual patch (theres 3 more in noticed in the rest of the patches) cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html