On Mon, Mar 12, 2007 at 11:14:29AM +0100, Claudio Jeker wrote:
> On Sun, Mar 11, 2007 at 11:43:03PM +0000, Jon Morby wrote:
> > Since updating from 4.0-current in January to today's -current, I've
> > noticed that certain bgpctl commands seem to just hang. We've tried
> > this with various snapshots every 2 weeks or so since January, as well
> > as with a full build.
> >
> > If I do a bgpctl reload (100+ peers per production router) and then
> > within a minute or so do a "bgpctl show" then bgpctl's output just
> > hangs (I've left it for nearly an hour and nothing has appeared
> > beyond the headers)
> >
>
> Hmm. That's strange. Does it work after waiting a longer time?
> bgpctl show can block for some extended time (if the RDE is busy) but you
> should get a result back.
>
> > Also, regardless of a reload, attempting to show rib nei a.b.c.d out
> > just seems to hang
> >
> > bgpctl show rib nei 80.252.124.1 out
> > flags: * = Valid, > = Selected, I = via IBGP, A = Announced
> > origin: i = IGP, e = EGP, ? = Incomplete
> >
> > flags destination gateway lpref med aspath origin
> >
> > and no more output
> >
>
> You're right, we have a polling issue here. I'm looking into it.
> In short, the RDE blocks in poll(2) and does not continue to process the
> show command until an update or some other message is received, because the
> current batch of work did not queue any outgoing imsgs. The poll(2) timeout
> needs to be changed to 0 in this case.
>
... and here is the patch to fix the issue.
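
For anyone who wants the idea without reading the diff, here is a
stripped-down sketch. It is not the bgpd code: pending_chunks,
work_pending(), pick_timeout() and drain() are made-up names for
illustration, and the counter only stands in for the RDE's list of
unfinished dumps. The point is that the loop passes a timeout of 0 to
poll(2) while work is still outstanding and only blocks indefinitely
(-1, which is what INFTIM is defined as on OpenBSD) once nothing is left.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for the queue of unfinished dump requests. */
static int pending_chunks;

static int
work_pending(void)
{
	return (pending_chunks > 0);
}

/* Do not sleep in poll(2) while background work is still waiting. */
static int
pick_timeout(void)
{
	return (work_pending() ? 0 : -1);	/* -1 means wait forever */
}

/*
 * Poll fd until all pending chunks are processed, handling one chunk
 * per pass. Returns the number of passes, or -1 if poll(2) fails.
 */
static int
drain(int fd)
{
	struct pollfd pfd;
	int rounds = 0;

	while (work_pending()) {
		pfd.fd = fd;
		pfd.events = POLLIN;
		pfd.revents = 0;
		if (poll(&pfd, 1, pick_timeout()) == -1)
			return (-1);
		/* No input is needed to get here: poll returned at once. */
		pending_chunks--;
		rounds++;
	}
	return (rounds);
}
```

With an idle descriptor and three chunks pending, drain() finishes after
three passes without waiting for any input. With an infinite timeout the
first poll(2) call would have slept until something arrived on fd, which
is the hang described above.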
--
:wq Claudio
Index: rde.c
===================================================================
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.219
diff -u -p -r1.219 rde.c
--- rde.c 22 Feb 2007 08:34:18 -0000 1.219
+++ rde.c 12 Mar 2007 10:08:51 -0000
@@ -73,6 +73,7 @@ void rde_dump_prefix(struct ctl_show_r
void rde_dump_ctx_new(struct ctl_show_rib_request *, pid_t,
enum imsg_type);
void rde_dump_runner(void);
+int rde_dump_pending(void);
void rde_up_dump_upcall(struct pt_entry *, void *);
void rde_softreconfig_out(struct pt_entry *, void *);
@@ -151,7 +152,7 @@ rde_main(struct bgpd_config *config, str
struct filter_rule *f;
struct filter_set *set;
struct nexthop *nh;
- int i;
+ int i, timeout;
switch (pid = fork()) {
case -1:
@@ -239,6 +240,7 @@ rde_main(struct bgpd_config *config, str
}
while (rde_quit == 0) {
+ timeout = INFTIM;
bzero(pfd, sizeof(pfd));
pfd[PFD_PIPE_MAIN].fd = ibuf_main->fd;
pfd[PFD_PIPE_MAIN].events = POLLIN;
@@ -254,6 +256,8 @@ rde_main(struct bgpd_config *config, str
pfd[PFD_PIPE_SESSION_CTL].events = POLLIN;
if (ibuf_se_ctl->w.queued > 0)
pfd[PFD_PIPE_SESSION_CTL].events |= POLLOUT;
+ else if (rde_dump_pending())
+ timeout = 0;
i = 3;
if (mrt && mrt->queued) {
@@ -262,7 +266,7 @@ rde_main(struct bgpd_config *config, str
i++;
}
- if (poll(pfd, i, INFTIM) == -1) {
+ if (poll(pfd, i, timeout) == -1) {
if (errno != EINTR)
fatal("poll error");
continue;
@@ -1795,6 +1799,12 @@ rde_dump_runner(void)
if (ctx->ptc.done && ctx->req.af == AF_UNSPEC)
ctx->af = AF_INET6;
}
+}
+
+int
+rde_dump_pending(void)
+{
+ return (!TAILQ_EMPTY(&rde_dump_h));
}
/*