Hi all, here's a description of my setup, and of the problem:

- I am using Mongrel2 as an HTTP "collection" layer (receiving signals),
behind a load balancer.
- Each machine is running 8 Mongrel2 processes (ports 8081 - 8089).
- Each Mongrel2 server connects (one-to-one) to a local 0MQ handler that
pushes the data further into the system, after some mangling / cleanup.

About two weeks ago I started seeing some strange behavior: Mongrel2
processes die with the following messages in the log (including some
context lines before the failure):

---
Mon, 28 Sep 2015 23:28:46 GMT [WARN] (src/register.c:351: errno: None)
Killed 1 connections according to min_ping: 120, min_write_rate: 300,
min_read_rate: 300
Mon, 28 Sep 2015 23:28:47 GMT [ERROR] (src/task/net.c:228: errno: Resource
temporarily unavailable) Failed waiting on non-block accept.
Mon, 28 Sep 2015 23:28:47 GMT [ERROR] (src/server.c:467: errno: None)
Failed to accept on listening socket.
Mon, 28 Sep 2015 23:28:47 GMT [ERROR] (src/server.c:500: errno: None)
Server accept failed, attempting to clear out dead weight: 1
Mon, 28 Sep 2015 23:28:54 GMT [ERROR] (src/task/net.c:228: errno: Resource
temporarily unavailable) Failed waiting on non-block accept.
Mon, 28 Sep 2015 23:28:54 GMT [ERROR] (src/server.c:467: errno: None)
Failed to accept on listening socket.
Mon, 28 Sep 2015 23:28:54 GMT [ERROR] (src/server.c:500: errno: None)
Server accept failed, attempting to clear out dead weight: 1
Mon, 28 Sep 2015 23:28:54 GMT [WARN] (src/register.c:351: errno: None)
Killed 3 connections according to min_ping: 120, min_write_rate: 300,
min_read_rate: 300
Mon, 28 Sep 2015 23:28:55 GMT [ERROR] (src/task/net.c:228: errno: None)
Failed waiting on non-block accept.
Mon, 28 Sep 2015 23:28:55 GMT [ERROR] (src/server.c:467: errno: None)
Failed to accept on listening socket.
Mon, 28 Sep 2015 23:28:55 GMT [ERROR] (src/server.c:500: errno: None)
Server accept failed, attempting to clear out dead weight: 1
Mon, 28 Sep 2015 23:29:02 GMT [ERROR] (src/task/net.c:228: errno: None)
Failed waiting on non-block accept.
Mon, 28 Sep 2015 23:29:02 GMT [ERROR] (src/server.c:467: errno: None)
Failed to accept on listening socket.
Mon, 28 Sep 2015 23:29:02 GMT [ERROR] (src/server.c:500: errno: None)
Server accept failed, attempting to clear out dead weight: 1
Mon, 28 Sep 2015 23:29:02 GMT [ERROR] (src/superpoll.c:230: errno: Invalid
argument) zmq_poll failed.
Mon, 28 Sep 2015 23:29:02 GMT [ERROR] (src/task/fd.c:128: errno: None)
SuperPoll failure, aborting.
---
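To see how often each of these entries occurs over a longer period, the error log could be tallied by source location. The following is only a rough sketch: the regular expression is inferred from the excerpt above (one entry per line, unwrapped), and the names `ENTRY` and `tally` are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout of one Mongrel2 error-log entry, inferred from the excerpt:
#   <timestamp> GMT [LEVEL] (src/file.c:LINE: errno: TEXT) message
ENTRY = re.compile(
    r"^(?P<ts>.+? GMT) \[(?P<level>[A-Z]+)\] "
    r"\((?P<src>[^:]+:\d+): errno: (?P<errno>[^)]*)\) (?P<msg>.*)$"
)

def tally(lines):
    """Count log entries per (level, source location, errno text)."""
    counts = Counter()
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:  # lines that do not fit the assumed layout are skipped
            counts[(m["level"], m["src"], m["errno"])] += 1
    return counts
```

It could be run as `tally(open("/logs/server8081-error.log"))`, printing `counts.most_common()` to compare, for example, how many accept failures carry "Resource temporarily unavailable" versus "None".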

The error is surprising because:

1. We have been running this setup for about 2 years and have never
seen this before.
2. It might be related to more traffic coming in, but no increase is
really visible in our telemetry.
3. We have not upgraded any components directly affecting Mongrel2 (kernel,
0MQ, Mongrel2, etc.).

My questions are:

1. Has anyone seen this before?
2. Is it related to some issue in the handler process, or to the HTTP
listening side?
3. Any pointers on where to start debugging?

The Mongrel2 configuration looks like this:

---
server8081 = Server(
  uuid = "mongrel-XXX-8081",
  name = "server8081", chroot = "./server8081", port = 8081,
  default_host = "host",
  hosts = [
    Host(name="host",
      routes={
        "/": Handler(
          send_spec="tcp://127.0.0.1:20001",
          send_ident="hcd01-20001-21001",
          recv_spec="tcp://127.0.0.1:21001",
          recv_ident="hcd01-20001-21001"
          ),
        "/settings/" : static_dir
      }
    )
  ],
  access_log = "/logs/server8081-access.log",
  error_log  = "/logs/server8081-error.log",
  pid_file   = "/run/mongrel2-server8081.pid"
)
settings = {
  "zeromq.threads": 1,
  "limits.content_length": 128,
  "disable.access_logging" : 1,
  "superpoll.max_fd": 65535
  #,"superpoll.hot_dividend": 2
}
---
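Since the settings ask for "superpoll.max_fd" of 65535, one thing that could be ruled out is a mismatch with the open-file limit of the process. Whether this has anything to do with the failure above is purely an assumption. A minimal sketch, with hypothetical helper names `fd_headroom` and `open_fd_count`, and relying on Linux's /proc for the second one:

```python
import os
import resource

SUPERPOLL_MAX_FD = 65535  # value of "superpoll.max_fd" in the settings above

def fd_headroom(max_fd=SUPERPOLL_MAX_FD):
    """Return (soft_limit, hard_limit, ok) for the *current* process.

    ok is True when the soft RLIMIT_NOFILE is unlimited or at least max_fd.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = soft == resource.RLIM_INFINITY
    return soft, hard, unlimited or soft >= max_fd

def open_fd_count(pid):
    """Number of descriptors a process currently holds (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))
```

Note that `fd_headroom()` reports the limits of whatever process runs it, so it is only meaningful when launched from the same environment that starts mongrel2; `open_fd_count()` can be pointed at the PID stored in the pid_file.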

Any help is appreciated.

Thank you,
  Andrei

-- 
/A
