Maxim Dounin Wrote:
-------------------------------------------------------
> Hello!
>
> On Tue, Sep 18, 2018 at 06:02:46AM -0400, domleb wrote:
>
> > While running a load test that injects 10k TPS across 3 Nginx
> > instances, we are seeing spikes of errors where Nginx returns HTTP
> > 502 and logs the message 'no live upstreams while connecting to
> > upstream'. There are no other errors logged, e.g. connection errors.
> >
> > Also, we have a single upstream virtual IP (we use iptables to
> > balance load across the backend) and according to the docs the
> > upstream should never be marked as down in this case:
> >
> > 'If there is only a single server in a group, max_fails,
> > fail_timeout and slow_start parameters are ignored, and such a
> > server will never be considered unavailable'
> >
> > Testing locally with our config confirms this, and I cannot
> > reproduce the 'no live upstreams while connecting to upstream'
> > message when simulating connection and read errors with a single
> > upstream.
> >
> > To debug, I tried enabling debug logs, but under load that degraded
> > performance too much. I also traced the worker process with strace
> > and didn't find any socket or other errors during the 502 spike.
> >
> > I was able to reproduce this issue on Nginx 1.12.2 and 1.15.3.
> >
> > So given that we don't see any source error and we have a single
> > upstream, I'm interested to know what other scenarios could result
> > in a 502 with the log message 'no live upstreams while connecting
> > to upstream'?
>
> Could you please show the upstream configuration you are using?
>
> With a single server in the upstream block, the "no live upstreams"
> error may happen if:
>
> - the server is marked "down" in the configuration, or
> - the server reached the max_conns limit.
>
> Also note that "a single server" does not apply to cases when
> there is a single hostname which resolves to multiple IP addresses
> (this defines multiple servers at once).
>
> --
> Maxim Dounin
> http://mdounin.ru/
> _______________________________________________
> nginx mailing list
> nginx@nginx.org
> http://mailman.nginx.org/mailman/listinfo/nginx
I removed our max_conns limit and that resolved the issue - thanks for the help. It might be worth changing the log message in this case, as I believe the upstream is still live and there are no other log messages to indicate what the problem is.

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,281255,281298#msg-281298
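To make the two single-server cases Maxim lists concrete, here is a simplified Python model of the selection check. It is a sketch under assumptions, not nginx's code: the names `Peer`, `get_single_peer`, `release` and `NoLiveUpstreams` are hypothetical, and the model covers only the `down` flag and the `max_conns` counter. It shows why a healthy backend can still produce "no live upstreams": once the active-connection count reaches `max_conns`, the only server is skipped and the request fails with a 502. Note that, per the nginx documentation, the `max_conns` limit is counted per worker process unless the upstream group resides in shared memory (the `zone` directive).

```python
from dataclasses import dataclass


@dataclass
class Peer:
    # Hypothetical, simplified model of a single upstream server entry.
    name: str
    down: bool = False   # server marked "down" in the configuration
    max_conns: int = 0   # 0 means "no limit" (the nginx default)
    conns: int = 0       # active connections counted by this worker


class NoLiveUpstreams(Exception):
    """Stands in for the 502 and the 'no live upstreams' log line."""


def get_single_peer(peer: Peer) -> Peer:
    # With a single server, max_fails/fail_timeout are ignored, so in
    # this model only these two checks can reject the server.
    if peer.down:
        raise NoLiveUpstreams("server marked down")
    if peer.max_conns and peer.conns >= peer.max_conns:
        raise NoLiveUpstreams("max_conns reached")
    peer.conns += 1  # connection is now in flight
    return peer


def release(peer: Peer) -> None:
    # Called when a request to the upstream finishes.
    peer.conns -= 1


if __name__ == "__main__":
    backend = Peer("10.0.0.1:8080", max_conns=2)  # hypothetical address
    get_single_peer(backend)
    get_single_peer(backend)
    try:
        get_single_peer(backend)  # third concurrent request
    except NoLiveUpstreams as err:
        print("502:", err)
```

In the model, a third concurrent request against `max_conns=2` fails even though nothing is wrong with the server itself, which matches what was seen here: dropping the limit (`max_conns=0` in the model) means the check never fires. Unlike the model, the log message quoted in this thread does not say which condition triggered it.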