On Thu, 4 Dec 2014, Nikolaus Rath wrote:

Shannon Dealy <de...@deatech.com> writes:
[snip]
Unfortunately Linux does not provide an easy way to distinguish between
an unavailable DNS server, and an un-resolvable host name. To
distinguish these cases, S3QL/Dugong attempts to resolve a number of
"test" hostnames. If these resolve, but the S3 hostname does not,
S3QL/Dugong concludes that this hostname is not resolvable and
terminates. Otherwise it assumes that the DNS server is currently not
reachable and retries.
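
For illustration only, the heuristic described above can be sketched roughly as follows. This is not the actual S3QL/Dugong code: the function names, the return strings, and the injectable `resolver` parameter are hypothetical, and the test hostnames are the ones named further down in this message.

```python
import socket

# Hostnames taken from this thread; the tuple name itself is hypothetical.
TEST_HOSTS = ("google.com", "iana.org", "root-servers.net")


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False if the resolver reports an error."""
    try:
        resolver(hostname, 80)
    except socket.gaierror:
        return False
    return True


def classify_failure(hostname, test_hosts=TEST_HOSTS, resolver=socket.getaddrinfo):
    """Guess why a lookup failed, following the heuristic described above.

    Returns "ok" if the name resolves after all, "unresolvable" if at least
    one test host resolves while the target does not, and "dns-unavailable"
    if nothing resolves at all (the case in which a retry makes sense).
    """
    if can_resolve(hostname, resolver):
        return "ok"
    if any(can_resolve(host, resolver) for host in test_hosts):
        return "unresolvable"
    return "dns-unavailable"
```

Note that the checks run one after another, so on a flaky resolver the target lookup and the test lookups can disagree purely because of timing.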

Attempting to resolve hostnames on your system frequently fails
(sometimes 3 times in a row), and sometimes it's apparently sufficiently
flaky that

1. server-external-2014-11-21-deatech-com-s3ql.s3.amazonaws.com cannot
be resolved

2. Any of google.com, iana.org, or root-servers.net can be resolved

3. server-external-2014-11-21-deatech-com-s3ql.s3.amazonaws.com cannot
be resolved

in this order, and without any waiting times.


At this point, S3QL thus assumes that your bucket ceased to exist and
terminates.

To avoid this, you'll have to fix the DNS resolution issues on your
system. Maybe install a caching proxy nameserver like dnsmasq?

While I can try dnsmasq as a solution, I was not seeing this degree of problems under the old version of S3QL, and this is not the only network I connect to where I am seeing them. This could simply mean one or more of:

 - the network(s) have become more unstable (certainly possible)
 - S3QL's test doesn't work quite the same way as before
 - there is a bug somewhere (S3QL or python libraries - I had to upgrade
   python to install the new S3QL) that makes it appear that DNS is still
   broken after it recovers
 - or something else is going on.

I see in the log I sent you that a few failures were reported 10 minutes before S3QL gave up, but it is not clear to me whether S3QL was able to resolve DNS names in between. Am I correct in assuming that there is a timer as well as the three-times-in-a-row failure limit you mentioned? If not, there should be. Network failures often take a few minutes to recover, so I would imagine the test should look something like:

   while failing and < 10 minutes (or possibly some configurable value)
      wait some short interval
      try again
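
In Python, the loop above might look roughly like the sketch below. It is only an illustration of the idea, not S3QL's code: the function name, the 600-second and 5-second defaults, and the injectable `resolver`, `clock` and `sleep` parameters are all made up for the example.

```python
import socket
import time


def resolve_with_deadline(hostname, max_wait=600.0, interval=5.0,
                          resolver=socket.getaddrinfo,
                          clock=time.monotonic, sleep=time.sleep):
    """Keep retrying a DNS lookup until it succeeds or max_wait seconds pass.

    The last socket.gaierror is re-raised once the deadline has been reached.
    """
    deadline = clock() + max_wait
    while True:
        try:
            return resolver(hostname, 443)
        except socket.gaierror:
            if clock() >= deadline:
                raise  # failing for the whole window: give up
            sleep(interval)  # wait some short interval, then try again
```

Using a monotonic clock keeps the deadline independent of system clock adjustments, and making the window a parameter would allow it to be configurable.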

It is certainly not unusual to see short network glitches (DNS, connection loss, routing problems) lasting a few seconds to (rarely) a minute here. The five to ten minute DNS or general outage that I would view as a minimum before S3QL gives up is very uncommon (I pretty much live on the network when I am awake, so I am pretty sure I would notice problems of this level). Of course, I may be wrong; perhaps I should set up some kind of network monitor to see what kind of failure intervals there are.

FWIW.

Shannon C. Dealy           |         DeaTech Research Inc.
de...@deatech.com          |    - Custom Software Development -
USA Phone: +1 800-467-5820 |    - Natural Building Instruction -
numbers  : +1 541-929-4089 |            www.deatech.com

