On Thu, 4 Dec 2014, Nikolaus Rath wrote:
Shannon Dealy <de...@deatech.com> writes:
[snip]
Unfortunately Linux does not provide an easy way to distinguish between
an unavailable DNS server, and an un-resolvable host name. To
distinguish these cases, S3QL/Dugong attempts to resolve a number of
"test" hostnames. If these resolve, but the S3 hostname does not,
S3QL/Dugong concludes that this hostname is not resolvable and
terminates. Otherwise it assumes that the DNS server is currently not
reachable and retries.
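
The heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the actual S3QL/Dugong code: the function names and the injectable `resolver` parameter are made up for the example, and the test hostnames are simply the three mentioned later in this message.

```python
import socket

# Test hostnames named in this thread; the list S3QL/Dugong really uses may differ.
TEST_HOSTNAMES = ("google.com", "iana.org", "root-servers.net")


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False if the resolver reports an error."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False


def classify_lookup(hostname, test_hostnames=TEST_HOSTNAMES,
                    resolver=socket.getaddrinfo):
    """Return 'ok', 'unresolvable' or 'dns-unavailable' (hypothetical labels)."""
    if can_resolve(hostname, resolver):
        return "ok"
    # The target failed. If any test name resolves, DNS itself appears to
    # work, so the target hostname is assumed not to exist.
    if any(can_resolve(name, resolver) for name in test_hostnames):
        return "unresolvable"
    # Nothing resolves: assume the DNS server is unreachable and retry later.
    return "dns-unavailable"
```

Note that the two lookups happen at different moments, so on a flaky resolver the target can fail and a test name succeed purely by chance, which is the sequence described below.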
Attempting to resolve hostnames on your system frequently fails
(sometimes 3 times in a row), and sometimes it's apparently sufficiently
flaky that
1. server-external-2014-11-21-deatech-com-s3ql.s3.amazonaws.com cannot
be resolved
2. Any of google.com, iana.org, or root-servers.net can be resolved
3. server-external-2014-11-21-deatech-com-s3ql.s3.amazonaws.com cannot
be resolved
in this order, and without any waiting times.
At this point, S3QL thus assumes that your bucket ceased to exist and
terminates.
To avoid this, you'll have to fix the DNS resolution issues on your
system. Maybe install a caching proxy nameserver like dnsmasq?
While I can try dnsmasq as a solution, I did not have problems of this
degree under the old version of S3QL, and this is not the
only network I connect to where I am seeing these problems. This could
simply mean one or more of:
- the network(s) have become more unstable (certainly possible)
- that S3QL's test doesn't work quite the same way as before
- there is a bug somewhere (S3QL or python libraries - I had to upgrade
python to install the new S3QL) that makes it appear that DNS is still
broken after it recovers
- or something else is going on.
I see in the log I sent you that a few failures were reported 10 minutes
before S3QL gave up, but it is not clear to me if S3QL was able to resolve
DNS names between these. Am I correct in assuming that there is a timer as
well as the three-times-in-a-row failure limit you mentioned? If not,
there should be. Network failures often take a few minutes to
resolve/recover so I would imagine the test should look something like:
    while failing and elapsed time < 10 minutes (or possibly some configurable value)
        wait some short interval
        try again
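
In Python, the loop above might look something like the sketch below. The function name, the parameter names and the default values are only illustrative, and nothing here is taken from S3QL itself; the `clock` and `sleep` parameters exist only so that the loop can be exercised without really waiting.

```python
import time


def retry_until_deadline(attempt, max_wait=600.0, interval=5.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Call attempt() until it returns True or max_wait seconds have passed.

    Returns True on success and False if the time limit ran out first.
    """
    deadline = clock() + max_wait
    while not attempt():
        if clock() >= deadline:
            return False  # still failing after the full time limit: give up
        sleep(interval)   # wait some short interval, then try again
    return True
```

Here `attempt` would be whatever check decides that name resolution is working again, and `max_wait` is the configurable value suggested above (600 seconds being the ten minutes).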
It is certainly not unusual to see short network glitches (DNS,
connection loss, routing problems) lasting a few seconds to (rarely) a
minute here, but the five- to ten-minute DNS or general outage that I would
view as a minimum before S3QL gives up is very uncommon (I pretty much
live on the network when I am awake, so I am pretty sure I would notice
problems of this level). Of course, I may be wrong; perhaps I should set
up some kind of network monitor to see what kind of failure intervals
there are.
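
For the monitoring idea, one rough way to get failure intervals out of a log of periodic probes is sketched below. The `(timestamp, ok)` sample format is an assumption made for this example; the timestamps could come from any probe that records whether a lookup succeeded.

```python
def outage_durations(samples):
    """Given (timestamp, ok) pairs in time order, return each outage's length.

    An outage runs from the first failed probe to the next successful one.
    An outage still in progress when the data ends is not counted.
    """
    durations = []
    started = None  # timestamp of the first failure in the current outage
    for timestamp, ok in samples:
        if not ok and started is None:
            started = timestamp
        elif ok and started is not None:
            durations.append(timestamp - started)
            started = None
    return durations
```

The resolution of the result is limited by the probe interval, so probing every few seconds would be needed to tell short glitches apart from multi-minute outages.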
FWIW.
Shannon C. Dealy | DeaTech Research Inc.
de...@deatech.com | - Custom Software Development -
USA Phone: +1 800-467-5820 | - Natural Building Instruction -
numbers : +1 541-929-4089 | www.deatech.com