On Fri, 5 Dec 2014, Nikolaus Rath wrote:

Shannon Dealy <de...@deatech.com> writes:
[snip]
Given the scenario you were trying to fix with the change, perhaps a
better approach would be to fail if initial resolution fails, but if
the initial resolution succeeds, then the end point can reasonably be
assumed to exist and future failures should keep retrying, at least
for a substantial (possibly configurable) period of time.  This
provides the immediate feedback for manual or scripting related
interactions, but once the file system is mounted focuses on
maintaining/recovering the connection.

Yes, that would be the best solution. It's just ugly to code: either
you have to pass a "do_not_fail" parameter all the way from the main
function to the low-level network routines, or you have to do the first
lookup at a higher level (which then needs to know details about all the
storage backends).

I would suggest passing a timeout value rather than a "do_not_fail" flag. This would give the application-level code the greatest flexibility, allowing for both immediate and short-term failure settings as well as an effective "never fail" by just passing a ridiculously large value.
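
To illustrate the timeout idea, here is a minimal sketch. It is not s3ql or dugong code: the function name resolve_with_timeout, its parameters, and the backoff values are all hypothetical. A timeout of 0 gives a single attempt (immediate failure for manual or scripted use), while a very large value behaves like "never fail".

```python
import socket
import time


def resolve_with_timeout(hostname, port, timeout,
                         lookup=socket.getaddrinfo,
                         sleep=time.sleep, clock=time.monotonic,
                         max_delay=60.0):
    """Retry a name lookup until it succeeds or `timeout` seconds elapse.

    Hypothetical helper: `lookup`, `sleep` and `clock` are injectable
    only so the retry logic can be exercised without a real network.
    """
    deadline = clock() + timeout
    delay = 0.5  # assumed initial backoff, doubled after each failure
    while True:
        try:
            return lookup(hostname, port)
        except socket.gaierror:
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # time budget used up: report the failure
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay)
```

Under this sketch, the mount command could call the helper with timeout=0 for the first lookup, and the running file system could call it with a timeout of hours or days, so only one numeric parameter has to travel down to the network layer.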

Personally, I would love it if I could simply keep the file system
mounted at all times, even when there is no network link, so that when
there is a connection I can simply start using it, and when the
connection goes away (even for a day or two), everything blocks and
simply picks up where it left off when the connection returns.

Well, that should actually work already. When there is no network
connection, s3ql retries for a long time. The problem only arises if
there is partial connectivity.

I tried it on the older version of the code I was using and it never recovered (not sure why). I don't know about the current version.

Had another filesystem crash last night and while I can't say the exact series of events from the perspective of the file system, it was a complete network failure that crashed it (not just DNS). This would imply that perhaps the test is too sensitive right at the boundary of a network failure (perhaps some packets get through, but most don't) and needs to retry over a longer period of time before deciding if the failure is network or DNS.

Upon further reflection:

I looked over your dugong code and gave some thought to what little I know of the local network topology. My guess (and it is only a guess) is that whenever the network connection fails at my location, your test for live DNS will always decide that DNS is up. The ISP appears to be in another country, and their local network in this building feeds about 500 rooms. They are likely using a local caching DNS server (for the building, city or country; it doesn't really matter which) that answers from its cache whatever it knows and forwards all other requests up to the ISP's main servers. If that is the case, then any time the network is cut off between the local caching DNS server and the primary DNS servers, the test will fail to detect the outage: google.com will always resolve (since everyone uses it), but www.iana.org and C.root-servers.org will not, since most Internet users never have reason to look up the latter two domains directly, and any recursive lookups of them triggered by a local query will happen, and be cached, only at the primary server.

Based on this, I would suggest that a more robust test would be to declare a DNS failure if any of the three lookups fails (after the host lookup has already failed). After all, a failure on any of these lookups would at least suggest a serious problem with the Internet, which is a reasonably likely cause of the initial host lookup failure.
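
A rough sketch of the stricter rule follows. This is not dugong's actual implementation: the function name dns_looks_broken and the port numbers are assumptions, and only the three host names come from the discussion above.

```python
import socket

# Host names as mentioned above; the port numbers are assumed.
TEST_HOSTS = (
    ('google.com', 80),
    ('www.iana.org', 80),
    ('C.root-servers.org', 53),
)


def dns_looks_broken(test_hosts=TEST_HOSTS, lookup=socket.getaddrinfo):
    """Return True as soon as ANY test lookup fails.

    Meant to be called only after the lookup of the real storage host
    has already failed. `lookup` is injectable for testing.
    """
    for host, port in test_hosts:
        try:
            lookup(host, port)
        except socket.gaierror:
            return True  # one failure is enough to suspect DNS/network
    return False
```

With a caching resolver that can still answer for google.com but for nothing else, this version reports a DNS problem, whereas a test that is satisfied by a single successful lookup would not.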

FWIW.

Shannon C. Dealy           |         DeaTech Research Inc.
de...@deatech.com          |    - Custom Software Development -
USA Phone: +1 800-467-5820 |    - Natural Building Instruction -
numbers  : +1 541-929-4089 |            www.deatech.com


