[
https://issues.apache.org/jira/browse/HADOOP-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742369#comment-16742369
]
Thomas Marquardt commented on HADOOP-16044:
-------------------------------------------
ABFS has been retrying on UnknownHostException since it previewed because our
understanding is that this exception is thrown for transient name resolution
failures. Our retry policy lasts longer than the typical DNS TTL (or negative
cache) of 5 minutes, so the driver can recover and allow a long-running task
to complete successfully. WASB also retries on these. I expect ADL retries
too, although I have not confirmed this.
We do this mostly to preserve the status quo: keeping the current behavior is
less likely to cause a regression.
With that said, if you have evidence this is a bad design, we should change it.
I see that we do the opposite for S3, but I don't know what led to that
decision nor do I have a good sense for the behavior in the wild, so I don't
know what's best. Certainly retrying is not going to increase the recovery
time on the node in question.
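To make the retry behavior described above concrete, here is a minimal sketch
of a retry loop that treats UnknownHostException as transient and backs off
exponentially. It is not the ABFS retry policy itself: the class
DnsRetrySketch, the methods retryOnUnknownHost and totalBackoffMs, and all
parameter values are hypothetical and only illustrate how a policy can be
sized against a 5 minute DNS TTL.

```java
import java.net.UnknownHostException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; not code from the ABFS driver.
class DnsRetrySketch {

    // Runs op, retrying only when it throws UnknownHostException.
    // The sleep between attempts doubles each time, capped at maxBackoffMs.
    static <T> T retryOnUnknownHost(Callable<T> op, int maxAttempts,
                                    long initialBackoffMs, long maxBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (UnknownHostException e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted, surface the DNS failure
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }

    // Worst-case total time spent sleeping between attempts, which can be
    // compared against the DNS TTL (or negative cache) duration.
    static long totalBackoffMs(int maxAttempts, long initialBackoffMs,
                               long maxBackoffMs) {
        long total = 0;
        long backoffMs = initialBackoffMs;
        for (int i = 1; i < maxAttempts; i++) {
            total += backoffMs;
            backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed values: 1 s initial backoff, 60 s cap, 5 minute TTL.
        long ttlMs = 5 * 60 * 1000L;
        for (int attempts = 10; attempts <= 11; attempts++) {
            long total = totalBackoffMs(attempts, 1000L, 60000L);
            System.out.println(attempts + " attempts: " + total
                    + " ms of backoff, outlasts TTL: " + (total > ttlMs));
        }
    }
}
```

With these assumed values the policy only outlasts a 5 minute TTL once enough
attempts are allowed, which is the property the comment above relies on. The
trade-off is that a permanently wrong host name takes just as long to fail.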
> ABFS: Better exception handling of DNS errors followup
> ------------------------------------------------------
>
> Key: HADOOP-16044
> URL: https://issues.apache.org/jira/browse/HADOOP-16044
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Da Zhou
> Assignee: Da Zhou
> Priority: Major
> Attachments: HADOOP-16044-001.patch, HADOOP-16044-002.patch
>
>
> This is a follow up for HADOOP-15662 as the 001 patch of HADOOP-15662 is
> already committed.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)