[
https://issues.apache.org/jira/browse/HADOOP-13489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419635#comment-15419635
]
Mingliang Liu commented on HADOOP-13489:
----------------------------------------
Thanks for reporting and working on this, [~tedyu].
The v1 patch checks the job not to be null value and job state to be
DistCpConstants.SUCCEEDED by directly calling {{isSuccessful()}}. My concerns
are:
# Basically the {{DistCp#run()}} checks the status of the job by exception
handling. All exceptions, if not caught yet, will be caught by the last generic
{{Exception}} block which returns DistCpConstants.UNKNOWN_ERROR. Is it possible
that the job is null while there is no exception? (I think here the job should
never be null)
#- If not, the {{if (job != null) {}} check is not needed, or better, state as
an assert.
#- If true, I think for a null job, we should still return
DistCpConstants.UNKNOWN_ERROR instead of DistCpConstants.SUCCESS.
# We have to respect the {{DistCpOptions#blocking}} option ({{-async}} from
command line parameters):
#- If the distcp option is blocking, {{DistCp#waitForJobCompletion()}} will be
called, which will throw an IOE in case the job is not successful.
{code}
/**
* Wait for the given job to complete.
* @param job the given mapreduce job that has already been submitted
*/
public void waitForJobCompletion(Job job) throws Exception {
assert job != null;
if (!job.waitForCompletion(true)) {
throw new IOException("DistCp failure: Job " + job.getJobID()
+ " has failed: " + job.getStatus().getFailureInfo());
}
}
{code}
So if the Job fails, DistCpConstants.SUCCESS is NOT be returned?
#- If the distcp option is non-blocking, {{waitForJobCompletion()}} will not be
called, in which case the distcp just returns after the job is submitted. The
internal job state can still be RUNNING instead of SUCCEEDED. What should
DistCp return in this case? I think we can simply return SUCCESS, as the
existing code does.
FWIW, the {{@return}} javadoc of the {{DistCp#run()}} method was wrong for
years.
> DistCp may incorrectly return success status when the underlying Job failed
> ---------------------------------------------------------------------------
>
> Key: HADOOP-13489
> URL: https://issues.apache.org/jira/browse/HADOOP-13489
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: HADOOP-13489.v1.patch, HADOOP-13489.v2.patch,
> TestIncrementalBackup-output.txt
>
>
> I was troubleshooting HBASE-14450 where at the end of BackupdistCp#execute(),
> distcp job was marked unsuccessful (BackupdistCp is a wrapper of DistCp).
> Yet in IncrementalTableBackupProcedure#incrementalCopy(), the return value
> from copyService.copy() was 0.
> Here is related code from DistCp:
> {code}
> try {
> execute();
> } catch (InvalidInputException e) {
> LOG.error("Invalid input: ", e);
> return DistCpConstants.INVALID_ARGUMENT;
> } catch (DuplicateFileException e) {
> LOG.error("Duplicate files in input path: ", e);
> return DistCpConstants.DUPLICATE_INPUT;
> } catch (AclsNotSupportedException e) {
> LOG.error("ACLs not supported on at least one file system: ", e);
> return DistCpConstants.ACLS_NOT_SUPPORTED;
> } catch (XAttrsNotSupportedException e) {
> LOG.error("XAttrs not supported on at least one file system: ", e);
> return DistCpConstants.XATTRS_NOT_SUPPORTED;
> } catch (Exception e) {
> LOG.error("Exception encountered ", e);
> return DistCpConstants.UNKNOWN_ERROR;
> }
> return DistCpConstants.SUCCESS;
> {code}
> We don't check whether the Job returned by execute() was successful.
> Even if the Job fails, DistCpConstants.SUCCESS is returned.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]