[
https://issues.apache.org/jira/browse/HBASE-28686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011860#comment-18011860
]
Vinayak Hegde commented on HBASE-28686:
---------------------------------------
[~rmdmattingly] Thanks for fixing the issue.
As part of https://issues.apache.org/jira/browse/HBASE-29484, we're updating
the backup and restore documentation to reflect the latest changes.
Since this Jira also touches the backup and restore components, could you
please review the changes and check if any updates to the documentation are
needed? If so, kindly create a sub-task under
https://issues.apache.org/jira/browse/HBASE-29484 detailing the required
documentation updates. You’re welcome to assign it to yourself and work on it,
or we’d be happy to take it up as well.
Thanks!
> MapReduceBackupCopyJob should support custom DistCp options
> -----------------------------------------------------------
>
> Key: HBASE-28686
> URL: https://issues.apache.org/jira/browse/HBASE-28686
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.6.0
> Reporter: Ray Mattingly
> Assignee: Ray Mattingly
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1
>
>
> h4. Problem
> The MapReduceBackupCopyJob class provides no means for updating DistCp job
> options. This means that you're stuck with defaults, which isn't always
> desirable. For example, my workplace would like the freedom to deviate from
> at least two DistCp defaults:
> # distcp.direct.write: we would like to set this to true, because writing
> and renaming tmp files is expensive in S3 (where we store our backups).
> # The number of mappers that DistCp will run, which we would also like to control.
> h4. Proposed Solution
> It is not the prettiest solution, but I'm proposing that we support DistCp
> customizations via the given backup client configuration like
> [this|https://github.com/HubSpot/hbase/compare/hubspot-2.6...HubSpot:hbase:backup-distcp-options].
> It's necessary to do this conf -> arg conversion because we still want to
> use [DistCp's run
> method|https://github.com/HubSpot/hadoop/blob/c4c25b0ea2be1c8bca31d86962597060b2630f62/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L134-L171],
> which expects args, so as to not change any error codes. Hadoop actually
> does something similar, but in the opposite direction — the DistCp job has
> logic to convert the args back to configurations (lol).
> Further, the DistCp API is unfortunately poorly designed for programmatic
> use, so it doesn't leave us great alternatives. For example, if you use the
> run method, it doesn't matter what you pass in as DistCpOptions to the
> constructor: your options will be overwritten based on the args that you pass
> in. Alternatively, if you pass in the DistCpOptions in the constructor and
> use DistCp#execute or DistCp#createAndSubmitJob, then you lose the specific
> error codes that the run method returns!
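
The conf -> arg conversion described in the proposal can be sketched roughly as
follows. This is a minimal illustration, not the code from the linked branch: it
uses a plain Map as a stand-in for the Hadoop Configuration so that it runs
without Hadoop on the classpath. The class and method names are hypothetical,
and the distcp.max.maps key and the -direct / -m switches are assumptions that
should be checked against the DistCp version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns selected configuration entries into DistCp-style args.
final class DistCpArgsSketch {
  // Conf key named in the issue description.
  static final String DIRECT_WRITE_KEY = "distcp.direct.write";
  // Assumed conf key for the mapper count; verify against your Hadoop version.
  static final String MAX_MAPS_KEY = "distcp.max.maps";

  /** Builds an argument list (options first, then source and target paths). */
  static List<String> toDistCpArgs(Map<String, String> conf, String src, String dst) {
    List<String> args = new ArrayList<>();
    // Boolean setting becomes a bare switch; a missing key is treated as false.
    if (Boolean.parseBoolean(conf.get(DIRECT_WRITE_KEY))) {
      args.add("-direct");
    }
    // Valued setting becomes a switch followed by its value.
    String maxMaps = conf.get(MAX_MAPS_KEY);
    if (maxMaps != null) {
      // parseInt rejects non-numeric values early with NumberFormatException.
      args.add("-m");
      args.add(String.valueOf(Integer.parseInt(maxMaps.trim())));
    }
    args.add(src);
    args.add(dst);
    return args;
  }

  public static void main(String[] unused) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put(DIRECT_WRITE_KEY, "true");
    conf.put(MAX_MAPS_KEY, "8");
    // Example paths are placeholders.
    System.out.println(toDistCpArgs(conf, "hdfs://src/path", "s3a://bucket/backup"));
  }
}
```

In a real backup job the resulting list would be handed to DistCp's run method
as a String array, which is what keeps the existing error codes intact.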