I'm not sure what your backup approach is. One option is to archive[1] the
files, as was done for YARN logs[2].
To speed this up, you can write a MapReduce job to do the archiving.
Please refer to the MapReduce tutorial for a sample job[3].
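
If the built-in Hadoop Archives tool fits your needs, you may not need a
custom job at all: the `hadoop archive` command itself launches a MapReduce
job to pack the files. A hedged sketch (the paths and archive name here are
placeholders, not anything from your setup):

```shell
# Pack everything under /user/data/small-files into a single HAR file.
# This launches a MapReduce job under the hood.
hadoop archive -archiveName backup-2022-10.har \
  -p /user/data small-files /user/backup

# The archived files stay listable and readable through the har:// scheme.
hdfs dfs -ls -R har:///user/backup/backup-2022-10.har
```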


1. https://hadoop.apache.org/docs/stable/hadoop-archives/HadoopArchives.html
2. https://hadoop.apache.org/docs/stable/hadoop-archive-logs/HadoopArchiveLogs.html
3. https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
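
This can also be combined with the DistCp suggestion below: far fewer, larger
HAR files copy much faster than ~1 million small ones, and you can raise the
mapper count to parallelize the transfer. A hedged example; the hostnames,
ports, and mapper count are placeholders you would tune for your cluster:

```shell
# Copy over webhdfs with 50 mappers. The dynamic strategy helps
# balance uneven file sizes across the mappers.
hadoop distcp -m 50 -strategy dynamic \
  webhdfs://source-nn:9870/user/backup \
  hdfs://dest-nn:8020/user/backup
```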

On Sun, Oct 9, 2022 at 9:22 AM Ayush Saxena <[email protected]> wrote:

> Using DistCp is the only option AFAIK. DistCp does support webhdfs; try
> playing with the number of mappers and other options to tune it for better
> performance.
>
> -Ayush
>
>
> On 09-Oct-2022, at 8:56 AM, Abhishek <[email protected]> wrote:
>
>
> Hi,
> We want to back up a large number of small Hadoop files (~1 million) with
> the webhdfs API. We are hitting a performance bottleneck, and it is taking
> days to complete.
> Does anyone know of a solution, e.g. any xml settings, that would improve
> the performance?
> This would really help us.
> v 3.1.1
>
> Appreciate your help !!
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> *Abhishek...*
>
>
