I'm not sure what your backup approach is. One option is to archive[1] the files, the same approach used for YARN logs[2]. To speed this up, you can write a MapReduce job to do the archiving; please refer to the tutorial for a sample MapReduce job[3].
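As a rough sketch of the archiving approach, the `hadoop archive` tool packs many small files into a single HAR file (which itself runs as a MapReduce job). The paths and archive name below are hypothetical; substitute your own:

```shell
# Pack everything under /data/small into one HAR archive in /backup.
# -p sets the parent path that source paths are resolved against.
hadoop archive -archiveName backup-2022-10.har -p /data/small /backup

# The archive is then addressable through the har:// filesystem scheme:
hadoop fs -ls har:///backup/backup-2022-10.har
```

This cuts the NameNode object count dramatically, which is usually the real bottleneck with ~1mn small files.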
1. https://hadoop.apache.org/docs/stable/hadoop-archives/HadoopArchives.html
2. https://hadoop.apache.org/docs/stable/hadoop-archive-logs/HadoopArchiveLogs.html
3. https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

On Sun, Oct 9, 2022 at 9:22 AM Ayush Saxena <[email protected]> wrote:

> Using DistCp is the only option AFAIK. DistCp does support webhdfs, so
> try playing with the number of mappers and so on to tune it for better
> performance.
>
> -Ayush
>
> On 09-Oct-2022, at 8:56 AM, Abhishek <[email protected]> wrote:
>
> Hi,
> We want to back up a large number of Hadoop small files (~1mn) with the
> webhdfs API. We are hitting a performance bottleneck here and it's
> taking days to back them up.
> Does anyone know of a solution where performance could be improved via
> any XML settings? This would really help us.
> v 3.1.1
>
> Appreciate your help !!
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> *Abhishek...*
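For the DistCp route mentioned in the quoted reply, a sketch of tuning the mapper count is below. The NameNode hostnames and paths are hypothetical; `-m` raises the number of parallel copy mappers (the default is 20), and `-strategy dynamic` balances file counts across mappers, which helps when the inputs are many small files:

```shell
# Copy over webhdfs with more mappers and dynamic work distribution.
hadoop distcp \
  -m 100 \
  -strategy dynamic \
  webhdfs://source-nn:9870/data/small \
  webhdfs://backup-nn:9870/backup/small
```

With ~1mn files, per-file open/close overhead dominates, so adding mappers helps only up to the point where the NameNode RPC queue saturates; archiving first and then copying the HAR files usually scales better.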
