Re: distcp from plain java program

Hendrik Haddorp Thu, 26 Apr 2018 04:35:46 -0700

Hi Gour,

I did, but the problem seems to have been the local execution. The local execution uses only one thread, which is what I saw in the logs as well. So I ended up just doing my own copy using the Hadoop FileSystem APIs and multiple threads. That worked pretty well and allowed me to control the order of the file copies.
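
For anyone finding this thread later, here is a minimal sketch of that approach, not the actual code. To keep it runnable with the JDK alone, it copies with java.nio.file; the class name ParallelCopy and the CopyAction interface are made up for illustration. With Hadoop, the action would presumably wrap a call such as FileUtil.copy(srcFs, srcPath, dstFs, dstPath, false, conf) and use org.apache.hadoop.fs.Path instead of java.nio.file.Path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: copies files on a fixed-size thread pool, submitted in list order. */
class ParallelCopy {

    /** One copy operation. Assumption: a Hadoop version would wrap FileUtil.copy(...) here. */
    @FunctionalInterface
    interface CopyAction {
        void copy(Path source, Path target) throws IOException;
    }

    /**
     * Submits one task per file in the order given. The pool's FIFO queue hands
     * tasks to workers in submission order, with up to `threads` copies running
     * at once. Returns the number of files copied.
     */
    static int copyAll(List<Path> sources, Path targetDir, int threads, CopyAction action)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (Path source : sources) {
                Path target = targetDir.resolve(source.getFileName());
                Callable<Void> task = () -> {
                    action.copy(source, target);
                    return null;
                };
                pending.add(pool.submit(task));
            }
            int copied = 0;
            for (Future<Void> f : pending) {
                try {
                    f.get(); // wait for this copy; rethrows its failure, if any
                    copied++;
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
            return copied;
        } finally {
            pool.shutdownNow(); // cancel whatever is left if a copy failed
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ParallelCopy <targetDir> <file1> <file2> ...
        Path targetDir = Path.of(args[0]);
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Path.of(args[i]));
        }
        int n = copyAll(sources, targetDir, 4,
                (s, t) -> Files.copy(s, t, StandardCopyOption.REPLACE_EXISTING));
        System.out.println("copied " + n + " files");
    }
}
```

The order of the list decides which files are started first, so sorting it (for example, by size or by priority) before calling copyAll controls the copy order. Completion order is not guaranteed, since several copies run at the same time.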


regards,
Hendrik

On 26.04.2018 02:00, Gour Saha wrote:

Hendrik,
Did you try setting maxMaps to a higher number? The default is 20, so you might 
try setting it to a higher value.

-Gour

On 4/21/18, 7:01 AM, "Hendrik Haddorp" wrote:

     Hi,

     I'm trying to use distcp (org.apache.hadoop.tools.DistCp) from a
     simple Java program to copy files from HDFS to S3 storage. This works
     fine, except that it is very slow. Copying the files to the local
     disk is also not much faster. It seems like files are copied
     sequentially. My understanding, however, was that distcp would create map
     jobs that could be executed in parallel. Is there any configuration
     setting required to get the map jobs executed in parallel?

     thanks,
     Hendrik

---------------------------------------------------------------------

     To unsubscribe, e-mail: [email protected]
     For additional commands, e-mail: [email protected]

