Hello, you could try this jar, which I found a link to on one of the Amazon pages:
  s3cmd get s3://eu-west-1.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar s3distcp.jar

s3distcp copies to and from S3 via MapReduce. As far as I know, plain DistCp parallelises at file granularity, so a single large file only ever gets one map task, which would explain what you are seeing; s3distcp can spread the copy across reduce tasks instead. If your cluster has N reducers available, you can run:

  hadoop jar s3distcp.jar -D mapred.reduce.tasks=N --src s3://lame/foo --dest hdfs:///user/hadoop/lamefoo/

I would run it in a screen session.

On 4 Sep 2012, at 21:07, Soulghost wrote:

> Hello guys,
>
> I have a problem using DistCp to transfer a large file from S3 to an HDFS
> cluster. Whenever I try to make the copy, I only see processing work and
> memory usage on one of the nodes, not on all of them. I don't know if this
> is the proper behaviour or a configuration problem. If I transfer multiple
> files, each node handles a single file at the same time; I understood the
> transfer would be in parallel, but it doesn't seem like that.
>
> I am using the 0.20.2 Hadoop distribution on a two-EC2-instance cluster. I
> was hoping some of you would have an idea of how DistCp works and which
> properties I could tweak to improve the transfer rate, which is currently
> 0.7 GB per minute.
>
> Regards.
> --
> View this message in context:
> http://old.nabble.com/Transfer-large-file-%3E50Gb-with-DistCp-from-s3-to-cluster-tp34389118p34389118.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.

_____________________________
Mischa Tuffield PhD
http://mmt.me.uk/
http://mmt.me.uk/foaf.rdf#mischa
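P.S. Roughly how I run it in practice, so the job survives my SSH session dropping. The session name "s3copy" and the reducer count of 8 are just placeholder examples; adjust to your cluster:

  # start a named, detachable screen session
  screen -S s3copy

  # inside the session, kick off the copy (long-running)
  hadoop jar s3distcp.jar -D mapred.reduce.tasks=8 --src s3://lame/foo --dest hdfs:///user/hadoop/lamefoo/

  # detach with Ctrl-a d; reattach later to check progress with:
  screen -r s3copy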
