Yes, all the files passed must pre-exist, including the target. In this case, you would need to run something like the following:
curl -i -X POST "http://HOST/webhdfs/v1/PATH_TO_YOUR_HDFS_FOLDER/part-01-000000-000?user.name=hadoop&op=CONCAT&sources=PATH_TO_YOUR_HDFS_FOLDER/part-02-000000-000,PATH_TO_YOUR_HDFS_FOLDER/part-04-000000-000"

This would concatenate these three files into PATH_TO_YOUR_HDFS_FOLDER/part-01-000000-000. Note that this will only work if the file sizes are exact multiples of "dfs.block.size"; if not, you may get another error (a quick way to check this precondition is sketched after the quoted thread below).

> On 27 Jul 2017, at 10:06, Cinyoung Hur <[email protected]> wrote:
>
> Hi, Wellington
>
> All the source parts are (permission, owner, group, size, replication, block size, name):
>
> -rw-r--r--  hadoop  supergroup   2.43 KB  2  32 MB  part-01-000000-000
> -rw-r--r--  hadoop  supergroup  21.14 MB  2  32 MB  part-02-000000-000
> -rw-r--r--  hadoop  supergroup  22.1 MB   2  32 MB  part-04-000000-000
> -rw-r--r--  hadoop  supergroup  22.29 MB  2  32 MB  part-05-000000-000
> -rw-r--r--  hadoop  supergroup  22.29 MB  2  32 MB  part-06-000000-000
> -rw-r--r--  hadoop  supergroup  22.56 MB  2  32 MB  part-07-000000-000
>
> I got this exception. It seems like I have to create the target file before
> concatenation.
>
> curl -i -X POST "http://HOST/webhdfs/v1/tajo/warehouse/hira_analysis/material_usage_concat?user.name=hadoop&op=CONCAT&sources=/tajo/warehouse/hira_analysis/material_usage"
>
> HTTP/1.1 404 Not Found
> Date: Thu, 27 Jul 2017 09:05:48 GMT
> Server: Jetty(6.1.26)
> Content-Type: application/json
> Cache-Control: no-cache
> Expires: Thu, 27 Jul 2017 09:05:48 GMT
> Pragma: no-cache
> Expires: Thu, 27 Jul 2017 09:05:48 GMT
> Pragma: no-cache
> Set-Cookie: hadoop.auth="u=hadoop&p=hadoop&t=simple&e=1501182348739&s=o02nv4on4FXbhlijJ+R/KXvhooQ="; Path=/; Expires=Thu, 27-Jul-2017 19:05:48 GMT; HttpOnly
> Transfer-Encoding: chunked
>
> {"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File does not exist: /tajo/warehouse/hira_analysis/material_usage_concat"}}
>
> Thanks!
>
> 2017-07-26 0:54 GMT+09:00 Wellington Chevreuil <[email protected]>:
> Hi Cinyoung,
>
> Concat has some restrictions, like the need for each source file's last block
> to be the same size as the configured dfs.block.size. If all the conditions
> are met, the command example below should work (where we are concatenating
> /user/root/file-2 into /user/root/file-1):
>
> curl -i -X POST "http://HTTPFS_HOST:14000/webhdfs/v1/user/root/file-1?user.name=root&op=CONCAT&sources=/user/root/file-2"
>
> Is this similar to what you had tried? Can you share the resulting output you
> are getting?
>
>> On 25 Jul 2017, at 09:00, Cinyoung Hur <[email protected]> wrote:
>>
>> https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Concat_Files
>>
>> I tried to concat multiple parts into a single target file through WebHDFS,
>> but I couldn't do it.
>> Could you give me an example of concatenating parts?
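To check the block-size precondition mentioned above, here is a minimal sketch using the standard WebHDFS GETFILESTATUS call; HOST, the folder path, and the part file names are placeholders taken from the listing in the quoted thread:

# A minimal precondition check. On versions enforcing the restriction
# described above, CONCAT needs each file's "length" to be an exact
# multiple of its "blockSize".
for f in part-01-000000-000 part-02-000000-000 part-04-000000-000; do
  curl -s "http://HOST/webhdfs/v1/PATH_TO_YOUR_HDFS_FOLDER/$f?user.name=hadoop&op=GETFILESTATUS"
  echo    # each response is a FileStatus JSON with "length" and "blockSize" fields
done

Going by the listing above (32 MB block size, parts of roughly 2 KB to 23 MB), none of these files fills its last block, so on versions that enforce this restriction the CONCAT would be rejected regardless of the target.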

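On the "create the target file first" point from the quoted exchange: below is a sketch of the standard two-step WebHDFS CREATE flow (documented on the WebHDFS page linked above). The target path is taken from the failing request; seed-data.bin and LOCATION_URL are placeholders.

# Step 1: ask the NameNode to create the file; it replies with a
# 307 Temporary Redirect whose Location header points at a DataNode.
curl -i -X PUT "http://HOST/webhdfs/v1/tajo/warehouse/hira_analysis/material_usage_concat?user.name=hadoop&op=CREATE"

# Step 2: send the file contents to the URL from that Location header.
curl -i -X PUT -T seed-data.bin "LOCATION_URL"

Be aware that some HDFS versions also reject an empty CONCAT target, so seeding it with real data here, or simply concatenating into an existing part file as suggested at the top of the thread, is the safer route.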