[ https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527868#comment-13527868 ]
Yu Li commented on HADOOP-7386:
-------------------------------
I have tested concatenated bzip2 files on hadoop-1.0.3 with the HADOOP-7823
patch applied, and confirmed that an MR job reads them correctly. Below are
the detailed steps of my test:
1) create file test1, with content:
=================================
Hello World
World test
=================================
2) create file test2, with content:
=================================
Hello Jay
Jay test
=================================
3) compress them with the command "bzip2 -z test1 test2", which creates
test1.bz2 and test2.bz2
4) create the concatenated bzip2 file with the command "cat test1.bz2 test2.bz2 >
test-contatenate.bz2"
5) create a directory in HDFS and put the concatenated bzip2 file in it:
"hadoop fs -mkdir /tmp/bzip2/input && hadoop fs -put test-contatenate.bz2
/tmp/bzip2/input"
6) run the wordcount example program as the test: "hadoop jar
$HADOOP_HOME/hadoop-examples*.jar wordcount /tmp/bzip2/input /tmp/bzip2/output"
7) check the result; the output is correct, with the content:
=================================
Hello 2
Jay 2
World 2
test 2
=================================
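The expected counts can be cross-checked locally without a cluster. The
following is only a rough sketch, assuming Python 3.3 or later and its standard
bz2 module; it does not exercise Hadoop's BZip2Codec or the HADOOP-7823 patch,
and the function names are made up for illustration. It compresses the two
inputs separately, joins the streams as "cat" would, and shows that a
single-stream decoder stops after the first stream while a multi-stream decoder
yields the word counts listed above.

```python
import bz2
from collections import Counter

# Contents of test1 and test2 from steps 1 and 2.
TEST1 = b"Hello World\nWorld test\n"
TEST2 = b"Hello Jay\nJay test\n"


def concatenated_bz2() -> bytes:
    """Compress each input on its own and join the streams, like `cat a.bz2 b.bz2`."""
    return bz2.compress(TEST1) + bz2.compress(TEST2)


def first_stream_only(data: bytes) -> bytes:
    """Decode with a single-stream decompressor, which stops at the first end-of-stream."""
    decompressor = bz2.BZ2Decompressor()
    return decompressor.decompress(data)


def word_counts(data: bytes) -> dict:
    """Decode every stream (bz2.decompress accepts multi-stream input) and count words."""
    text = bz2.decompress(data).decode("utf-8")
    return dict(Counter(text.split()))


if __name__ == "__main__":
    data = concatenated_bz2()
    # A reader without concatenation support would only see the first file.
    print(first_stream_only(data))
    # Sorted by word, the same order the wordcount output uses in step 7.
    for word, count in sorted(word_counts(data).items()):
        print(word, count)
```

A reader that only handles a single stream would report "Hello 1, World 2,
test 1" for this input, which is the failure mode the issue describes.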
> Support concatenated bzip2 files
> --------------------------------
>
> Key: HADOOP-7386
> URL: https://issues.apache.org/jira/browse/HADOOP-7386
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Allen Wittenauer
> Assignee: Karthik Kambatla
>
> HADOOP-6835 added the framework and direct support for concatenated gzip
> files. We should do the same for bzip files.