Re: Aggregate Word Count from the Mapreduce examples

Pratyush Das Mon, 02 May 2022 11:43:02 -0700

Thanks!

Am I correct in understanding, then, that Aggregate WordCount and WordCount
do the same thing, apart from the fact that the Aggregate WordCount example
uses Hadoop's Aggregate framework? That is what this comment suggests:
https://stackoverflow.com/questions/24105117/how-to-execute-aggreagatewordcount-example-in-hadoop-which-uses-hadoop-aggregate#comment37203837_24105117
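
For reference, here is a minimal sketch of the behaviour the JavaDoc quoted
below describes, in plain Java with no Hadoop dependency. The class and
method names are made up for illustration, and this is not the code of either
example. It only shows the counts both examples would be expected to produce
for the two input files quoted further down.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a plain-Java stand-in for "break each line into words
// and count them". This is NOT the Hadoop WordCount or AggregateWordCount code.
class ExpectedWordCount {

    // Returns word -> occurrences, sorted by word (TreeMap keeps keys ordered).
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Contents of wc01.txt and wc02.txt as given in the original question.
        List<String> lines = List.of(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop");
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the two files above this yields Bye 1, Goodbye 1, Hadoop 2, Hello 2 and
World 2, the same counts as the patched run quoted below.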



On Mon, 2 May 2022 at 13:16, Ayush Saxena <[email protected]> wrote:

> Hi,
> I tried it too and it gave me a similar output. It looks like a bug in
> the code, although the code seems to have been there since the stone age...
> I tried a fix: it seems a "." (period) was missing when setting the conf,
> while on retrieval we were trying to get it with the period.
> I have put the code here:
>
> https://github.com/ayushtkn/hadoop/commit/ab7da425e204903e867855b05b7c8fc2fbdd8b0e
>
> I patched it on top of trunk and gave it a try locally for your use case;
> after that the output seems correct. I will check and raise a MAPRED Jira
> to fix it. If it gets reviewed and committed, you can either patch your
> Hadoop distro or wait for the next release, which would contain the fix.
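
The key mismatch described above can be illustrated with a small sketch. It
uses a plain java.util.Map in place of the Hadoop configuration object, and
the key prefix and the stored value are hypothetical, not the real property
names. The linked commit is the authoritative description of the change.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Map stands in for the job configuration, and
// PREFIX is a made-up key prefix, not an actual Hadoop property name.
class ConfKeyMismatchSketch {

    static final String PREFIX = "example.aggregator.descriptor";

    // Writer as described for the buggy code: index appended WITHOUT a period.
    static Map<String, String> writeWithoutPeriod(String value) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PREFIX + 0, value);          // key: "example.aggregator.descriptor0"
        return conf;
    }

    // Writer after the fix: index appended WITH a period.
    static Map<String, String> writeWithPeriod(String value) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PREFIX + "." + 0, value);    // key: "example.aggregator.descriptor.0"
        return conf;
    }

    // Reader: always looks the key up with the period.
    static String read(Map<String, String> conf, int index) {
        return conf.getOrDefault(PREFIX + "." + index, "<not set>");
    }

    public static void main(String[] args) {
        System.out.println(read(writeWithoutPeriod("word-count-descriptor"), 0));
        System.out.println(read(writeWithPeriod("word-count-descriptor"), 0));
    }
}
```

In this sketch the first lookup misses the stored value and the second finds
it, which is the kind of silent fallback that would explain a job producing
something other than per-word counts.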
>
> hadoop-3.4.0-SNAPSHOT % bin/hadoop jar \
>   share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar \
>   aggregatewordcount /testData /testOut 1 textinputformat
>
>
> hadoop-3.4.0-SNAPSHOT % bin/hdfs dfs -cat /testOut/part-r-00000
> Bye 1
> Goodbye 1
> Hadoop 2
> Hello 2
> World 2
>
> > Does this mean that Aggregate WordCount is merely counting the number of
> > files in the input directory?
>
> Not in an ideal situation. The JavaDoc says: *It reads the text input
> files, breaks each line into words and counts them. The output is a locally
> sorted list of words and the count of how often they occurred.*
>
> On Mon, 2 May 2022 at 10:23, Pratyush Das <[email protected]> wrote:
>
>> Hi,
>>
>> I had some questions about what the Aggregate Word Count example in the
>> hadoop-mapreduce-examples-3.3.1.jar actually does.
>>
>> This is how I executed the AggregateWordCount example:
>>
>> hadoop jar \
>>   hadoop-3.3.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar \
>>   aggregatewordcount /examples-input/wordcount/ /examples-output/wordcount/ 1 \
>>   textinputformat
>>
>> /examples-input/wordcount/ contains 2 files - wc01.txt and wc02.txt.
>>
>> These are the contents of wc01.txt:
>> Hello World Bye World
>>
>> These are the contents of wc02.txt:
>> Hello Hadoop Goodbye Hadoop
>>
>> The generated output file - /examples-output/wordcount/part-r-00000
>> contains the following line:
>> record_count 2
>>
>> I tried adding another file - wc03.txt which changed the content of the
>> generated file to:
>> record_count 3
>>
>> Does this mean that Aggregate WordCount is merely counting the number of
>> files in the input directory?
>>
>> Regards,
>>
>>
>> --
>> Pratyush Das
>>
>

-- 
Pratyush Das
