Thanks! Am I correct in understanding, then, that Aggregate WordCount and WordCount do the same thing, the only difference being that the Aggregate WordCount example is built on Hadoop's Aggregate framework? That is what this comment suggests: https://stackoverflow.com/questions/24105117/how-to-execute-aggreagatewordcount-example-in-hadoop-which-uses-hadoop-aggregate#comment37203837_24105117
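For reference, here is a minimal sketch of what I would expect a plain word count to produce for the two input files quoted below. It is ordinary Python rather than Hadoop code, and `count_words` is a name made up for illustration; it only mirrors the behaviour the JavaDoc describes (split each line into words, count them, list them sorted).

```python
from collections import Counter


def count_words(lines):
    """Split each line on whitespace and count how often each word occurs."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # Sort by word so the result is ordered like the part-r-00000 listing.
    return sorted(counts.items())


# Contents of wc01.txt and wc02.txt as given in the original question.
files = {
    "wc01.txt": ["Hello World Bye World"],
    "wc02.txt": ["Hello Hadoop Goodbye Hadoop"],
}

all_lines = [line for lines in files.values() for line in lines]
for word, count in count_words(all_lines):
    print(word, count)
```

If both examples really do the same thing, this is the listing I would expect from either of them for these inputs, rather than a single record_count line.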
On Mon, 2 May 2022 at 13:16, Ayush Saxena <[email protected]> wrote:

> Hi,
> I tried it too and it gave me a similar output. Looks like some bug with
> the code. The code seems to be there since stone age though...
> I tried a fix, it seems there was "." period missing while setting the
> conf and when retrieving we were trying to get it with the period.
> Have put the code here:
>
> https://github.com/ayushtkn/hadoop/commit/ab7da425e204903e867855b05b7c8fc2fbdd8b0e
>
> Patched it on top of trunk and gave it a try locally for your use case,
> seems post that output is correct. Will check and raise a MAPRED Jira to
> fix, If it gets reviewed & Committed you can either patch your hadoop
> distro or wait for the next release which would contain a fix.
>
> hadoop-3.4.0-SNAPSHOT % bin/hadoop jar
> share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar
> aggregatewordcount /testData /testOut 1 textinputformat
>
> hadoop-3.4.0-SNAPSHOT % bin/hdfs dfs -cat /testOut/part-r-00000
>
> > Bye 1
> > Goodbye 1
> > Hadoop 2
> > Hello 2
> > World 2
>
> > Does this mean that Aggregate WordCount is merely counting the number of
> > files in the input directory?
>
> Not in an ideal situation, The JavaDoc says: *It reads the text input
> files, breaks each line into words and counts them. The output is a locally
> sorted list of words and the count of how often they occurred.*
>
> On Mon, 2 May 2022 at 10:23, Pratyush Das <[email protected]> wrote:
>
>> Hi,
>>
>> I had some questions about what the Aggregate Word Count example in the
>> hadoop-mapreduce-examples-3.3.1.jar actually does.
>>
>> This is how I executed the AggregateWordCount example - hadoop jar
>> hadoop-3.3.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar
>> aggregatewordcount /examples-input/wordcount/ /examples-output/wordcount/ 1
>> textinputformat
>>
>> /examples-input/wordcount/ contains 2 files - wc01.txt and wc02.txt.
>>
>> These are the contents of wc01.txt:
>> Hello World Bye World
>>
>> These are the contents of wc02.txt:
>> Hello Hadoop Goodbye Hadoop
>>
>> The generated output file - /examples-output/wordcount/part-r-00000
>> contains the following line:
>> record_count 2
>>
>> I tried adding another file - wc03.txt which changed the content of the
>> generated file to:
>> record_count 3
>>
>> Does this mean that Aggregate WordCount is merely counting the number of
>> files in the input directory?
>>
>> Regards,
>>
>> --
>> Pratyush Das
>>
>

--
Pratyush Das
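To make the kind of bug Ayush describes concrete, here is a minimal sketch of a key mismatch where the writer omits the "." separator and the reader includes it. This is plain Python with a dict standing in for the job configuration; the prefix `example.aggregate.descriptor` and all function names are hypothetical and are not Hadoop's actual property names or API. What Hadoop does when the lookup misses is not stated in the thread, so the sketch simply returns None in that case.

```python
# Hypothetical key prefix, for illustration only (not the real Hadoop property).
DESCRIPTOR_PREFIX = "example.aggregate.descriptor"


def set_descriptor_buggy(conf, index, value):
    # The writer forgets the "." separator: the key ends in "descriptor0".
    conf[DESCRIPTOR_PREFIX + str(index)] = value


def set_descriptor_fixed(conf, index, value):
    # The writer includes the "." separator: the key ends in "descriptor.0".
    conf[DESCRIPTOR_PREFIX + "." + str(index)] = value


def get_descriptor(conf, index):
    # The reader always looks the key up with the "." separator.
    return conf.get(DESCRIPTOR_PREFIX + "." + str(index))


buggy_conf = {}
set_descriptor_buggy(buggy_conf, 0, "word-count-descriptor")
print(get_descriptor(buggy_conf, 0))  # the lookup misses the value that was set

fixed_conf = {}
set_descriptor_fixed(fixed_conf, 0, "word-count-descriptor")
print(get_descriptor(fixed_conf, 0))  # the lookup finds the value
```

With the buggy writer the reader never sees the configured value, which would be consistent with the job not running the intended word-count logic; the linked commit is the authoritative description of the actual change.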
