Hi,

I've read that the combiner only runs if it is specified AND the sort memory buffer overflows in the mapper:
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201107.mbox/%[email protected]%3E

But when I run a Hadoop streaming job in R using RHadoop, the combiner always runs when specified, even on a very small dataset. Is this the expected behaviour?

Thanks,
Sudip Sinha
