Well... It all depends on where your bottleneck is. If performance is critical for your use case, benchmark it. Multi-threading can be useful, but not always. And you would rather avoid having locally shared mutable state, because it can become a pain to manage. But that doesn't mean you can't do multi-threading...
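To illustrate the point about shared mutable state, here is a minimal plain-Java sketch (no Hadoop involved) of one "task" processing a single split with several threads. Each worker keeps its own private map and the partial results are merged only after the workers finish, so the hot loop needs no locks. The class name `ThreadLocalCounts`, the method `countWords` and the word-count workload are hypothetical, chosen for illustration; this is not how Hadoop's own classes are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: multi-threaded processing of one split
// without shared mutable state.
public class ThreadLocalCounts {

    // Counts word occurrences in `records` using `threads` worker threads.
    // Each worker builds its own private map; the partial maps are merged
    // on the calling thread once every worker has finished.
    static Map<String, Integer> countWords(List<String> records, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            int chunk = (records.size() + threads - 1) / threads; // ceiling division
            for (int start = 0; start < records.size(); start += chunk) {
                // Read-only view of this worker's slice of the split.
                List<String> slice = records.subList(start, Math.min(start + chunk, records.size()));
                partials.add(pool.submit(() -> {
                    Map<String, Integer> local = new HashMap<>(); // confined to this worker
                    for (String line : slice) {
                        for (String word : line.split("\\s+")) {
                            if (!word.isEmpty()) {
                                local.merge(word, 1, Integer::sum);
                            }
                        }
                    }
                    return local;
                }));
            }
            // Merge step: single-threaded, so no synchronization is needed.
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> f : partials) {
                f.get().forEach((k, v) -> total.merge(k, v, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> split = List.of("a b a", "b c", "a");
        System.out.println(countWords(split, 2));
    }
}
```

Whether this beats simply running more single-threaded map tasks depends on whether the work is CPU-bound rather than I/O-bound, which is exactly what a benchmark of your own job would tell you.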
You only need to browse the type hierarchy a bit to find out about
http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html

Regards

Bertrand

On Mon, Jan 14, 2013 at 8:22 AM, Mark Olimpiati wrote:

> Thanks for the reply Nitin, but I don't see what the bottleneck is in having
> it distributed with multi-threaded maps?
>
> I see your point that each map is processing different splits, but my
> question is: if each map task had 2 threads, multiplexing or running in
> parallel if there are enough cores, to process the same split, wouldn't that
> be faster with enough cores?
>
> Mark
>
>
> On Sun, Jan 13, 2013 at 10:34 PM, Nitin Pawar wrote:
>
> > That's because it's a distributed processing framework over a network.
> > On Jan 14, 2013 11:27 AM, "Mark Olimpiati" wrote:
> >
> > > Hi, this is a simple question, but why weren't map or reduce tasks
> > > programmed to be multi-threaded? i.e. instead of spawning 6 map tasks
> > > for 6 cores, run one map task with 6 parallel threads.
> > >
> > > In fact I tried this myself, but it turns out that threading is not
> > > helping as it would in regular Java programs, for some reason. Any
> > > feedback on this topic?
> > >
> > > Thanks,
> > > Mark

--
Bertrand Dechoux
