Pig tries to do this with some of its optimizations. You ultimately have to
combine them into a single map/reduce job with two separate execution
paths. It is complicated, especially in the shuffle phase. It would probably
look something like:
class MapCollectorWrapper implements Collector {
    Collector wrapped;
    LongWritable taskKey;

    MapCollectorWrapper(Collector c, LongWritable taskKey) {
        wrapped = c;
        this.taskKey = taskKey;
    }

    void emit(Key key, Value value) {
        wrapped.emit(new SpecialCompoundKey(key, taskKey), value);
    }
}
void map(Key key, Value value, Collector collector) {
    // TODO: clone key and value so the mappers cannot mutate them.
    MapCollectorWrapper m1 = new MapCollectorWrapper(collector, 1);
    Map1(key, value, m1);
    MapCollectorWrapper m2 = new MapCollectorWrapper(collector, 2);
    Map2(key, value, m2);
}
void reduce(SpecialCompoundKey key, Iterable values, Collector collector) {
    // TODO: need a multi-file output format, and wrap the collector here
    // so that the output files all go to the proper place.
    if (key.getTaskKey() == 1) {
        Reduce1(key.getRealKey(), values, collector);
    } else {
        Reduce2(key.getRealKey(), values, collector);
    }
}
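The SpecialCompoundKey referenced above is left undefined. A minimal sketch of its ordering logic is below; this is plain Java for illustration only, assuming string-valued real keys, and omits the WritableComparable serialization a real Hadoop key type would need:

```java
// Hypothetical sketch of SpecialCompoundKey. A real Hadoop key would
// implement WritableComparable and serialize both fields; only the
// comparison logic that drives the shuffle ordering is shown here.
class SpecialCompoundKey implements Comparable<SpecialCompoundKey> {
    private final String realKey; // assumes Text-like string keys
    private final long taskKey;   // 1 or 2: which logical job emitted it

    SpecialCompoundKey(String realKey, long taskKey) {
        this.realKey = realKey;
        this.taskKey = taskKey;
    }

    String getRealKey() { return realKey; }
    long getTaskKey()   { return taskKey; }

    // Compare on the task key first so each logical job's keys reach
    // the reducer as one contiguous, normally ordered run, then on the
    // real key to preserve the usual sort within each run.
    @Override
    public int compareTo(SpecialCompoundKey o) {
        int c = Long.compare(taskKey, o.taskKey);
        return c != 0 ? c : realKey.compareTo(o.realKey);
    }
}
```

A matching Partitioner would also have to hash the compound key consistently so that equal keys land on the same reducer.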
On 2/25/12 7:34 AM, "Bruce Wang" <[email protected]> wrote:
Hi,
There are two map-reduce jobs which have the same input file.
They must read the input file twice.
I want the jobs to read the file only once and share the same data in
memory.
How can I do this?
Thanks
________________________________
Best Regards
Bruce Wang