I'm trying to write a Reducer which will eliminate duplicates from the list of
values before writing them out. I have the following code for my Reducer:
/*****************/
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ClickStreamIndexerReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text dirName, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Text value = new Text();
        Text lastValue = new Text();
        Iterator<Text> valuesIterator = values.iterator();
        while (valuesIterator.hasNext()) {
            value = valuesIterator.next();
            // Write the value only if it differs from the previous one.
            if (!value.equals(lastValue)) {
                context.write(dirName, value);
                // Remember the value just written for the next comparison.
                lastValue = value;
            }
        }
    }
}
/*****************/
Right before the first time "value = valuesIterator.next()" is called, both
value and lastValue are empty as expected. Then value is set to the first value
and lastValue is still empty. After I write out value I set lastValue to value.
The first time through the outer while loop, everything goes as expected.
However, the next time through, when "value = valuesIterator.next()" is called,
both value and lastValue end up referring to the exact same object. On every
later pass through the loop, whenever value is set, lastValue changes with it.
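
A likely explanation is that Hadoop reuses a single Text instance for every
element the values iterator returns, so "lastValue = value" stores a reference
to that shared object rather than a copy of its contents. The sketch below
reproduces the effect without Hadoop. Everything in it is hypothetical and for
illustration only: the class ReusedValueDemo, its method names, and the use of
StringBuilder as a stand-in for Text. It also assumes, as the reducer above
does, that duplicate values arrive next to each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReusedValueDemo {

    // Hypothetical stand-in for the reducer's values iterator: every call to
    // next() returns the SAME mutable object, refilled with the next record.
    static Iterator<StringBuilder> reusingIterator(List<String> records) {
        final StringBuilder shared = new StringBuilder();
        final Iterator<String> source = records.iterator();
        return new Iterator<StringBuilder>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public StringBuilder next() {
                shared.setLength(0);
                shared.append(source.next());
                return shared;
            }
        };
    }

    // Mirrors "lastValue = value": only the reference is kept, so after the
    // first write lastValue and value are always the same object.
    static List<String> dedupKeepingReference(List<String> records) {
        List<String> written = new ArrayList<>();
        StringBuilder lastValue = new StringBuilder();
        Iterator<StringBuilder> it = reusingIterator(records);
        while (it.hasNext()) {
            StringBuilder value = it.next();
            if (!value.toString().equals(lastValue.toString())) {
                written.add(value.toString());
                lastValue = value; // alias of the shared object
            }
        }
        return written;
    }

    // Copies the contents into a separate object instead of aliasing it.
    static List<String> dedupCopyingContents(List<String> records) {
        List<String> written = new ArrayList<>();
        StringBuilder lastValue = new StringBuilder();
        Iterator<StringBuilder> it = reusingIterator(records);
        while (it.hasNext()) {
            StringBuilder value = it.next();
            if (!value.toString().equals(lastValue.toString())) {
                written.add(value.toString());
                lastValue.setLength(0);
                lastValue.append(value); // copy contents, keep own object
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "a", "b", "b", "c");
        System.out.println(dedupKeepingReference(records)); // prints [a]
        System.out.println(dedupCopyingContents(records));  // prints [a, b, c]
    }
}
```

In the reducer itself, the corresponding change would be to copy the contents
with "lastValue.set(value)" in place of "lastValue = value", which keeps
lastValue as a separate Text object.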