How can I achieve secondary retrieval in mapreduce?

谭军 Mon, 08 Aug 2011 07:59:19 -0700

Hi,
I want to write a program that performs a secondary retrieval, but I don't
know how to do it.
I find it hard to explain what I mean, so the source code below may help.
I am not sure whether my first retrieval algorithm is right, but it works.
The database file is the input file.
I think it is split across different mappers.
I thought that using a LinkedList to store the new keys generated by the first
retrieval could help.
But I don't know how to read the database file from the beginning again.
The database file for the first and second retrieval is the same (args[1]:
database path).
No reducer is used.
 
 
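import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Retrieval {

 public static void main(String[] args) throws IOException, URISyntaxException {
  if (args.length != 3) {
   System.err.println(
     "Usage: Retrieval <protein set path> <database path> <output path>");
   System.exit(-1);
  }

  JobConf conf = new JobConf(new Configuration(), Retrieval.class);
  conf.setJobName("Retrieval");
  // args[0]: protein ID list, shipped to every mapper through the cache
  DistributedCache.addCacheFile(new URI(args[0]), conf);
  // args[1]: database file, which is the job input
  FileInputFormat.addInputPath(conf, new Path(args[1]));
  FileOutputFormat.setOutputPath(conf, new Path(args[2]));
  conf.setMapperClass(RetrievalMapper.class);
  //conf.setReducerClass(RetrievalReducer.class);
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(Text.class);
  JobClient.runJob(conf);
 }
}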


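import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class RetrievalMapper extends MapReduceBase implements
  Mapper<LongWritable, Text, Text, Text> {
 private Path[] localFiles;

 public void configure(JobConf conf) {
  try {
   // local copy of the protein ID file added in Retrieval.main
   this.localFiles = DistributedCache.getLocalCacheFiles(conf);
  } catch (IOException e) {
   e.printStackTrace();
  }
 }

 public void map(LongWritable key, Text value,
   OutputCollector<Text, Text> output, Reporter reporter)
   throws IOException {

  String line = value.toString();
  String[] proteinIDs = line.split("\t");
  if (proteinIDs.length < 3) {
   return; // skip lines that do not have both IDs and a value
  }
  String tmpString = proteinIDs[0] + "\t" + proteinIDs[1];

  // stores the first neighbors; note that it is local to this map() call
  // and is not used for a second retrieval yet
  LinkedList<String> list = new LinkedList<String>();

  BufferedReader proReader = new BufferedReader(
    new FileReader(this.localFiles[0].toString()));
  try {
   String proID;
   // for each line (protein ID) in the key file
   while ((proID = proReader.readLine()) != null) {
    if (proID.equalsIgnoreCase(proteinIDs[0])) {
     // hit, and proteinIDs[1] is its first neighbor
     output.collect(new Text(tmpString), new Text(proteinIDs[2]));
     list.add(proteinIDs[1]); // add first neighbor to list
    }
    if (proID.equalsIgnoreCase(proteinIDs[1])) {
     // hit, and proteinIDs[0] is its first neighbor
     output.collect(new Text(tmpString), new Text(proteinIDs[2]));
     list.add(proteinIDs[0]); // add first neighbor to list
    }
   }
  } finally {
   proReader.close();
  }
 }
}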
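
To make clearer what I mean by secondary retrieval, here is a small plain-Java
sketch without Hadoop. It is only an illustration of the result I am after, not
a working MapReduce solution. The names TwoPassRetrieval, neighborsOf and
secondNeighbors are made up for this example, and I assume each database line
is "idA<TAB>idB<TAB>value" as in the mapper above. It also compares lowercased
IDs, which only approximates equalsIgnoreCase.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TwoPassRetrieval {

    // One full pass over the database: collect every ID that shares a line
    // with one of the given keys. Keys are expected to be lowercase.
    static Set<String> neighborsOf(Set<String> keys, List<String> databaseLines) {
        Set<String> found = new HashSet<String>();
        for (String line : databaseLines) {
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                continue; // skip malformed lines
            }
            String a = fields[0].toLowerCase(Locale.ROOT);
            String b = fields[1].toLowerCase(Locale.ROOT);
            if (keys.contains(a)) {
                found.add(b);
            }
            if (keys.contains(b)) {
                found.add(a);
            }
        }
        return found;
    }

    // Two passes over the same database: the first pass finds the neighbors
    // of the seed proteins, the second pass finds the neighbors of those.
    static Set<String> secondNeighbors(Set<String> seeds, List<String> databaseLines) {
        Set<String> lowerSeeds = new HashSet<String>();
        for (String seed : seeds) {
            lowerSeeds.add(seed.toLowerCase(Locale.ROOT));
        }
        Set<String> first = neighborsOf(lowerSeeds, databaseLines);
        Set<String> second = neighborsOf(first, databaseLines);
        second.removeAll(lowerSeeds); // drop the seeds themselves
        second.removeAll(first);      // drop IDs already found in the first pass
        return second;
    }

    public static void main(String[] args) {
        List<String> db = Arrays.asList("P1\tP2\t0.9", "P2\tP3\t0.8", "P3\tP4\t0.7");
        Set<String> seeds = new HashSet<String>(Arrays.asList("P1"));
        System.out.println(secondNeighbors(seeds, db)); // prints [p3]
    }
}
```

In MapReduce terms, I assume the second pass would have to be a second job over
the same args[1] input, with the first job's neighbor list passed to it (for
example through DistributedCache, as the protein set is now), because a list
kept inside one mapper only sees that mapper's split.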


--


Regards!

Jun Tan
