Hi,
I want to write a program that performs a secondary retrieval, but I don't know
how to do it.
I find it hard to explain, so I have included my source code below.
I am not sure whether my first retrieval algorithm is right, but it worked.
The database file is the input file.
I think it is split across different mappers.
I thought that using a LinkedList to store the new keys generated by the first
retrieval could help.
But I don't know how to read the database file from the beginning again.
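To make the goal concrete, here is a minimal plain-Java sketch (no Hadoop) of what I mean by secondary retrieval. It assumes each database line is "idA<TAB>idB<TAB>value", as in my mapper, and that the second retrieval is simply the same match run again with the first neighbors as the new keys. The class name TwoPassRetrieval, the method neighborsOf and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; hypothetical names, no Hadoop involved.
class TwoPassRetrieval {

    // One pass over the database: for every line "idA\tidB\tvalue" in which
    // one ID matches a key, collect the ID on the other side (its neighbor).
    static Set<String> neighborsOf(Set<String> keys, List<String> database) {
        Set<String> found = new LinkedHashSet<>();
        for (String line : database) {
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                continue; // skip malformed lines
            }
            if (containsIgnoreCase(keys, cols[0])) {
                found.add(cols[1]);
            }
            if (containsIgnoreCase(keys, cols[1])) {
                found.add(cols[0]);
            }
        }
        return found;
    }

    // Case-insensitive membership test, mirroring equalsIgnoreCase in the mapper.
    static boolean containsIgnoreCase(Set<String> keys, String id) {
        for (String key : keys) {
            if (key.equalsIgnoreCase(id)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> database = Arrays.asList(
                "P1\tP2\t0.9",
                "P2\tP3\t0.8",
                "P3\tP4\t0.7",
                "P5\tP6\t0.6");
        Set<String> seeds = new LinkedHashSet<>(Arrays.asList("P1"));

        Set<String> first = neighborsOf(seeds, database);   // first retrieval
        Set<String> second = neighborsOf(first, database);  // second retrieval, same database

        System.out.println("first neighbors:  " + first);   // [P2]
        System.out.println("second neighbors: " + second);  // [P1, P3]
    }
}
```

Here the second pass also returns the original key P1, because the P1/P2 line matches from the other side; I have not decided whether that should be filtered out. What I cannot work out is how to get this second pass over the whole database inside the mapper, where each map() call only sees one line.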
The database file for the first and second retrieval is the same (args[1]:
database path).
No reducer is used.
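import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Retrieval {
    public static void main(String[] args) throws IOException, URISyntaxException {
        if (args.length != 3) {
            System.err.println(
                    "Usage: Retrieval <protein set path> <database path> <output path>");
            System.exit(-1);
        }
        JobConf conf = new JobConf(new Configuration(), Retrieval.class);
        conf.setJobName("Retrieval");
        // Ship the protein set (key file) to every mapper.
        DistributedCache.addCacheFile(new URI(args[0]), conf);
        // The database file is the job input.
        FileInputFormat.addInputPath(conf, new Path(args[1]));
        FileOutputFormat.setOutputPath(conf, new Path(args[2]));
        conf.setMapperClass(RetrievalMapper.class);
        //conf.setReducerClass(RetrievalReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        JobClient.runJob(conf);
    }
}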
public class RetrievalMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private Path[] localFiles;

    public void configure(JobConf conf) {
        try {
            this.localFiles = DistributedCache.getLocalCacheFiles(conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        // Stores the first neighbors.
        LinkedList<String> list = new LinkedList<String>();
        BufferedReader proReader = new BufferedReader(
                new FileReader(this.localFiles[0].toString()));
        String proID;
        String[] proteinIDs = line.split("\t");
        String tmpString = proteinIDs[0] + "\t" + proteinIDs[1];
        // For each line (protein ID) in the key file.
        while ((proID = proReader.readLine()) != null) {
            // Hit: proteinIDs[1] is its first neighbor.
            if (proID.equalsIgnoreCase(proteinIDs[0])) {
                output.collect(new Text(tmpString), new Text(proteinIDs[2]));
                list.add(proteinIDs[1]); // add first neighbor to list
            }
            // Hit: proteinIDs[0] is its first neighbor.
            if (proID.equalsIgnoreCase(proteinIDs[1])) {
                output.collect(new Text(tmpString), new Text(proteinIDs[2]));
                list.add(proteinIDs[0]); // add first neighbor to list
            }
        }
        proReader.close();
    }
}
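
A side note on the mapper above: it reopens and scans the key file for every input line. Below is a minimal sketch of reading the keys once into a set instead, written in plain Java so it runs without Hadoop. The names KeySetLookup, loadKeys and isKey are hypothetical, and lower-casing and trimming the IDs is my assumption for imitating equalsIgnoreCase on plain ASCII IDs. In the real mapper the load would presumably go in configure(), but I have not tried that.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; hypothetical names, no Hadoop involved.
class KeySetLookup {

    // Reads one protein ID per line and stores it lower-cased, so that
    // lookups are case-insensitive (assumes plain ASCII IDs).
    static Set<String> loadKeys(Reader source) {
        Set<String> keys = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String id = line.trim();
                if (!id.isEmpty()) {
                    keys.add(id.toLowerCase(Locale.ROOT));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keys;
    }

    // Set lookup replaces the per-line scan of the key file.
    static boolean isKey(Set<String> keys, String id) {
        return keys.contains(id.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // A StringReader stands in for the cached key file.
        Set<String> keys = loadKeys(new StringReader("P1\nP7\n"));
        String[] cols = "p1\tP2\t0.9".split("\t");
        System.out.println(isKey(keys, cols[0])); // true
        System.out.println(isKey(keys, cols[1])); // false
    }
}
```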
--
Regards!
Jun Tan