Hi,
Here I'm developing a MapReduce web crawler which reads url lists and writes
html to MongoDB.
So, each map read one url list file, get the html and insert to MongoDB. There
is no reduce and no output of map. So, how to set the output directory in this
case? If I do not set the output directory, it gives me following exception,
Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException:
Output directory not set.
at
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Thank you !
Best,
Huanchen
2012-06-08
huanchen.zhang