Try this command, run from your Nutch folder:

 bin/nutch crawl urls/<folder name>/<url file>.txt -dir crawl/<folder name> \
   -threads 10 -depth 2 -topN 1000

Your folder structure will look like this:

<nutch folder> -- urls  -- <folder name> -- <url file>.txt
               |
               |
                -- crawl -- <folder name>

Use one <folder name> per domain. For each domain folder in the urls folder,
there has to be a corresponding folder (with the same name) in the crawl
folder.
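
If you have many domains, something like the sketch below might save some
typing. It only prints one crawl command per domain folder, using the same
options as above, and does not run anything itself. The NUTCH_HOME variable
and the function names are my own assumptions for illustration, not part of
Nutch, and it assumes every url file ends in .txt.

```shell
#!/bin/sh
# Sketch only: print one crawl command per domain folder found under urls/.
# NUTCH_HOME (assumed name) points at the Nutch folder; defaults to the current dir.
NUTCH_HOME="${NUTCH_HOME:-.}"

# Hypothetical helper: $1 = domain folder name, $2 = url file name.
build_crawl_cmd() {
    printf 'bin/nutch crawl urls/%s/%s -dir crawl/%s -threads 10 -depth 2 -topN 1000\n' \
        "$1" "$2" "$1"
}

# Hypothetical helper: walk urls/<folder name>/<url file>.txt and print the commands.
print_crawl_cmds() {
    for url_file in "$NUTCH_HOME"/urls/*/*.txt; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -f "$url_file" ] || continue
        domain=$(basename "$(dirname "$url_file")")
        build_crawl_cmd "$domain" "$(basename "$url_file")"
    done
}

print_crawl_cmds
```

Check the printed commands first. If they look right, you can run the script
from the Nutch folder and pipe its output to sh.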


--
View this message in context: 
http://lucene.472066.n3.nabble.com/nutch-and-solr-tp3765166p3765607.html
Sent from the Solr - User mailing list archive at Nabble.com.