Hello Baruch! You are pointing to a directory of segments, not a specific segment.
You must either point to the directory with the -dir option:

    bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb -dir crawl/segments/

Or point to a specific segment:

    bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/YOUR_SEGMENT

Cheers

-----Original message-----
> From: Baruch Kogan <bar...@sellerpanda.com>
> Sent: Sunday 1st March 2015 18:57
> To: solr-user@lucene.apache.org
> Subject: Integrating Solr with Nutch
>
> Hi, guys,
>
> I'm working through the tutorial here
> <http://wiki.apache.org/nutch/NutchTutorial#A6._Integrate_Solr_with_Nutch>.
> I've run a crawl on a list of webpages. Now I'm trying to index them into
> Solr. Solr is installed, runs fine, indexes .json, .xml, whatever, and
> returns query results. I've edited the Nutch schema as per the
> instructions. Now I've hit a wall. The tutorial says:
>
>     Save the file and restart Solr under ${APACHE_SOLR_HOME}/example:
>
>         java -jar start.jar
>
> On my install (the latest Solr) there is no such file, but there is a
> solr.sh file in /bin which I can start. So I pasted it into
> solr/example/ and ran it from there. Solr cranks over.
> Now I need to:
>
>     run the Solr Index command from ${NUTCH_RUNTIME_HOME}:
>
>         bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/
>
> and I get this:
>
> ubuntu@ubuntu-VirtualBox:~/crawler/nutch$ bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/
> Indexer: starting at 2015-03-01 19:51:09
> Indexer: deleting gone documents: false
> Indexer: URL filtering: false
> Indexer: URL normalizing: false
> Active IndexWriters :
> SOLRIndexWriter
>     solr.server.url : URL of the SOLR instance (mandatory)
>     solr.commit.size : buffer size when sending to SOLR (default 1000)
>     solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>     solr.auth : use authentication (default false)
>     solr.auth.username : use authentication (default false)
>     solr.auth : username for authentication
>     solr.auth.password : password for authentication
>
> Indexer: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/crawl_fetch
> Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/crawl_parse
> Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/parse_data
> Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/segments/parse_text
> Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/crawldb/current
> Input path does not exist: file:/home/ubuntu/crawler/nutch/crawl/linkdb/current
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
>     at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
>     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
>     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>
> What am I doing wrong?
>
> Sincerely,
>
> Baruch Kogan
> Marketing Manager
> Seller Panda <http://sellerpanda.com>
> +972(58)441-3829
> baruch.kogan at Skype
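
P.S. If you'd rather index just the latest segment, note that Nutch names segment directories by timestamp, so a lexicographic sort finds the newest one. A minimal POSIX-shell sketch of the idea (it builds a throwaway demo directory to stand in for your real crawl/segments; the demo names and paths are made up, adjust to your setup):

```shell
#!/bin/sh
# Demo: pick the newest Nutch segment. Segment dirs are timestamped
# (e.g. 20150301185134), so a plain sort puts the latest one last.
# "demo/segments" is a stand-in for a real crawl/segments directory.
mkdir -p demo/segments/20150301185134 demo/segments/20150302093012

SEGMENT=$(ls demo/segments | sort | tail -n 1)

# In a real run you would pass the selected segment to solrindex:
echo "would run: bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/$SEGMENT"
# prints: would run: ... crawl/segments/20150302093012

rm -r demo
```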