I followed the given steps and created a core named foo with sample_techproducts_configs, but when I run the Nutch indexing command

    bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize

it fails with:
    Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Here is the complete Hadoop log for the process; the error portion begins at the WARN mapred.LocalJobRunner line near the end.

    2015-04-07 09:38:06,613 INFO  indexer.IndexingJob - Indexer: starting at 2015-04-07 09:38:06
    2015-04-07 09:38:06,684 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
    2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL filtering: true
    2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL normalizing: true
    2015-04-07 09:38:06,893 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 09:38:06,893 INFO  indexer.IndexingJob - Active IndexWriters :
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
    2015-04-07 09:38:07,036 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2015-04-07 09:38:07,540 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
    2015-04-07 09:38:07,565 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:09,552 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,642 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,734 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,895 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:11,088 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:11,219 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: content dest: content
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: title dest: title
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: host dest: host
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: segment dest: segment
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: boost dest: boost
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: digest dest: digest
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
    2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Deleting 0 documents
    2015-04-07 09:38:11,644 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 09:38:11,699 WARN  mapred.LocalJobRunner - job_local1245074757_0001
    org.apache.solr.common.SolrException: Not Found

    Not Found

    request: http://localhost:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)
        at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
    2015-04-07 09:38:12,408 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
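Looking at the trace, the failing update request goes to http://localhost:8983/solr/update, with no core name in the URL path. Could that be the problem? If so, I assume the indexing command should point at the foo core instead, something like:

    # point Nutch at the core itself, not the bare /solr root
    bin/nutch solrindex http://localhost:8983/solr/foo crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize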
On Tue, 7 Apr 2015 at 04:54 Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/6/2015 2:14 PM, Anchit Jain wrote:
> > I want to index Nutch results using Solr 5.0, but as mentioned in
> > https://wiki.apache.org/nutch/NutchTutorial there is no directory
> > ${APACHE_SOLR_HOME}/example/solr/collection1/conf/ in Solr 5.0.
> > So where do I have to copy schema.xml? Also, there is no start.jar
> > present in the example directory.
>
> The first thing to ask is whether you are running in cloud mode or
> standard mode. If you're in cloud mode, then what I'm saying below
> will require modification.
>
> After you start Solr with "bin/solr start", you can run this command:
>
> bin/solr create -c foo -d sample_techproducts_configs
>
> Once that's done, you will have a core named foo, and you can put the
> schema and any other Solr config files you get from Nutch in the
> server/solr/foo/conf directory.
>
> The create command will choose the example for a data-driven schema by
> default. The sample_techproducts_configs example will meet your needs
> better.
>
> Thanks,
> Shawn
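To summarize the steps from Shawn's reply as a single sequence (a sketch only: the copy path assumes Nutch's schema.xml sits in $NUTCH_HOME/conf, the core name foo is just the example used in this thread, and the commands run from the Solr install directory in standard, non-cloud mode):

    # start Solr and create a core from the techproducts example configs
    bin/solr start
    bin/solr create -c foo -d sample_techproducts_configs

    # copy the Nutch-supplied schema into the new core's conf directory
    cp $NUTCH_HOME/conf/schema.xml server/solr/foo/conf/

    # restart so the core picks up the replaced schema
    bin/solr restart -p 8983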