I followed the given steps and created a core named foo with sample_techproducts_configs, but when I run the Nutch indexing command

    bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize

it fails with:
    Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Here is the complete Hadoop log for the process; the error portion begins at the WARN mapred.LocalJobRunner line near the end.

    2015-04-07 09:38:06,613 INFO  indexer.IndexingJob - Indexer: starting at 2015-04-07 09:38:06
    2015-04-07 09:38:06,684 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
    2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL filtering: true
    2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL normalizing: true
    2015-04-07 09:38:06,893 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 09:38:06,893 INFO  indexer.IndexingJob - Active IndexWriters :
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
    2015-04-07 09:38:07,036 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2015-04-07 09:38:07,540 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
    2015-04-07 09:38:07,565 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:09,552 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,642 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,734 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,895 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:11,088 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:11,219 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: content dest: content
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: title dest: title
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: host dest: host
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: segment dest: segment
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: boost dest: boost
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: digest dest: digest
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
    2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Deleting 0 documents
    2015-04-07 09:38:11,644 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 09:38:11,699 WARN  mapred.LocalJobRunner - job_local1245074757_0001
    org.apache.solr.common.SolrException: Not Found

    Not Found

    request: http://localhost:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)
        at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
    2015-04-07 09:38:12,408 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
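Looking at the trace, the failing update request goes to http://localhost:8983/solr/update, with no core name in the URL path. Could that be the problem? If so, I assume the indexing command should point at the foo core instead, something like:

    # point Nutch at the core itself, not the bare /solr root
    bin/nutch solrindex http://localhost:8983/solr/foo crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize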
On Tue, 7 Apr 2015 at 04:54 Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/6/2015 2:14 PM, Anchit Jain wrote:
> > I want to index Nutch results using Solr 5.0, but as mentioned in
> > https://wiki.apache.org/nutch/NutchTutorial there is no directory
> > ${APACHE_SOLR_HOME}/example/solr/collection1/conf/ in Solr 5.0.
> > So where do I have to copy schema.xml? Also, there is no start.jar
> > present in the example directory.
>
> The first thing to ask is whether you are running in cloud mode or
> standard mode. If you're in cloud mode, then what I'm saying below
> will require modification.
>
> After you start Solr with "bin/solr start", you can run this command:
>
> bin/solr create -c foo -d sample_techproducts_configs
>
> Once that's done, you will have a core named foo, and you can put the
> schema and any other Solr config files you get from Nutch in the
> server/solr/foo/conf directory.
>
> The create command will choose the example for a data-driven schema by
> default. The sample_techproducts_configs example will meet your needs
> better.
>
> Thanks,
> Shawn
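To summarize the steps from Shawn's reply as a single sequence (a sketch only: the copy path assumes Nutch's schema.xml sits in $NUTCH_HOME/conf, the core name foo is just the example used in this thread, and the commands run from the Solr install directory in standard, non-cloud mode):

    # start Solr and create a core from the techproducts example configs
    bin/solr start
    bin/solr create -c foo -d sample_techproducts_configs

    # copy the Nutch-supplied schema into the new core's conf directory
    cp $NUTCH_HOME/conf/schema.xml server/solr/foo/conf/

    # restart so the core picks up the replaced schema
    bin/solr restart -p 8983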