Solr 5.0.0 integration with Nutch 1.9

2015-04-06 Thread Anchit Jain
I want to index Nutch results using *Solr 5.0*, but the directory mentioned in
https://wiki.apache.org/nutch/NutchTutorial,
${APACHE_SOLR_HOME}/example/solr/collection1/conf/
does not exist in Solr 5.0. So where do I have to copy *schema.xml*?
Also, there is no *start.jar* in the example directory.


Re: Solr 5.0.0 integration with Nutch 1.9

2015-04-06 Thread Anchit Jain
sk$3.collect(ReduceTask.java:500)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2015-04-07 09:38:12,408 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)



On Tue, 7 Apr 2015 at 04:54 Shawn Heisey wrote:

> On 4/6/2015 2:14 PM, Anchit Jain wrote:
> > I want to index Nutch results using *Solr 5.0*, but the directory mentioned in
> > https://wiki.apache.org/nutch/NutchTutorial,
> > ${APACHE_SOLR_HOME}/example/solr/collection1/conf/
> > does not exist in Solr 5.0. So where do I have to copy *schema.xml*?
> > Also, there is no *start.jar* in the example directory.
>
> The first thing to ask is whether you are running in cloud mode or
> standard mode.  If you're in cloud mode, then what I'm saying below will
> require modification.
>
> After you start Solr with "bin/solr start" you can then do this command:
>
> bin/solr create -c foo -d sample_techproducts_configs
>
> Once that's done, you will have a core named foo, and then you can put
> the schema and any other Solr config files you get from nutch in the
> server/solr/foo/conf directory.
>
> The create command will choose the example for a data-driven schema by
> default.  The sample_techproducts_configs example will meet your needs
> better.
>
> Thanks,
> Shawn
>
>
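
As a rough illustration of the copy step described in the quoted reply, the following Python sketch copies a schema file into the conf directory of a core. The server/solr/foo/conf layout and the core name foo come from the reply; the function name install_schema, the Solr install path, and the location of the Nutch schema file are assumptions for illustration and may differ on a real installation.

```python
import shutil
from pathlib import Path


def install_schema(nutch_schema: Path, solr_home: Path, core: str = "foo") -> Path:
    """Copy a Nutch-provided schema.xml into a Solr core's conf directory.

    Assumes standalone (non-cloud) mode and that the core already exists,
    e.g. after "bin/solr create -c foo -d sample_techproducts_configs".
    """
    conf_dir = solr_home / "server" / "solr" / core / "conf"
    if not conf_dir.is_dir():
        raise FileNotFoundError(f"core conf directory not found: {conf_dir}")
    target = conf_dir / "schema.xml"
    if target.exists():
        # Keep the previous schema so the change can be reverted.
        shutil.copy2(target, conf_dir / "schema.xml.bak")
    shutil.copy2(nutch_schema, target)
    return target
```

A call might look like install_schema(Path("apache-nutch-1.9/conf/schema.xml"), Path("solr-5.0.0")), where both paths are hypothetical. The core would then need to be reloaded, or Solr restarted, before the new schema takes effect.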


Ignoring metatags in solr

2015-04-08 Thread Anchit Jain

I have crawled a website using Nutch.
When I try to index it with Solr, I get the following error:
org.apache.solr.common.SolrException: ERROR: [doc=http://xyz.htm]
unknown field 'metatag.keywords'

*unknown field 'metatag.keywords'*

I cannot figure out where the error is, as I have not defined any
field in schema.xml for metatags. I just copied the schema.xml from Nutch
into Solr.

I am using Nutch 1.9 with Solr 4.10.

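My *schema.xml* for *Solr*:

[The XML markup of the schema was stripped by the list archive. Only these
fragments survive: sortMissingLast="true", omitNorms="true", precisionStep="0",
positionIncrementGap="0", multiValued="true", and the element values "id" and
"content".]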

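My *solrindex-mapping.xml*:

[The XML markup of the mapping file was stripped by the list archive. Only the
element value "id" survives.]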
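
One way to narrow down an "unknown field" error like the one reported in this message is to check whether the schema declares the field at all, either explicitly or through a dynamicField pattern. The Python sketch below does that check for a schema passed in as a string. The embedded sample schema, the metatag.* dynamic field, and the function name field_is_defined are hypothetical and only for illustration; they are not the poster's actual schema.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Hypothetical, heavily shortened schema for illustration only.
SAMPLE_SCHEMA = """
<schema name="nutch" version="1.5">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="content" type="text_general" stored="false" indexed="true"/>
    <dynamicField name="metatag.*" type="string" stored="true" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def field_is_defined(schema_xml: str, field_name: str) -> bool:
    """Return True if field_name matches a <field> or <dynamicField> entry."""
    root = ET.fromstring(schema_xml)
    explicit = {f.get("name") for f in root.iter("field")}
    if field_name in explicit:
        return True
    # Dynamic field names use a leading or trailing "*" wildcard; glob
    # matching with fnmatchcase is used here as an approximation of that rule.
    patterns = [d.get("name") for d in root.iter("dynamicField")]
    return any(p is not None and fnmatchcase(field_name, p) for p in patterns)


if __name__ == "__main__":
    print(field_is_defined(SAMPLE_SCHEMA, "metatag.keywords"))
    print(field_is_defined(SAMPLE_SCHEMA, "anchor"))
```

With the sample schema, the first call returns True and the second returns False. If the same check returns False for the real schema, the field that Nutch sends would need an explicit field or a matching dynamicField entry, or it would have to be left out of the documents on the Nutch side.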