Hi all, I set up Nutch with Solr (versions 1.4 and 1.4.1, respectively) on Windows. I can crawl and parse a website, but when I try to index the data with Solr, I get an error. This is my command:
bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5

and this is (the final part of) the reply:

...
ParseSegment: finished at 2012-01-29 23:10:20, elapsed: 00:00:04
CrawlDb update: starting at 2012-01-29 23:10:20
CrawlDb update: db: crawl-20120129230752/crawldb
CrawlDb update: segments: [crawl-20120129230752/segments/20120129230930]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2012-01-29 23:10:25, elapsed: 00:00:04
LinkDb: starting at 2012-01-29 23:10:25
LinkDb: linkdb: crawl-20120129230752/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120129230752/segments/20120129230806
LinkDb: adding segment: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120129230752/segments/20120129230834
LinkDb: adding segment: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120129230752/segments/20120129230930
LinkDb: finished at 2012-01-29 23:10:30, elapsed: 00:00:05
SolrIndexer: starting at 2012-01-29 23:10:30
Adding 11 documents
java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2012-01-29 23:10:44
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
Exception in thread "main" java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
    ... 9 more
Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 1) or the data in not in 'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
    ... 11 more

*CAN YOU HELP ME!?!?*

Best,
Alessio