solrj is the Solr Java client library. There seem to be two versions here, 1.4.1 and 3.4.0, which are incompatible, so you can do the following:
refer to https://github.com/geek4377/nutch/commit/c66bf35ff4f86393413621b3b889b1c78281df4d to see how to upgrade the Solr version in Nutch. The above example replaces Solr 1.4.0 with 3.1.0.

On Sun, Feb 5, 2012 at 11:02 PM, alessio crisantemi
<alessio.crisant...@gmail.com> wrote:
> if I look in the Solr and Nutch libs I find:
> apache-solr-solrj-1.4.1.jar in Solr
> and
> solr-solrj-3.4.0.jar
>
> these are the only jar files with the word 'solrj'....
> is that the problem?!
>
> 2012/2/5 Geek Gamer <geek4...@gmail.com>
>
>> looks like the solrj version in the Nutch classpath is different from the
>> Solr version on the server.
>> can you post the versions for both Nutch and Solr?
>>
>>
>> On Sun, Feb 5, 2012 at 10:24 PM, alessio crisantemi
>> <alessio.crisant...@gmail.com> wrote:
>> > no, all run on port 8983.
>> > ..
>> >
>> > 2012/2/5 Matthew Parker <mpar...@apogeeintegration.com>
>> >
>> >> Doesn't Tomcat run on port 8080, and not port 8983? Or did you change
>> >> Tomcat's default port to 8983?
>> >> On Feb 5, 2012 5:17 AM, "alessio crisantemi"
>> >> <alessio.crisant...@gmail.com> wrote:
>> >>
>> >> > Hi All,
>> >> > I have some problems with the integration of Nutch with Solr and
>> >> > Tomcat.
>> >> >
>> >> > I followed the Nutch tutorial for the integration and now I can crawl
>> >> > a website: all works right.
>> >> > But if I try the Solr integration, I can't index into Solr.
>> >> >
>> >> > Below is the Nutch output after the command:
>> >> > bin/nutch crawl urls -solr http://127.0.0.1:8983/solr/ -depth 3 -topN 5
>> >> >
>> >> > I read "java.lang.RuntimeException: Invalid version (expected 2, but 1)
>> >> > or the data in not in 'javabin' format"
>> >> > Maybe there is a problem between the Nutch 1.4 version and Solr 1.4.1?
>> >> > Maybe it requires a 3.x Solr version?
>> >> >
>> >> > thanks,
>> >> > a.
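To see why the two jars named above clash: the javabin wire format changed between the 1.x and 3.x lines, so the client and server major versions must match. The following is only an illustrative sketch (not Nutch or Solr API; the jar names are the ones quoted in this thread) of parsing the solrj version out of a jar name and comparing majors:

```java
// Illustrative sketch only -- not code from Nutch or Solr.
// Parses the version out of a solrj jar filename and checks whether
// two versions share a major version (javabin changed between 1.x and 3.x).
public class SolrjJarCheck {

    // "apache-solr-solrj-1.4.1.jar" -> "1.4.1"; null if no version found
    static String parseVersion(String jarName) {
        java.util.regex.Matcher m = java.util.regex.Pattern
                .compile("solrj-(\\d+(?:\\.\\d+)*)\\.jar$")
                .matcher(jarName);
        return m.find() ? m.group(1) : null;
    }

    // Compatible here means: same major version on both sides of the wire.
    static boolean compatible(String clientVersion, String serverVersion) {
        return clientVersion.split("\\.")[0]
                .equals(serverVersion.split("\\.")[0]);
    }

    public static void main(String[] args) {
        String client = parseVersion("apache-solr-solrj-1.4.1.jar"); // Nutch side
        String server = parseVersion("solr-solrj-3.4.0.jar");        // Solr side
        // prints: client=1.4.1 server=3.4.0 compatible=false
        System.out.println("client=" + client + " server=" + server
                + " compatible=" + compatible(client, server));
    }
}
```

With 1.4.1 on one side and 3.4.0 on the other the majors differ, which is exactly the mismatch the thread is chasing; the real fix is swapping the jar as in the linked commit, not running a checker like this.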
>> >> >
>> >> > crawl started in: crawl-20120203151719
>> >> > rootUrlDir = urls
>> >> > threads = 10
>> >> > depth = 3
>> >> > solrUrl=http://127.0.0.1:8983/solr/
>> >> > topN = 5
>> >> > Injector: starting at 2012-02-03 15:17:20
>> >> > Injector: crawlDb: crawl-20120203151719/crawldb
>> >> > Injector: urlDir: urls
>> >> > Injector: Converting injected urls to crawl db entries.
>> >> > Injector: Merging injected urls into crawl db.
>> >> > Injector: finished at 2012-02-03 15:17:31, elapsed: 00:00:10
>> >> > Generator: starting at 2012-02-03 15:17:31
>> >> > Generator: Selecting best-scoring urls due for fetch.
>> >> > Generator: filtering: true
>> >> > Generator: normalizing: true
>> >> > Generator: topN: 5
>> >> > Generator: jobtracker is 'local', generating exactly one partition.
>> >> > Generator: Partitioning selected urls for politeness.
>> >> > Generator: segment: crawl-20120203151719/segments/20120203151735
>> >> > Generator: finished at 2012-02-03 15:17:39, elapsed: 00:00:07
>> >> > Fetcher: Your 'http.agent.name' value should be listed first in
>> >> > 'http.robots.agents' property.
>> >> > Fetcher: starting at 2012-02-03 15:17:39
>> >> > Fetcher: segment: crawl-20120203151719/segments/20120203151735
>> >> > Using queue mode : byHost
>> >> > Fetcher: threads: 10
>> >> > Fetcher: time-out divisor: 2
>> >> > QueueFeeder finished: total 1 records + hit by time limit :0
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > fetching http://www.gioconews.it/
>> >> > Using queue mode : byHost
>> >> > -finishing thread FetcherThread, activeThreads=3
>> >> > -finishing thread FetcherThread, activeThreads=2
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > Fetcher: throughput threshold: -1
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > Fetcher: throughput threshold retries: 5
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > fetch of http://www.gioconews.it/ failed with:
>> >> > java.net.UnknownHostException: www.gioconews.it
>> >> > -finishing thread FetcherThread, activeThreads=0
>> >> > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=0
>> >> > Fetcher: finished at 2012-02-03 15:17:44, elapsed: 00:00:05
>> >> > ParseSegment: starting at 2012-02-03 15:17:44
>> >> > ParseSegment: segment: crawl-20120203151719/segments/20120203151735
>> >> > ParseSegment: finished at 2012-02-03 15:17:48, elapsed: 00:00:04
>> >> > CrawlDb update: starting at 2012-02-03 15:17:48
>> >> > CrawlDb update: db: crawl-20120203151719/crawldb
>> >> > CrawlDb update: segments: [crawl-20120203151719/segments/20120203151735]
>> >> > CrawlDb update: additions allowed: true
>> >> > CrawlDb update: URL normalizing: true
>> >> > CrawlDb update: URL filtering: true
>> >> > CrawlDb update: 404 purging: false
>> >> > CrawlDb update: Merging segment data into db.
>> >> > CrawlDb update: finished at 2012-02-03 15:17:53, elapsed: 00:00:05
>> >> > Generator: starting at 2012-02-03 15:17:53
>> >> > Generator: Selecting best-scoring urls due for fetch.
>> >> > Generator: filtering: true
>> >> > Generator: normalizing: true
>> >> > Generator: topN: 5
>> >> > Generator: jobtracker is 'local', generating exactly one partition.
>> >> > Generator: 0 records selected for fetching, exiting ...
>> >> > Stopping at depth=1 - no more URLs to fetch.
>> >> > LinkDb: starting at 2012-02-03 15:17:57
>> >> > LinkDb: linkdb: crawl-20120203151719/linkdb
>> >> > LinkDb: URL normalize: true
>> >> > LinkDb: URL filter: true
>> >> > LinkDb: adding segment:
>> >> > file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120203151719/segments/20120203151735
>> >> > LinkDb: finished at 2012-02-03 15:18:01, elapsed: 00:00:04
>> >> > SolrIndexer: starting at 2012-02-03 15:18:01
>> >> > java.lang.RuntimeException: Invalid version (expected 2, but 1) or the
>> >> > data in not in 'javabin' format
>> >> > SolrDeleteDuplicates: starting at 2012-02-03 15:18:09
>> >> > SolrDeleteDuplicates: Solr url: http://127.0.0.1:8983/solr/
>> >> > Exception in thread "main" java.io.IOException:
>> >> > org.apache.solr.client.solrj.SolrServerException: Error executing query
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
>> >> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>> >> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>> >> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>> >> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
>> >> >         at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>> >> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>> >> > Caused by: org.apache.solr.client.solrj.SolrServerException: Error
>> >> > executing query
>> >> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>> >> >         at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
>> >> >         ... 9 more
>> >> > Caused by: java.lang.RuntimeException: Invalid version (expected 2, but
>> >> > 1) or the data in not in 'javabin' format
>> >> >         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
>> >> >         at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
>> >> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
>> >> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>> >> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>> >> >         ... 11 more
>> >> > Alessio@PC-Alessio /cygdrive/c/temp/apache-nutch-1.4-bin/runtime/local
>> >> > $ bin/nutch crawl urls -solr http://127.0.0.1:8983/solr/ -depth 3 -topN 5
>> >> > crawl started in: crawl-20120203162510
>> >> > rootUrlDir = urls
>> >> > threads = 10
>> >> > depth = 3
>> >> > solrUrl=http://127.0.0.1:8983/solr/
>> >> > topN = 5
>> >> > Injector: starting at 2012-02-03 16:25:11
>> >> > Injector: crawlDb: crawl-20120203162510/crawldb
>> >> > Injector: urlDir: urls
>> >> > Injector: Converting injected urls to crawl db entries.
>> >> > Injector: Merging injected urls into crawl db.
>> >> > Injector: finished at 2012-02-03 16:25:20, elapsed: 00:00:09
>> >> > Generator: starting at 2012-02-03 16:25:20
>> >> > Generator: Selecting best-scoring urls due for fetch.
>> >> > Generator: filtering: true
>> >> > Generator: normalizing: true
>> >> > Generator: topN: 5
>> >> > Generator: jobtracker is 'local', generating exactly one partition.
>> >> > Generator: Partitioning selected urls for politeness.
>> >> > Generator: segment: crawl-20120203162510/segments/20120203162525
>> >> > Generator: finished at 2012-02-03 16:25:28, elapsed: 00:00:08
>> >> > Fetcher: Your 'http.agent.name' value should be listed first in
>> >> > 'http.robots.agents' property.
>> >> > Fetcher: starting at 2012-02-03 16:25:28
>> >> > Fetcher: segment: crawl-20120203162510/segments/20120203162525
>> >> > Using queue mode : byHost
>> >> > Fetcher: threads: 10
>> >> > Fetcher: time-out divisor: 2
>> >> > QueueFeeder finished: total 1 records + hit by time limit :0
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > fetching http://www.gioconews.it/
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Using queue mode : byHost
>> >> > Fetcher: throughput threshold: -1
>> >> > Fetcher: throughput threshold retries: 5
>> >> > -finishing thread FetcherThread, activeThreads=2
>> >> > -finishing thread FetcherThread, activeThreads=3
>> >> > -finishing thread FetcherThread, activeThreads=6
>> >> > -finishing thread FetcherThread, activeThreads=5
>> >> > -finishing thread FetcherThread, activeThreads=5
>> >> > -finishing thread FetcherThread, activeThreads=4
>> >> > -finishing thread FetcherThread, activeThreads=3
>> >> > -finishing thread FetcherThread, activeThreads=2
>> >> > -finishing thread FetcherThread, activeThreads=1
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> >> > fetch of http://www.gioconews.it/ failed with:
>> >> > java.net.UnknownHostException: www.gioconews.it
>> >> > -finishing thread FetcherThread, activeThreads=0
>> >> > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> >> > -activeThreads=0
>> >> > Fetcher: finished at 2012-02-03 16:25:47, elapsed: 00:00:18
>> >> > ParseSegment: starting at 2012-02-03 16:25:47
>> >> > ParseSegment: segment: crawl-20120203162510/segments/20120203162525
>> >> > ParseSegment: finished at 2012-02-03 16:25:51, elapsed: 00:00:04
>> >> > CrawlDb update: starting at 2012-02-03 16:25:52
>> >> > CrawlDb update: db: crawl-20120203162510/crawldb
>> >> > CrawlDb update: segments: [crawl-20120203162510/segments/20120203162525]
>> >> > CrawlDb update: additions allowed: true
>> >> > CrawlDb update: URL normalizing: true
>> >> > CrawlDb update: URL filtering: true
>> >> > CrawlDb update: 404 purging: false
>> >> > CrawlDb update: Merging segment data into db.
>> >> > CrawlDb update: finished at 2012-02-03 16:25:57, elapsed: 00:00:05
>> >> > Generator: starting at 2012-02-03 16:25:58
>> >> > Generator: Selecting best-scoring urls due for fetch.
>> >> > Generator: filtering: true
>> >> > Generator: normalizing: true
>> >> > Generator: topN: 5
>> >> > Generator: jobtracker is 'local', generating exactly one partition.
>> >> > Generator: 0 records selected for fetching, exiting ...
>> >> > Stopping at depth=1 - no more URLs to fetch.
>> >> > LinkDb: starting at 2012-02-03 16:26:01
>> >> > LinkDb: linkdb: crawl-20120203162510/linkdb
>> >> > LinkDb: URL normalize: true
>> >> > LinkDb: URL filter: true
>> >> > LinkDb: adding segment:
>> >> > file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120203162510/segments/20120203162525
>> >> > LinkDb: finished at 2012-02-03 16:26:05, elapsed: 00:00:04
>> >> > SolrIndexer: starting at 2012-02-03 16:26:06
>> >> > java.lang.RuntimeException: Invalid version (expected 2, but 1) or the
>> >> > data in not in 'javabin' format
>> >> > SolrDeleteDuplicates: starting at 2012-02-03 16:26:13
>> >> > SolrDeleteDuplicates: Solr url: http://127.0.0.1:8983/solr/
>> >> > Exception in thread "main" java.io.IOException:
>> >> > org.apache.solr.client.solrj.SolrServerException: Error executing query
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
>> >> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>> >> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>> >> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>> >> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
>> >> >         at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>> >> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>> >> > Caused by: org.apache.solr.client.solrj.SolrServerException: Error
>> >> > executing query
>> >> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
>> >> >         at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
>> >> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
>> >> >         ... 9 more
>> >> > Caused by: java.lang.RuntimeException: Invalid version (expected 2, but
>> >> > 1) or the data in not in 'javabin' format
>> >> >         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
>> >> >         at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
>> >> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
>> >> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>> >> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
>> >> >         ... 11 more
>> >> >
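The mechanism behind the repeated error in these logs can be sketched: the first byte of a javabin response names the codec version, a SolrJ 3.x client expects 2, and a Solr 1.4 server answers with 1, so the client bails out before parsing anything. The following is only an illustration of that check (not the actual JavaBinCodec code); the error string deliberately reproduces the message from the logs, including its "in not in" typo:

```java
// Illustration of the javabin version check behind the logged error --
// not actual Solr code. A SolrJ 3.x client expects the response stream
// to begin with codec version byte 2; a Solr 1.4 server sends 1.
public class JavabinVersionCheck {
    static final int EXPECTED_VERSION = 2; // what a 3.x client expects

    // Reads the leading version byte of a (simulated) javabin response.
    static int readVersion(byte[] response) {
        int version = response[0];
        if (version != EXPECTED_VERSION) {
            // Mirrors the message seen in the logs, typo and all.
            throw new RuntimeException(
                "Invalid version (expected " + EXPECTED_VERSION + ", but "
                + version + ") or the data in not in 'javabin' format");
        }
        return version;
    }

    public static void main(String[] args) {
        byte[] oldServerReply = {1}; // a Solr 1.4 server's version byte
        try {
            readVersion(oldServerReply);
        } catch (RuntimeException e) {
            // prints: Invalid version (expected 2, but 1) or the data in not in 'javabin' format
            System.out.println(e.getMessage());
        }
    }
}
```

This is why a DNS failure or port mix-up produces a different error: here the server answered, just in an older dialect, which again points at matching the solrj jar on the Nutch side to the Solr server's version.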