No, all run on port 8983.
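Whether it is Jetty (the Solr example) or Tomcat answering on 8983 is easy to confirm from the machine running the crawl. A minimal probe, assuming the stock /admin/ping handler from the example solrconfig.xml is enabled (class name is just illustrative):

import java.net.HttpURLConnection;
import java.net.URL;

// Prints the HTTP status and the Server header of whatever is
// listening on 8983: Jetty for the bundled Solr example, Tomcat
// if Solr was deployed into a Tomcat instance instead.
public class PortProbe {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://127.0.0.1:8983/solr/admin/ping");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("HTTP " + conn.getResponseCode());
        System.out.println("Server: " + conn.getHeaderField("Server"));
        conn.disconnect();
    }
}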
2012/2/5 Matthew Parker <mpar...@apogeeintegration.com>:
> Doesn't Tomcat run on port 8080, not port 8983? Or did you change
> Tomcat's default port to 8983?
>
> On Feb 5, 2012 5:17 AM, "alessio crisantemi" <alessio.crisant...@gmail.com>
> wrote:
>
> > Hi all,
> > I have some problems integrating Nutch with Solr and Tomcat.
> >
> > I followed the Nutch tutorial for the integration, and I can now crawl
> > a website: that all works correctly. But when I try the Solr
> > integration, I can't index into Solr.
> >
> > Below is the Nutch output after the command:
> > bin/nutch crawl urls -solr http://127.0.0.1:8983/solr/ -depth 3 -topN 5
> >
> > It reports "java.lang.RuntimeException: Invalid version (expected 2,
> > but 1) or the data in not in 'javabin' format". Could there be a
> > problem between Nutch 1.4 and Solr 1.4.1? Does it require a 3.x Solr
> > version?
> >
> > Thanks,
> > a.
> >
> > crawl started in: crawl-20120203151719
> > rootUrlDir = urls
> > threads = 10
> > depth = 3
> > solrUrl=http://127.0.0.1:8983/solr/
> > topN = 5
> > Injector: starting at 2012-02-03 15:17:20
> > Injector: crawlDb: crawl-20120203151719/crawldb
> > Injector: urlDir: urls
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: finished at 2012-02-03 15:17:31, elapsed: 00:00:10
> > Generator: starting at 2012-02-03 15:17:31
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: topN: 5
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: Partitioning selected urls for politeness.
> > Generator: segment: crawl-20120203151719/segments/20120203151735
> > Generator: finished at 2012-02-03 15:17:39, elapsed: 00:00:07
> > Fetcher: Your 'http.agent.name' value should be listed first in
> > 'http.robots.agents' property.
> > Fetcher: starting at 2012-02-03 15:17:39
> > Fetcher: segment: crawl-20120203151719/segments/20120203151735
> > Using queue mode : byHost
> > Fetcher: threads: 10
> > Fetcher: time-out divisor: 2
> > QueueFeeder finished: total 1 records + hit by time limit :0
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > fetching http://www.gioconews.it/
> > Using queue mode : byHost
> > -finishing thread FetcherThread, activeThreads=3
> > -finishing thread FetcherThread, activeThreads=2
> > -finishing thread FetcherThread, activeThreads=1
> > Using queue mode : byHost
> > Using queue mode : byHost
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > -finishing thread FetcherThread, activeThreads=1
> > Fetcher: throughput threshold: -1
> > -finishing thread FetcherThread, activeThreads=1
> > Fetcher: throughput threshold retries: 5
> > -finishing thread FetcherThread, activeThreads=1
> > -finishing thread FetcherThread, activeThreads=1
> > fetch of http://www.gioconews.it/ failed with:
> > java.net.UnknownHostException: www.gioconews.it
> > -finishing thread FetcherThread, activeThreads=0
> > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=0
> > Fetcher: finished at 2012-02-03 15:17:44, elapsed: 00:00:05
> > ParseSegment: starting at 2012-02-03 15:17:44
> > ParseSegment: segment: crawl-20120203151719/segments/20120203151735
> > ParseSegment: finished at 2012-02-03 15:17:48, elapsed: 00:00:04
> > CrawlDb update: starting at 2012-02-03 15:17:48
> > CrawlDb update: db: crawl-20120203151719/crawldb
> > CrawlDb update: segments: [crawl-20120203151719/segments/20120203151735]
> > CrawlDb update: additions allowed: true
> > CrawlDb update: URL normalizing: true
> > CrawlDb update: URL filtering: true
> > CrawlDb update: 404 purging: false
> > CrawlDb update: Merging segment data into db.
> > CrawlDb update: finished at 2012-02-03 15:17:53, elapsed: 00:00:05
> > Generator: starting at 2012-02-03 15:17:53
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: topN: 5
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: 0 records selected for fetching, exiting ...
> > Stopping at depth=1 - no more URLs to fetch.
> > LinkDb: starting at 2012-02-03 15:17:57
> > LinkDb: linkdb: crawl-20120203151719/linkdb
> > LinkDb: URL normalize: true
> > LinkDb: URL filter: true
> > LinkDb: adding segment:
> > file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120203151719/segments/20120203151735
> > LinkDb: finished at 2012-02-03 15:18:01, elapsed: 00:00:04
> > SolrIndexer: starting at 2012-02-03 15:18:01
> > java.lang.RuntimeException: Invalid version (expected 2, but 1) or the
> > data in not in 'javabin' format
> > SolrDeleteDuplicates: starting at 2012-02-03 15:18:09
> > SolrDeleteDuplicates: Solr url: http://127.0.0.1:8983/solr/
> > Exception in thread "main" java.io.IOException:
> > org.apache.solr.client.solrj.SolrServerException: Error executing query
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
> >         at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> > Caused by: org.apache.solr.client.solrj.SolrServerException: Error
> > executing query
> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
> >         at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
> >         ... 9 more
> > Caused by: java.lang.RuntimeException: Invalid version (expected 2, but
> > 1) or the data in not in 'javabin' format
> >         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
> >         at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
> >         ... 11 more
> >
> > Alessio@PC-Alessio /cygdrive/c/temp/apache-nutch-1.4-bin/runtime/local
> > $ bin/nutch crawl urls -solr http://127.0.0.1:8983/solr/ -depth 3 -topN 5
> > crawl started in: crawl-20120203162510
> > rootUrlDir = urls
> > threads = 10
> > depth = 3
> > solrUrl=http://127.0.0.1:8983/solr/
> > topN = 5
> > Injector: starting at 2012-02-03 16:25:11
> > Injector: crawlDb: crawl-20120203162510/crawldb
> > Injector: urlDir: urls
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: finished at 2012-02-03 16:25:20, elapsed: 00:00:09
> > Generator: starting at 2012-02-03 16:25:20
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: topN: 5
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: Partitioning selected urls for politeness.
> > Generator: segment: crawl-20120203162510/segments/20120203162525
> > Generator: finished at 2012-02-03 16:25:28, elapsed: 00:00:08
> > Fetcher: Your 'http.agent.name' value should be listed first in
> > 'http.robots.agents' property.
> > Fetcher: starting at 2012-02-03 16:25:28
> > Fetcher: segment: crawl-20120203162510/segments/20120203162525
> > Using queue mode : byHost
> > Fetcher: threads: 10
> > Fetcher: time-out divisor: 2
> > QueueFeeder finished: total 1 records + hit by time limit :0
> > Using queue mode : byHost
> > Using queue mode : byHost
> > fetching http://www.gioconews.it/
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Using queue mode : byHost
> > Fetcher: throughput threshold: -1
> > Fetcher: throughput threshold retries: 5
> > -finishing thread FetcherThread, activeThreads=2
> > -finishing thread FetcherThread, activeThreads=3
> > -finishing thread FetcherThread, activeThreads=6
> > -finishing thread FetcherThread, activeThreads=5
> > -finishing thread FetcherThread, activeThreads=5
> > -finishing thread FetcherThread, activeThreads=4
> > -finishing thread FetcherThread, activeThreads=3
> > -finishing thread FetcherThread, activeThreads=2
> > -finishing thread FetcherThread, activeThreads=1
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
> > fetch of http://www.gioconews.it/ failed with:
> > java.net.UnknownHostException: www.gioconews.it
> > -finishing thread FetcherThread, activeThreads=0
> > -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> > -activeThreads=0
> > Fetcher: finished at 2012-02-03 16:25:47, elapsed: 00:00:18
> > ParseSegment: starting at 2012-02-03 16:25:47
> > ParseSegment: segment: crawl-20120203162510/segments/20120203162525
> > ParseSegment: finished at 2012-02-03 16:25:51, elapsed: 00:00:04
> > CrawlDb update: starting at 2012-02-03 16:25:52
> > CrawlDb update: db: crawl-20120203162510/crawldb
> > CrawlDb update: segments: [crawl-20120203162510/segments/20120203162525]
> > CrawlDb update: additions allowed: true
> > CrawlDb update: URL normalizing: true
> > CrawlDb update: URL filtering: true
> > CrawlDb update: 404 purging: false
> > CrawlDb update: Merging segment data into db.
> > CrawlDb update: finished at 2012-02-03 16:25:57, elapsed: 00:00:05
> > Generator: starting at 2012-02-03 16:25:58
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: filtering: true
> > Generator: normalizing: true
> > Generator: topN: 5
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: 0 records selected for fetching, exiting ...
> > Stopping at depth=1 - no more URLs to fetch.
> > LinkDb: starting at 2012-02-03 16:26:01
> > LinkDb: linkdb: crawl-20120203162510/linkdb
> > LinkDb: URL normalize: true
> > LinkDb: URL filter: true
> > LinkDb: adding segment:
> > file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl-20120203162510/segments/20120203162525
> > LinkDb: finished at 2012-02-03 16:26:05, elapsed: 00:00:04
> > SolrIndexer: starting at 2012-02-03 16:26:06
> > java.lang.RuntimeException: Invalid version (expected 2, but 1) or the
> > data in not in 'javabin' format
> > SolrDeleteDuplicates: starting at 2012-02-03 16:26:13
> > SolrDeleteDuplicates: Solr url: http://127.0.0.1:8983/solr/
> > Exception in thread "main" java.io.IOException:
> > org.apache.solr.client.solrj.SolrServerException: Error executing query
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200)
> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
> >         at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
> > Caused by: org.apache.solr.client.solrj.SolrServerException: Error
> > executing query
> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
> >         at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
> >         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:198)
> >         ... 9 more
> > Caused by: java.lang.RuntimeException: Invalid version (expected 2, but
> > 1) or the data in not in 'javabin' format
> >         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
> >         at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
> >         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >         at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
> >         ... 11 more
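For what it's worth, the "Invalid version (expected 2, but 1)" in the log is a javabin protocol mismatch, which matches Alessio's own guess: Nutch 1.4 bundles a SolrJ 3.x client that expects javabin version 2 responses, while Solr 1.4.1 still answers with javabin version 1, so JavaBinCodec.unmarshal rejects the reply in both SolrIndexer and SolrDeleteDuplicates. The straightforward fix is to index into a Solr 3.x instance. To confirm the mismatch is the only problem, you can query the same URL from a standalone SolrJ program that forces the version-agnostic XML parser instead of javabin. A minimal sketch (the class name is just illustrative, and this is not wired into Nutch, which constructs its SolrServer internally):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrVersionCheck {
    public static void main(String[] args) throws Exception {
        // Same URL the crawl command passes via -solr.
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://127.0.0.1:8983/solr/");

        // The default BinaryResponseParser speaks javabin v2; against a
        // Solr 1.4.x server it fails with "Invalid version (expected 2,
        // but 1)". The XML parser avoids javabin entirely, so the query
        // below succeeds if the version mismatch is the only problem.
        server.setParser(new XMLResponseParser());

        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println("numFound = " + rsp.getResults().getNumFound());
    }
}

If that query works while the default parser fails, the index and the URL are fine and it really is the client/server version gap.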
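Separately, note that both crawl runs fetched nothing: the fetcher died with java.net.UnknownHostException for www.gioconews.it, so even with matching Solr and SolrJ versions there would be no documents to index. That points to a DNS or proxy problem on the crawling machine, which a one-line resolution check makes obvious (again, the class name is just illustrative):

import java.net.InetAddress;

public class HostCheck {
    public static void main(String[] args) throws Exception {
        // Throws UnknownHostException, as in the fetcher log, if the
        // host does not resolve from this machine.
        System.out.println(InetAddress.getByName("www.gioconews.it"));
    }
}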