Nutch + Solr - Indexer causes java.lang.OutOfMemoryError: Java heap space

2014-09-07 Thread glumet
Hello everyone, I have configured my 2 servers to run in distributed mode (with Hadoop), and my configuration for the crawling process is Nutch 2.2.1 with HBase (as storage) and Solr. Solr is run by Tomcat. The problem is that every time I try to do the last step - I mean when I want to index data from HBase

Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread glumet
I have written my own plugin for Apache Nutch 2.2.1 to crawl images, videos and podcasts from selected sites (I have 180 URLs in my seed). I put this metadata into an HBase store and now I want to save it to the index (Solr). I have a lot of metadata to save (webpages + images + videos + podcasts). I

Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-14 Thread glumet
When I look into the log, there is: SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2668) at org.apache.lucene.index.IndexWriter.commitInte

Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-15 Thread glumet
OK, I have removed the problem with OutOfMemory by increasing the JVM parameters... and now I have another problem. My index has been working since yesterday evening... the number of documents increased (I run the bin/crawl script every 3 hours and I have 27040 documents now)... but the last increase was 6 hours ago.

Re: Apache Solr 4 - after 1st commit the index does not grow

2013-07-15 Thread glumet
As far as I can see, this is the same problem as one from an older post - http://lucene.472066.n3.nabble.com/strange-utf-8-problem-td3094473.html ...but it got no response.