Re: Faster Solr Indexing

2012-03-19 Thread Peyman Faratin
Hi Erick, Dmitry and Mikhail, thank you all for your time. I tried all of the suggestions below and am happy to report that indexing speeds have improved. There were several confounding problems, including - a bank of (~20) regexes that were poorly optimized and compiled at each indexing step -
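For readers hitting the same regex problem: a minimal sketch of compiling patterns once and reusing them for every document, instead of recompiling at each indexing step. The class name, the two patterns and the clean() helper are hypothetical stand-ins; the actual bank of ~20 regexes is not shown in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; the patterns below are illustrative only.
class PrecompiledCleaner {
    // Compiled once when the class loads, then shared by all calls.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Applies the precompiled patterns to one field value.
    static String clean(String text) {
        String out = TAGS.matcher(text).replaceAll(" ");
        out = WHITESPACE.matcher(out).replaceAll(" ");
        return out.trim();
    }
}
```

Calling String.replaceAll(regex, ...) inside the per-document loop compiles the expression on every call, which is the pattern this sketch avoids.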

Re: Faster Solr Indexing

2012-03-12 Thread Erick Erickson
How have you determined that it's the Solr add? By timing the call on the SolrJ side or by looking at the machine where Solr is running? This is the very first thing you have to answer. You can get a rough idea with any simple profiler (say Activity Monitor on a Mac, Task Manager on a Windows box).
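A rough sketch of timing the call on the client side, assuming the add is wrapped in a Runnable. The AddTimer class and timeMillis helper are hypothetical names, and the SolrJ call appears only in a comment because the poster's client code is not shown.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for measuring wall-clock time of one step on the client.
class AddTimer {
    // Runs the given work once and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}

// Assumed usage (not from the thread): time document building and the add separately, e.g.
//   long buildMs = AddTimer.timeMillis(() -> buildDocs());
//   long addMs   = AddTimer.timeMillis(() -> sendBatch());  // would wrap the SolrJ add call
```

Timing the two steps separately shows whether the time goes into client-side preparation or into the request itself; it says nothing about what Solr does with that time on the server.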

Re: Faster Solr Indexing

2012-03-11 Thread Mikhail Khludnev
Dmitry, if you are going to talk about logging, don't forget to mention that JDK logging is not at all performant, yet it is the default for 3.x. Logback is much faster. Peyman, 1. Shingles have performance implications; that is, they can be costly. Why are term positions and phrase queries not enough f

Re: Faster Solr Indexing

2012-03-11 Thread Dmitry Kan
One approach we have taken is to decrease the Solr logging level for the posting session, described here (implemented for 1.4, but should be easy to port to 3.x): http://dmitrykan.blogspot.com/2011/01/solr-speed-up-batch-posting.html On 3/11/12, Yandong Yao wrote: > I have similar issues by usin

Re: Faster Solr Indexing

2012-03-11 Thread Yandong Yao
I have similar issues by using DIH, and org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand) consumes most of the time when indexing 10K rows (each row is about 70K) - DIH nextRow takes about 10 seconds in total - If the index uses a whitespace tokenizer and lower case filter, th

Faster Solr Indexing

2012-03-10 Thread Peyman Faratin
Hi, I am trying to index 12MM docs faster than is currently happening in Solr (using SolrJ). We have identified Solr's add method as the bottleneck (and not commit, which is tuned OK through mergeFactor, maxRamBufferSize and JVM RAM). Adding 1000 docs is taking approximately 25 seconds. We
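To put the figures above in perspective, a back-of-the-envelope sketch of the total indexing time, assuming the reported rate of about 25 seconds per 1000 docs holds for all 12MM docs. The class and method names are hypothetical.

```java
// Hypothetical estimate; assumes the per-batch time stays constant across the whole run.
class IndexingEstimate {
    // Returns the estimated total time in hours.
    static double estimateHours(long totalDocs, long batchSize, double secondsPerBatch) {
        double batches = (double) totalDocs / batchSize;
        return batches * secondsPerBatch / 3600.0;
    }
}
```

With the numbers quoted, estimateHours(12_000_000, 1000, 25.0) works out to 12,000 batches x 25 s = 300,000 s, or roughly 83 hours.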